
Overview of Statistical Techniques for Data Analysis
Explore statistical techniques selection based on research findings from scientific articles, including common methods like t-test, ANOVA, correlation, and regression. Evaluate the quality of reporting statistical results and identify misconceptions in writing conclusions.
Presentation Transcript
Statistical Techniques for Data Analysis: Overview of Statistical Techniques Selection. Sunee Raksakietisak, Srinakharinwirot University, sunee@g.swu.ac.th
About this presentation: This set of slides was last used in the presentation titled "Reporting Statistical Data Analysis Results" at the seminar and workshop on Writing Scientific Articles for Publication IV, 19-20 July 2008, organized by the Thai-Australian Technological Services Center (TATSC) (http://www.tatsc.or.th/index.php/events/110-seminar-workshop-writing-scientific-articles-for-publication-iv-19-20-july-2008).
Introduction Article: ScienceAsia Vol. 32 No. 1, March 2006, "Understanding Data: Important for All Scientists, and Where Any Nation Might Excel" (http://www.scienceasia.org/2006.32.n1/001.php). My work is in response to this article, and my presentation goes back to the basics of statistical methods.
Research for my talk: I investigated how statistical methods have been used in scientific papers. The articles in ScienceAsia (online) were examined, from the then-current issue (Vol. 33 No. 2, June 2007) back to Vol. 31 No. 1, March 2005: a total of 154 articles. The keyword "statistic" was used for searching, which covers words like "statistics" and "statistically"; some articles needed the word "significant" instead.
Research for my talk (cont'd): Results: 40 of the 154 articles (26%) used statistical methods.
Statistical Methods: Which statistical methods have been used, and how often? Results: t-test (28%), one-way ANOVA (68%), two-way ANOVA (10%), correlation (13%), regression (15%), others (3%).
Further questions: How good is the writing/reporting of the statistical part? Evaluation on a 10-point scale: Title, 0-1 point; Abstract, 0-2 points; Method, 0-3 points; Results, 0-4 points. How common are misconceptions in writing/reporting statistical results and conclusions?
Overview of Statistical Methods: Descriptive statistics: for qualitative data (nominal, ordinal), report frequencies and percentages; for quantitative data (interval, ratio / scale / numeric), report the mean and SD. Distinguish between SD and SEM (standard error of the mean)! Inferential statistics: hypothesis testing.
Statistical Methods: The t-test compares two means. One-way ANOVA compares two or more means for one factor (the effect of the factor on the measured variable). Two-way ANOVA handles two factors (the effect of two factors on the measured variable).
Steps in statistical hypothesis testing:
1. Formulate the hypotheses: H0 and H1.
2. Set the level of significance (α = 0.05, 0.01, or 0.10).
3. Choose the statistic used to test the hypothesis in (1). This statistic is called an inferential statistic. It has a formula (you don't need to know it) and a distribution (Z, t, F, χ²).
4. State the decision rule: reject H0 if the p-value < α.
5. Calculate the statistic and the p-value. A statistical package gives these values.
6. Make a decision: reject H0 or do not reject H0. Rejecting H0 means that the test is significant.
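
The six steps above can be sketched in code. This is a minimal illustration, not the method used in the talk: it assumes a two-tailed one-sample Z-test with a known population SD, and the function name `z_test` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample Z-test (assumes the population SD sigma is known)."""
    # Step 1: H0: mu = mu0, H1: mu != mu0.
    # Step 2: alpha is the level of significance chosen before looking at the data.
    n = len(sample)
    # Steps 3 and 5: the test statistic, which follows a Z distribution under H0.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Step 5: two-tailed p-value, the probability in both tails beyond |z|.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Steps 4 and 6: decision rule, reject H0 if the p-value < alpha.
    reject = p < alpha
    return z, p, reject


# Hypothetical measurements, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.5]
z, p, reject = z_test(sample, mu0=5.0, sigma=0.4)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

In practice a statistical package computes the statistic and p-value; the sketch only makes the order of the steps explicit.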
[Figure: the normal, t, chi-square, and F distributions]
What is the p-value: The p-value is the probability, assuming H0 is true, in the tail(s) of the distribution beyond the observed value of the test statistic (either one tail or two tails). Web page to calculate the p-value for various distributions: http://vassarstats.net/tabs.html (http://faculty.vassar.edu/lowry/tabs.html)
P-value (cont'd): The p-value can never be zero! This misconception is common because statistical packages print values to a fixed number of decimal places. For example, with 3 decimal places, a very small p-value (smaller than .001) is shown as .000, so we have to write P < .001 instead of P = .000.
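
The reporting rule above is easy to automate. The helper below is a small sketch; the name `format_p` and the choice to keep the leading zero are assumptions made for illustration.

```python
def format_p(p, decimals=3):
    """Format a p-value for reporting, never printing it as zero."""
    threshold = 10 ** -decimals  # smallest value that can be shown, e.g. 0.001
    if p < threshold:
        # Too small to show at this precision: report an inequality instead.
        return f"P < {threshold:.{decimals}f}"
    return f"P = {p:.{decimals}f}"


print(format_p(0.00004))  # reported as an inequality
print(format_p(0.0321))   # reported as a rounded value
```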
Misconception about alpha and p-value: "The frequency of cell division was calculated after 2 weeks of culture and was statistically analyzed by analysis of variance (ANOVA) at p ≤ 0.05" (correct??). "Means within a column followed by the same letter are not significantly different at P ≤ 0.05 according to DMRT" (correct??).
Misconception (cont'd): "Statistical significance was defined as α < 0.05" (correct??). "The repeated measurements of L value and rehydration ratio of the dehydrated products from different pre-treatments were subject to analysis of variance (p=0.05)" (correct??). "Significant difference at p < .05" (correct??).
Correct concept: "Collected data were statistically analyzed and mean separation was calculated according to the Least Significant Difference (LSD) method at the 5% level of significance" (correct). "Results were considered to be statistically significant when p<0.05" (correct).
Correct concept (cont'd): "* P < 0.05, ** P < 0.01, *** P < 0.001; ns, not significant" (correct). "** = significant at 1% level, ns = non-significant" (correct). "The bars with the same letter are not significantly different (P > 0.05)" (correct).
About the t-test: Two variables. Dependent (the variable whose means are compared): scale. Independent (the group variable): nominal, with 2 levels/groups. Common mistake: the independent variable has more than 2 groups and a t-test was run for many pairs (ANOVA should be used instead).
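
One way to see why many pairwise t-tests are a mistake is to count the pairs and estimate the chance of at least one false positive. The sketch below treats the pairwise tests as independent, which is only an approximation (pairwise comparisons share data), and the function name `familywise_error` is hypothetical.

```python
from math import comb


def familywise_error(groups, alpha=0.05):
    """Number of pairwise tests and the approximate chance of >= 1 false positive."""
    pairs = comb(groups, 2)  # every pair of groups gets its own t-test
    # Assumption: tests are independent, so P(no false positive) = (1 - alpha)^pairs.
    return pairs, 1 - (1 - alpha) ** pairs


for k in (2, 3, 5):
    pairs, rate = familywise_error(k)
    print(f"{k} groups: {pairs} t-tests, error rate about {rate:.3f}")
```

With two groups the rate equals α, and it grows quickly as groups are added, which is the motivation for running ANOVA followed by a multiple-comparison procedure.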
About the t-test (cont'd): The statistical test checks whether the two means differ significantly. The test is significant when the null hypothesis (the means are the same) is rejected; that is, the means are different. The t-test has 2 formulas: one for equal variances and one for unequal variances.
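
The two formulas can be written side by side. The sketch below assumes the usual pooled-variance (Student) form for equal variances and the Welch form with Welch-Satterthwaite degrees of freedom for unequal variances; the slides do not name the unequal-variance formula, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_equal_var(a, b):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def t_unequal_var(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical groups with clearly different spreads.
control = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
treated = [12.5, 8.9, 14.1, 11.0]
print(t_equal_var(control, treated))
print(t_unequal_var(control, treated))
```

When the group sizes are equal the two t statistics coincide and only the degrees of freedom can differ; with unequal sizes and unequal spreads they can diverge noticeably.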
Reporting results (Example): See worksheet. Report N, mean, and SD for each group, plus t and the p-value. Journal articles report in different ways: mean and SD (no N); mean and SEM; or mean ± SD, mean ± SEM. Note: SEM = SD/√N. SEM gives a picture of the confidence interval (C.I.).
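
The relation SEM = SD/√N, and the way SEM leads to a confidence interval, can be checked with a few lines. This sketch assumes a normal approximation (mean ± 1.96 × SEM) for the 95% interval; for small N a t quantile would give a wider interval. The function name `summarize` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """N, mean, SD, SEM, and an approximate 95% confidence interval."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)   # sample SD (divides by n - 1)
    sem = sd / sqrt(n)   # SEM = SD / sqrt(N)
    # Assumption: normal approximation; 1.96 is the two-tailed 5% Z quantile.
    ci95 = (m - 1.96 * sem, m + 1.96 * sem)
    return {"n": n, "mean": m, "sd": sd, "sem": sem, "ci95": ci95}


# Hypothetical measurements.
s = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(f"N = {s['n']}, mean = {s['mean']}, SD = {s['sd']:.2f}, SEM = {s['sem']:.2f}")
```

Because SEM shrinks as N grows while SD does not, a report of "mean ± SEM" without N cannot be converted back to SD, which is one reason to always state N.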
About one-way ANOVA: Two variables. Dependent (the variable whose means are computed): scale. One independent (factor): nominal, with 2 or more levels/groups. The t-test is a special case of one-way ANOVA: with two groups and the equal-variance t-test, t² = F.
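
The identity t² = F for two groups can be verified numerically. The sketch below computes the pooled-variance t statistic and the one-way ANOVA F statistic from their textbook definitions; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))


def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical data for two groups.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [3.0, 4.0, 5.0, 6.0, 7.0]
t = pooled_t(a, b)
f = one_way_f(a, b)
print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")
```

The same `one_way_f` accepts three or more groups, which is the case the t-test cannot handle.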
One-way ANOVA (cont'd): The statistical test checks whether the factor has any effect on the dependent variable (or: are all the means equal?). It is an F test (the test statistic has an F distribution). A significant test means that the factor has an effect on the dependent variable; at least one pair of means is different. Multiple comparisons of all pairs of means use LSD, Duncan, SNK, Tukey, Bonferroni, or Scheffé.
Reporting results (Example): See worksheet. Report N, mean, and SD for each group, plus F and the p-value, and a symbol indicating the differences in means found by multiple comparisons.
About two-way ANOVA: Three variables. Dependent (the variable whose means are computed): scale. Two independent variables (factors): nominal.
Two-way ANOVA (cont'd): Statistical test: test for the interaction effect first, and plot a graph to visualize it. If there is no interaction effect, test for the main effect of each factor (one-way ANOVA). If there is an interaction effect, test for simple effects: the effect of one factor at each level of the other factor.
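
Before any testing, the cell means already show what an interaction looks like. The sketch below is descriptive only (it does not compute the ANOVA F tests): for a 2 × 2 design it tabulates the cell means, the simple effect of one factor at each level of the other, and their difference. The factor names, levels, and values are hypothetical.

```python
from statistics import mean


def cell_means(data):
    """Mean of the dependent variable in each (factor A level, factor B level) cell."""
    return {cell: mean(values) for cell, values in data.items()}


def simple_effects(means, a_levels, b_levels):
    """Effect of factor B at each level of factor A, and their difference (2 x 2 only)."""
    (a1, a2), (b1, b2) = a_levels, b_levels
    effect_at_a1 = means[(a1, b2)] - means[(a1, b1)]
    effect_at_a2 = means[(a2, b2)] - means[(a2, b1)]
    # A nonzero difference means the lines in an interaction plot are not parallel.
    return effect_at_a1, effect_at_a2, effect_at_a2 - effect_at_a1


# Hypothetical 2 x 2 experiment: temperature (low/high) by treatment (control/treated).
data = {
    ("low", "control"): [10, 12, 11],
    ("low", "treated"): [14, 15, 16],
    ("high", "control"): [11, 12, 13],
    ("high", "treated"): [20, 22, 21],
}
means = cell_means(data)
low, high, diff = simple_effects(means, ("low", "high"), ("control", "treated"))
print(f"treatment effect at low: {low}, at high: {high}, difference: {diff}")
```

Whether such a difference is statistically significant is what the interaction test in a statistical package decides; the table only makes the pattern visible.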
Reporting results (Example): See worksheet. Report N, mean, and SD for each group, and a table indicating the significance of the main effect of each factor and of the interaction effect.
Stop here for a minute: The t-test and ANOVA are parametric statistical methods for comparing means. They are univariate analyses (one dependent variable). Parametric methods assume that the dependent variable is normally distributed.
Test for Normality: A test for normality can easily be done with a statistical package. If the data are normal, a parametric test can be used. If not, try a transformation. If the data are still not normal after transformation, use nonparametric statistical methods. If other assumptions of parametric tests, such as equal variances, do not hold, use a nonparametric test.
Nonparametric test: The ranks of the data are used instead of the raw data. Nonparametric tests are robust but give lower power than parametric tests. For equivalent parametric and nonparametric methods, see the summary of commands in SPSS. Most of the time, parametric and nonparametric tests lead to the same conclusion.
From comparison to modeling: In most scientific experiments, the manipulated (independent) variables are quantitative. But when running the experiment, only some values are selected, for example 3 levels of temperature, to see the effect of temperature on ... (the dependent variable). Use one-way ANOVA, and show a graph of the mean of the dependent variable at each level of temperature.
Correlation: Correlation of 2 variables (both must be scale variables). "Correlation" usually means Pearson correlation. It assumes a linear relationship and a bivariate normal distribution (it is a parametric method). A nominal variable with 2 values (levels) is OK (watch out: more than 2 values is not OK). If the data are not normal, use rank correlation (nonparametric).
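
The difference between Pearson and rank correlation shows up on data that are monotonic but not linear. The sketch below assumes Spearman's coefficient as the rank correlation (the slide does not name one) and computes it as the Pearson correlation of average ranks; the function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```

Here the rank correlation is perfect because the relationship is strictly increasing, while the Pearson coefficient is lower because the relationship is curved.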
Regression: It is a modeling technique: cause (independent) and effect (dependent). Model: the regression equation (prediction equation). How good is the model: R², the percentage of variance of the dependent variable explained (accounted for) by the independent variables (predictors). There is only one dependent variable, but there can be many independent variables (predictors). All must be scale variables. Modeling methods: Enter, Forward, Backward, Stepwise.
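
For the simplest case, one predictor, the regression equation and R² follow directly from sums of squares. The sketch below assumes ordinary least squares with a single independent variable; the function name `fit_line` and the data are hypothetical, and it does not cover the Enter/Forward/Backward/Stepwise selection methods.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, with its R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared: share of the variance of y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: dose (x) and response (y).
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = fit_line(dose, response)
print(f"response = {intercept:.2f} + {slope:.2f} * dose, R^2 = {r2:.3f}")
```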
Nominal Dependent Variable: In the medical area, the variable of interest (dependent) is often nominal: has lung cancer or doesn't; has a heart attack or doesn't. Use chi-square to test the differences (like the t-test or ANOVA). Use logistic regression for modeling.
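
For a 2 × 2 table the chi-square test can be computed by hand. The sketch below assumes the Pearson chi-square statistic without continuity correction, and uses the fact that with 1 degree of freedom the upper-tail probability equals erfc(√(χ²/2)). The function name and the counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = exposed / not exposed, columns = disease / no disease.
chi2, p = chi_square_2x2([[20, 10], [10, 20]])
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

Statistical packages often also report a continuity-corrected value or Fisher's exact test for small counts, so their output can differ from this uncorrected figure.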
Reliability Analysis: Cognitive test analysis: reliability coefficient, KR-20 / Cronbach's alpha; item statistics, difficulty index and discrimination index (item-to-total correlation / point-biserial correlation). Affective test analysis (e.g. Likert scale): reliability coefficient, Cronbach's alpha; item statistics, discrimination index (item-to-total correlation). See details in the handout article.
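
Cronbach's alpha has a compact definition, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below applies it to a respondents-by-items table; the function name and the scores are hypothetical. For items scored 0/1, KR-20 is the corresponding special case.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical Likert-scale responses: 5 respondents, 4 items.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")
```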
My hope and final remark: You now have the big picture of how to choose statistical methods for your data analysis. You know how and what to report about statistical data analysis results in research journal articles. See the examples of research articles using various statistical methods.