
Effective Research Strategies for Empirical Investigations
Discover key guidelines for empirical research, including posing significant questions, linking research to theory, utilizing appropriate methods, establishing a coherent chain of reasoning, and disclosing results for professional scrutiny. Explore data analysis techniques, variable types, and statistical methods to enhance research validity and generalizability.
Honours Project 4900 2010 Gavin T L Brown, PhD
1. Pose significant questions that can be investigated empirically
2. Link research to relevant theory
3. Use methods that permit direct investigation of the question
4. Provide coherent and explicit chain of reasoning
5. Replicate & generalise across studies
6. Disclose research to encourage professional scrutiny & critique
Can we use these data?
  Types of variables
  Issues: missing data; cleaning & checking; normality
What have we got in each variable?
  Descriptive statistics: N, M, SD
  Cross-tabulations
Are things the same?
  Inferential statistics: relative to chance (statistical significance); p value and appropriate tests (t-test, F-test)
  Scale of differences (practical significance): effect size
Matrix; spreadsheet: rows, columns, cells
  Rows = cases, participants (all info about 1 person)
  Columns = variables (the things we are interested in, on which cases VARY or differ)
  Cells = intersection of variable x case (contains the value for that case)
  Data entry in SPSS is done in DATA VIEW
Type | Definition | Example | Comment | Possible Statistics
  Nominal | Categories | Sex (Male, Female) | Arbitrary order; can be turned into dummy variable | Analysis of variance; distribution chi-square
  Ordinal | Ordered ranks | Preferences, Opinions (1st, 2nd, 3rd) | Approximately equal intervals; may be treated as points on continuum | Non-parametric if rank only
  Scale | Continuous | Age (39, 40, 41, 42) | Equal interval, continuum | Parametric tests; M, SD, correlations, regressions, factor analysis
SPSS variable view
  Name: usually limited by early software to 8 characters
  Type of measure: nominal, ordinal, scale
  Label: what does this variable represent?
  Codes for values (ordinal & nominal): how is each response represented in the cells?
  Code for missing values (if any)
Predictor: causal (makes something happen); independent of other facts; e.g., sex, age, ethnicity
Dependent: the thing being caused, shaped, changed, influenced; depends on the influence of some other force; e.g., attitude, academic performance
Quality interpretations depend on quality data
  Are the values as entered correct?
  Are the values as entered logically feasible?
  What does the analyst do with missing responses?
  Are the data ready to analyse? This usually requires an assumption of normality
Are the centre and spread of each variable more or less normal?
Where is the middle?
  Mean (M): arithmetic average of all scores; compulsory approach for continuous data
  Mode: most frequent score; good approach for categorical data
  Median: score for the person at the mid-point of the distribution (50th percentile); good for both continuous and ordered categorical data
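The three measures of the middle can be compared on a small example. This is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical, invented only to show how one extreme value pulls the mean but not the median or mode.

```python
import statistics

# Hypothetical scores, invented for illustration (not taken from the slides).
scores = [550, 558, 558, 562, 570, 575, 690]

mean_score = statistics.mean(scores)      # arithmetic average, pulled upward by 690
median_score = statistics.median(scores)  # middle value of the sorted scores
modes = statistics.multimode(scores)      # list of the most frequent value(s)

print(mean_score, median_score, modes)
```

With one extreme score in the list, the mean lands above the median, which is one reason to look at more than one measure of the centre.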
SPSS Statistics output for L3Nov05aRs (Nov 05 L3 aRs)
  N: Valid 82; Missing 1
  Mean 566.4390; Median 562.5000; Mode 558.00 (a)
  a. Multiple modes exist. The smallest value is shown
  [Figure: chart of Nov 05 L3 aRs, axis ticks 500.00 to 650.00]
Variance: an indication of how closely the central tendency score represents the true values; large variance means the mean/mode/median does not represent all people well
Standard Deviation: the value that summarises the distribution of scores around the centre point
All scores are at, below, or above the mean
  SD is equally distributed either side of the mean
  Distances from the mean are squared to handle negative distances; SD is the square root of the averaged squares
  [Figure: plot with axes Nov 04 aRs and Nov 05 L4 aRs (500.00 to 700.00); Mean = 555.70]
1. Find the mean (x bar)
2. Subtract the mean from each observed value
3. Square the result
4. Add up all squares (this is the 'sigma' in the formula)
5. Divide by number of cases less one (n-1)
6. Find square root
TADAAA! SD
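The SD steps above can be sketched in a few lines of Python. The function name `sample_sd` is a hypothetical helper; the five values are the x column from the spreadsheet example at the end of this transcript, where the SD is listed as 2.28.

```python
import math

def sample_sd(values):
    """Sample standard deviation, following the slide's steps in order."""
    n = len(values)
    mean = sum(values) / n                      # find the mean (x bar)
    deviations = [v - mean for v in values]     # subtract the mean from each value
    squares = [d ** 2 for d in deviations]      # square the result
    total = sum(squares)                        # add up all squares (the 'sigma')
    variance = total / (n - 1)                  # divide by number of cases less one
    return math.sqrt(variance)                  # find the square root

x = [3, 5, 7, 8, 3]  # x values from the spreadsheet example in this transcript
sd_x = sample_sd(x)
print(round(sd_x, 2))
```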
A little more than 2/3 of scores fall within +/- 1 SD of the mean; about 95% fall within +/- 2 SD; about 99.7% fall within +/- 3 SD
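These proportions can be checked directly, assuming a perfectly normal distribution. The sketch below uses the error function from Python's `math` module; `share_within` is a hypothetical helper name.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"+/- {k} SD: {share:.1%}")
```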
Most statistical procedures demand variables fit the NORMAL DISTRIBUTION
  Location of the mean is SKEWNESS
  Shape of the height is KURTOSIS
  Data vary from normal conditions; e.g., the mean is very sensitive to extreme values
  Need to detect and resolve
  VERY TECHNICAL ADVICE AT http://www.rsc.org/images/brief6_tcm18-25948.pdf
Outliers are cases beyond the hinges. These extreme cases are problematic
  SPSS Statistics output for L3Nov05aRs (Nov 05 L3 aRs): N Valid 82, Missing 1; Std. Deviation 48.83282; Variance 2384.644; Minimum 470.00; Maximum 691.00
  [Figure: plot of Nov 05 L3 aRs, axis ticks 500.00 to 650.00]
Normality detection
  Check kurtosis (height) & skewness (location of centre): within +/-3.0 is no problem; in some cases as high as 7.00 is OK
Outlier detection
  Check boxplot displays for people with extreme values per variable
  Possibly remove or adjust using a trimming technique
  Winsorise: a 90% Winsorised mean sets the bottom 5% to the 5th percentile, the top 5% to the 95th percentile, and then averages the data
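A 90% Winsorised mean as described above might be sketched as follows. This assumes Python 3.8+ for `statistics.quantiles`, and the choice of the "inclusive" percentile method is an assumption (packages differ in how they interpolate percentiles). The scores are hypothetical, with one deliberately extreme value.

```python
import statistics

def winsorised_mean_90(values):
    """Clip values to the 5th and 95th percentiles, then average."""
    # n=20 gives cut points at 5%, 10%, ..., 95%; first and last are the ones needed.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    low, high = cuts[0], cuts[-1]
    clipped = [min(max(v, low), high) for v in values]
    return statistics.mean(clipped)

# Hypothetical data: 1 to 19 plus one extreme case of 100.
scores = list(range(1, 20)) + [100]
raw_mean = statistics.mean(scores)
wins_mean = winsorised_mean_90(scores)
print(raw_mean, round(wins_mean, 2))
```

The extreme case is pulled in to the 95th percentile rather than deleted, so N stays the same while the mean becomes less sensitive to that one value.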
>Analyze>Descriptive Statistics>Frequencies Select all variables starting Student Move them into Variables list Select Statistics and switch on Select Charts and switch on Click OK
Note the effect of missing values on the analysis
Note the strange idea of a MEAN for SEX, hence the mode is better
Note the variables are normal except for the last one
Consider:
  1. Are there enough in each group to make inferences about the population?
  2. Are the groups similar enough so that they can be compared to each other?
Reality may not exactly fit Normal Distribution assumptions, but this example is OK
  SPSS Statistics output for L3Nov05aRs (Nov 05 L3 aRs): N Valid 82, Missing 1; Skewness .335 (Std. Error of Skewness .266); Kurtosis -.519 (Std. Error of Kurtosis .526)
  [Figure: Count (2 to 10) by Nov 05 L3 aRs score (500.00 to 650.00)]
Consider:
  1. The high peak means the variable is not NORMAL, and the kurtosis is high because there are too many of the one category compared to the other category.
  2. But we can use the data anyway.
  3. Scale of distribution between categories can be evaluated with a chi-square test
Chi-square test: evaluates whether distributions vary by more than chance
  Looks at the difference between observed and expected and determines the probability that such a difference occurs by chance (when p>.05 we conclude the difference could be due to chance)
  Very sensitive to situations when N>100
  Can be done in MS Excel (=chitest), which produces the probability value that the difference in distribution occurs by chance
  Group | Female | Male
  observed | 59 | 41
  expected | 50 | 50
  chi-square probability (from =chitest): 0.07
  Inference: the distribution by sex does not differ by more than chance from what was expected
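The observed and expected counts on this slide can be run through the same calculation by hand. The sketch below is standard-library Python rather than Excel; `chi_square_two_groups` is a hypothetical helper, and the p value shortcut used here is valid only for two categories (1 degree of freedom).

```python
import math

def chi_square_two_groups(observed, expected):
    """Chi-square statistic and p value for two categories (df = 1 only)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Female and Male counts from the slide.
chi2, p = chi_square_two_groups(observed=[59, 41], expected=[50, 50])
print(round(chi2, 2), round(p, 2))
```

The p value rounds to the 0.07 shown on the slide, which is above .05, so the split by sex is treated as within chance.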
How different do means have to be to be real differences?
  Statistical significance: greater than chance
  Practical significance: large enough to care about
Chance is eliminated by statistical inferential tests of difference of means
  t-test: difference of means adjusted by degrees of freedom; if p<.05, then means differ by more than chance
  F-test: difference of means adjusted by variance in scores within each group (BETTER); if p<.05, then means differ by more than chance
t-test: Observed Value1 (+/- 2 std errors) minus Observed Value2 (+/- 2 std errors)
  Compare the result to critical values of that result appearing by chance, given the number of cases
  Good for small samples (n<30)
  Uses: compare a sub-group to the overall mean; compare two different groups
  Prone to interpretive error if multiple tests are conducted
Expected M = 555 (average for Y7); Observed M = 566
  One-Sample Statistics (L3Nov05aRs, Nov 05 L3 aRs): N 82; Mean 566.4390; Std. Deviation 48.83282; Std. Error Mean 5.39268
  One-Sample Test (Test Value = 555): t 2.121; df 81; Sig. (2-tailed) .037; Mean Difference 11.43902; 95% Confidence Interval of the Difference: Lower .7093, Upper 22.1688
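The t value in this output can be reproduced from the summary statistics alone. This is a minimal sketch in Python using the N, mean, and SD reported above; it stops at the t statistic because the standard library has no t distribution for the p value.

```python
import math

# Summary statistics from the One-Sample Statistics output above.
n = 82
sample_mean = 566.4390
sample_sd = 48.83282
test_value = 555  # expected mean (average for Y7)

std_error = sample_sd / math.sqrt(n)           # standard error of the mean
t_value = (sample_mean - test_value) / std_error
df = n - 1                                     # degrees of freedom

print(round(std_error, 5), round(t_value, 3), df)
```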
F-test: ratio of the variation between groups to the variance within groups: (Mean Square Between Groups) / (Mean Square Within Groups)
  Compare to a critical value, taking into account the number of cells between groups and the number of cases within groups
  More robust than multiple t tests
Comparison of Years 7 & 8
  Report (L3Nov05aRs, Nov 05 L3 aRs)
  Year | Mean | N | Std. Deviation
  7.00 | 575.5161 | 31 | 54.45418
  8.00 | 560.9216 | 51 | 44.74186
  Total | 566.4390 | 82 | 48.83282
  ANOVA (L3Nov05aRs, Nov 05 L3 aRs)
  Source | Sum of Squares | df | Mean Square | F | Sig.
  Between Groups | 4106.767 | 1 | 4106.767 | 1.738 | .191
  Within Groups | 189049.4 | 80 | 2363.118 | |
  Total | 193156.2 | 81 | | |
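The F ratio in this ANOVA table can be rebuilt from the group means, Ns, and SDs in the Report table. The sketch below is plain Python with hypothetical variable names; small differences from the SPSS figures come from rounding in the reported means.

```python
# (N, mean, SD) for Year 7 and Year 8, from the Report table above.
groups = [
    (31, 575.5161, 54.45418),
    (51, 560.9216, 44.74186),
]

total_n = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / total_n

# Between groups: how far each group mean sits from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
# Within groups: spread of scores around their own group mean.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

df_between = len(groups) - 1
df_within = total_n - len(groups)

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 1), round(ss_within, 1), round(f_ratio, 3))
```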
When N is large, small differences will be statistically significant. So not very informative.
Statistical significance requires calculation software which you might not have. So not very convenient.
A simple comparison that you can do on paper or with a handheld calculator: Cohen's effect size
d = 1.0
  The average student receiving treatment would exceed 84% of students not receiving that treatment
  Large, blatantly obvious, grossly perceptible: difference between mean IQ of PhD graduates and high school students
d = .31
  Not perceptible to the naked observational eye
  Approximately equivalent to the difference between the height of a 5'11" and a 6'0" person
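The 84% figure for d = 1.0 follows from the normal curve. The sketch below assumes both groups are normally distributed with the same SD, which is an assumption added here, and `share_exceeded` is a hypothetical helper name.

```python
import math

def share_exceeded(d):
    """Share of the comparison group scoring below the treated group's mean,
    assuming normal distributions with equal SD."""
    return 0.5 * (1 + math.erf(d / math.sqrt(2)))

share_d1 = share_exceeded(1.0)
share_d031 = share_exceeded(0.31)
print(f"d = 1.0: {share_d1:.0%}; d = .31: {share_d031:.0%}")
```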
Cohen's d: the difference between two means divided by their spread (usually SD)
  d = (M_group1 - M_group2) / ((SD1 + SD2) / 2)
  Group | Female | Male | SD | d
  Reading | 514 | 478 | 100 | .36
  Writing | 512 | 472 | 100 | .42
  Mathematics | 505 | 508 | 100 | -.03
  Note. If N are not equal, then need to take into account the group size.
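The formula on this slide is simple enough to check by hand or in code. The sketch below applies it to the Reading and Mathematics rows, reading the slide's figures as Female 514 / Male 478 for Reading and Female 505 / Male 508 for Mathematics, each with SD 100; `cohens_d` is a hypothetical helper name, and equal group sizes are assumed as in the slide's note.

```python
def cohens_d(mean_1, mean_2, sd_1, sd_2):
    """Difference between two means divided by the average of their SDs."""
    return (mean_1 - mean_2) / ((sd_1 + sd_2) / 2)

reading_d = cohens_d(514, 478, 100, 100)  # Female vs Male, Reading
maths_d = cohens_d(505, 508, 100, 100)    # Female vs Male, Mathematics
print(round(reading_d, 2), round(maths_d, 2))
```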
Effects need to be medium with moderate N to be statistically significant
Effect size and inferential statistics lead to similar conclusions
Need to report both inferential and practical significance when comparing means
Main effect: everyone gets the same result for the treatment
Interaction: result from a treatment is different depending on membership in some other category
  Conditions are Cooperative and Competitive Learning; groups are Males and Females
  [Figure: Scores (0 to 20) under Cooperative Learning and Competitive Learning, shown separately for Girls and Boys]
A correlation is a measure of the degree to which two variables behave in a similar manner; it's the linear relationship
  Positive = both go up together
  Negative = one goes up, the other goes down
  Zero = no meaningful pattern in the relationship of the variables
  Variables are RARELY perfectly correlated in social science; small correlations are normal
A measure of how similar 2 or more traits are in how they increase & decrease
  As human height increases there is a strong tendency for weight to increase
  As human age increases there is a strong tendency for weight and height to increase
  As human age increases there is a strong tendency for personal wealth to increase
  As human age increases there is a strong tendency for more books to be published
It is possible to (wrongly) assume correlation equals causation
  My weight & income are correlated, so my increasing weight is causing my income to increase (getting fat makes you rich!)
It is possible to (wrongly) assume a correlation equals a real connection rather than chance
  My age and the number of books published are positively correlated
  But this is purely coincidental
Square of correlation = percentage of variance explained
  Interpretation | Correlation range | Variance explained
  Small | .01 to .20 | up to 4%; likely to be chance unless very large sample
  Medium | .20 to .50 | 4 to 25%
  Large | .50 to .99 | 25 to 98%
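The boundaries in this table come from squaring the correlation. A minimal sketch in Python, with `variance_explained` as a hypothetical helper name:

```python
def variance_explained(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r ** 2

# Boundary correlations from the table above.
shares = {r: variance_explained(r) for r in (0.20, 0.50, 0.99)}
for r, share in shares.items():
    print(f"r = {r:.2f} -> {share:.0%} of variance explained")
```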
What is the relationship of school socio-economic status and student sex or ethnicity?
  Theory: richer schools are associated with more girls and fewer non-Pakeha or Asian students
  >Analyse>Bivariate Correlations
  CORRELATIONS
    /VARIABLES=sc_dec st_gendr st_ethn_rev
    /PRINT=TWOTAIL NOSIG
    /MISSING=PAIRWISE.
So what does it mean?
  High ethnicity value = Whites & Asians; low ethnicity value = Maori & Pacific Islanders
  High SES = rich; low SES = poor
  Pearson r = .39; THUS, as the ethnicity value gets bigger, school economic status gets bigger, and vice versa
  But which is the chicken and which is the egg?
Regression
  A directional linear relationship
  Independent variable causes differences in dependent variable: Y = mX + b
  Score on variable Y is equal to a proportion (m) of the score on variable X plus a constant (b), the intercept
  [Figure: regression line of variable Y on variable X, with b marked as the intercept]
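The slope and intercept in Y = mX + b can be estimated by ordinary least squares. The sketch below uses the five x, y pairs from the spreadsheet example at the end of this transcript; `fit_line` is a hypothetical helper, and least squares is assumed as the fitting method since the slide does not name one.

```python
def fit_line(x, y):
    """Least-squares slope (m) and intercept (b) for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, b

x = [3, 5, 7, 8, 3]
y = [2, 6, 6, 4, 2]
m, b = fit_line(x, y)
print(round(m, 3), round(b, 3))
```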
The amount of increase (beta) in a dependent variable as a function of predictor variables
  Standardise the increase as a proportion of standard deviation (standardised beta [β])
  Conceptually similar to an effect size
  Square of β indicates the proportion of variance in the dependent variable explained by the predictor
Interpretations
  A very small amount of math score is explained by attitude (1.5%)
  Liking makes math worse; self-efficacy makes math better
  Interpretations?
Double-click on the spreadsheet. Change the values and watch how the statistics change
Here are the Excel commands
  M: =AVERAGE(B2:B6)
  SD: =STDEV(B2:B6)
  Pearson r: =PEARSON(B2:B6,C2:C6)
  ES (d): =(B7-C7)/((B8+C8)/2)
  p value for Student's t-test: =TTEST(B2:B6,C2:C6,2,1)
What do the results mean?
  Person | x | y
  A | 3 | 2
  B | 5 | 6
  C | 7 | 6
  D | 8 | 4
  E | 3 | 2
  M | 5.20 | 4.00
  SD | 2.28 | 2.00
  Pearson (r) | 0.66
  Effect Size (d) | 0.93
  t-test | 0.21
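For readers without Excel, the mean, SD, and Pearson r for the five cases above can be reproduced with Python's standard `statistics` module. This is a sketch of the same descriptive calculations only; it does not attempt the effect size or the t-test p value.

```python
import statistics

x = [3, 5, 7, 8, 3]  # column x for persons A to E
y = [2, 6, 6, 4, 2]  # column y for persons A to E

mean_x, mean_y = statistics.mean(x), statistics.mean(y)  # like =AVERAGE
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)    # like =STDEV (n-1)

# Pearson r: summed cross-products of deviations, scaled by (n-1) and both SDs.
cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
pearson_r = cross_products / ((len(x) - 1) * sd_x * sd_y)

print(mean_x, mean_y, round(sd_x, 2), round(sd_y, 2), round(pearson_r, 2))
```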