Psychology 3301 Lecture 4 - Research Methods & Data Analysis
In this lecture, explore essential topics in psychology research methods, philosophy of science, data analysis, and more. Understand the importance of accurate data analysis and its impact on conveying meaningful insights. Delve into qualitative and quantitative data analysis, loss of information, and the significance of conveying accurate meaning in data interpretation.
Presentation Transcript
Psychology 3301 Lecture 4
Recall
[Concept map: Research Methods, with five branches: Philosophy of Science, Research Design, Statistics, Measurement, Data Analysis]
Philosophy of Science: Positivism; Popper; Operationism; CV Theory; Kuhn; Realism; T constructs; Description vs explanation vs law vs prediction vs model vs theory; Determinism vs mechanism vs reductionism vs humanism.
Research Design: No random assignment of Ss to conditions (Descriptive Study: case study, survey, naturalistic, interview; Quasi-Experiment: pre-post, pre-post-control, time series, time series control; Other: cohort study, guided interview, retrospective, prospective). Random assignment of Ss to conditions (Experiment: all sorts of stuff about confounding, blinding, placebo effects, matching, blocking, etc.*).
Statistics: Hypothesis Testing; Estimation; Loss functions; Decision procedures; Point hypotheses; Distribution theory; Test statistics; Theory of errors. No room, but we don't teach much of this stuff anyway.
Sampling Methods: Non-probability; Probability (SRS, SYS, PPS, STR, Cluster).
Measurement: Reliability; Validity; CTT; IRT.
Data Analysis: Reduce; Summarize; Convey meaning. Qualitative Data: Content, Narrative, Discourse, Framework, Grounded Theory. Quantitative Data: Univariate (shape, location, spread, kurtosis); Bivariate (linear correlation, non-linear correlation); Multivariate (MDS, PCA, FA); Metric vs Non-metric; Model fitting.
* All designed to attune students to the ways in which our methods of design can cause our research to mislead us about the way in which the world really is. The major reason for research methods in the first place. The reason we must pay attention to these issues is that most of these principles are nothing but hard-learned lessons about the way in which we deceive ourselves by the methods we use to try to understand the world. Although they are listed under experiments, these are principles that apply to ALL methods of data collection and analysis.
Data Analysis
There are two kinds of data:
i) Qualitative
ii) Quantitative or, better, numerical
The purpose of data analysis is the same, irrespective of the type of data. The purpose of data analysis is:
i) Data reduction
ii) Summarization
iii) Conveying meaning about the raw data
A good data analyst does this with a minimal loss of the information contained in the raw data. But it's important to understand that because data analysis involves reduction, there is always a loss of information involved!
Loss of Information
Consider that you are asked by a friend, "What did Donald Trump say in his last rally?" Imagine you answer, "The usual, just a bunch of stuff about how wonderful he is." Although you may not know it, you have engaged in a form of data analysis. How so?
First, you have reduced the raw data, that is, the words, tone/inflection/etc., and gestures/facial expressions/etc. of Trump's recent rally speech.
Secondly, you have given a summary of the raw data.
Finally, you have conveyed meaning about the raw data.
Loss of Information
The questions of real importance are:
i) Have you conveyed meaning accurately?
ii) How much did you leave out?
Good data analysis requires that your summary be an accurate representation of the original, so that the meaning of the original is not lost, and that your portrayal of the original be as complete as possible. To the extent that you fail to achieve these two goals, your data analysis can be said to be biased, misleading, insufficient, etc.
Loss of Information: Problem
How do we determine the correspondence between the analytic result and the raw data if we do not have access to the raw data? From a scientific point of view, we cannot merely trust that a correspondence exists; whether or not it exists must be demonstrated or known in some way. Technically, the difference between the raw data and the representation of the raw data by a data analyst is known as the "loss function". It is incumbent on any data analyst to do the best they can to state their loss function. "Believe us" is not good enough.
Loss Function: Example One
Imagine we calculate the mean income for Formula 1 race car drivers. That mean is roughly 9.25 million per year, but see below.
[Figure: histogram "Formula 1 Driver Salaries"; x-axis: annual salary in millions (1 to 36); y-axis: frequency (0 to 5)]
Only 5 of the 20 drivers make more than the mean. The misrepresentation is high, so the loss function is also high. The median is 5 million and so has a smaller loss function.
Loss Function: Example One
Why does this matter so much? Let's say we use the mean in the following sentence: "Formula 1 drivers are richer than NHL hockey players because their average salary is 9.25 million a year, whereas NHL hockey players average only 3 million per year." This sentence implies, because of the way it is phrased, that all Formula 1 drivers are rich and that 9.25 million per year is a reasonable representation of how rich they are. It also implies that all Formula 1 drivers are richer than all NHL players because the average annual salaries are 3 times higher for Formula 1 drivers than NHL players.
VERY IMPORTANT: As an analyst, what matters is not exactly what you say; people hear the implications of what you say! Be careful not to invite your listener to hear something that is not true.
Loss Function: Example One
But if we look again at the graph of Formula 1 driver salaries here:
[Figure: histogram "Formula 1 Driver Salaries", with a marker at 3 million per year]
We see that 8 of the 20, or 40%, of Formula 1 drivers make less than the average salary of an NHL player!
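The numbers in this example can be checked with a short Python sketch. The individual salaries are not given on the slides, so the list below is hypothetical: it was constructed only so that it reproduces the figures quoted above (20 drivers, mean 9.25, median 5, 5 drivers above the mean, 8 drivers below 3 million).

```python
from statistics import mean, median

# HYPOTHETICAL salaries in millions per year for 20 drivers.
# These are NOT the real data behind the histogram; they are made up
# so that the summary figures match the ones quoted on the slides.
salaries = [1, 1, 1, 1, 2, 2, 2, 2, 4, 5, 5, 6, 6, 7, 8, 15, 20, 27, 32, 38]
nhl_mean = 3  # NHL average salary in millions, as quoted on the slide

group_mean = mean(salaries)
group_median = median(salaries)

# How many individuals does each claim actually describe?
above_mean = sum(1 for s in salaries if s > group_mean)
below_nhl = sum(1 for s in salaries if s < nhl_mean)
share_below_nhl = below_nhl / len(salaries)

print(f"mean = {group_mean}, median = {group_median}")
print(f"drivers above the mean: {above_mean} of {len(salaries)}")
print(f"drivers below the NHL average: {below_nhl} of {len(salaries)} ({share_below_nhl:.0%})")
```

A few large salaries pull the mean well above what most drivers earn, which is why counting individuals alongside the group statistic is informative.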
Loss Function: Example One
So what's the loss function in this case, then? By calculating a statistic that represents a feature of the group (the mean), we have lost information about individuals within the group. Group statistics systematically ignore individual differences. And, as we know, features that are characteristic of a group are not necessarily characteristic of individuals within the group. In fact, it is possible that not a single individual in the group has the same value as the average of the group! We see this in our second example.
Loss Function: Example Two
Imagine we calculate the mean for the following two sets of data:
[Figure: two histograms over the values 1 to 7; frequency axis 0 to 45 in the first, 0 to 12 in the second]
In both cases, the mean is 4. So, if we only report the mean, the meaning we convey is that the two groups of subjects are very similar. But of course, they are dramatically different.
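A minimal Python sketch of this point follows. The actual counts behind the two histograms are not recoverable here, so both frequency tables are made up: one is peaked at 4 and the other is U-shaped, and both are symmetric so that each has a mean of exactly 4. The function name `mean_from_frequencies` is likewise just an illustrative choice.

```python
# HYPOTHETICAL frequency tables (value -> number of subjects).
# The shapes are assumptions chosen to illustrate the point, not the slide's data.
group_a = {1: 2, 2: 5, 3: 20, 4: 40, 5: 20, 6: 5, 7: 2}   # peaked at 4
group_b = {1: 12, 2: 8, 3: 4, 4: 2, 5: 4, 6: 8, 7: 12}    # U-shaped


def mean_from_frequencies(freq):
    """Weighted mean of the values, using the counts as weights."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n


def share_at(freq, value):
    """Proportion of subjects whose score equals `value`."""
    return freq.get(value, 0) / sum(freq.values())


mean_a = mean_from_frequencies(group_a)
mean_b = mean_from_frequencies(group_b)

print(f"mean A = {mean_a}, mean B = {mean_b}")
print(f"share of A scoring exactly 4: {share_at(group_a, 4):.0%}")
print(f"share of B scoring exactly 4: {share_at(group_b, 4):.0%}")
```

The means are identical, yet in the U-shaped group almost nobody scores at the mean, so the mean alone describes that group's individuals poorly.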
Loss Functions: The Take Away
The point is not to remove loss. There will always be loss of information in any data analysis. The point is to understand what the loss is and convey what information has been lost to the consumer of the analysis. So, if we report a group statistic as a representation of the state of individuals in the group, we should report how many individuals in the group are not well represented by that statistic. We should report some measure of dissimilarity of individuals within the group!
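One way to act on this advice is sketched below in Python. The slide does not name a specific dissimilarity measure, so two choices here are assumptions: the average absolute deviation from the mean, and a hypothetical `tolerance` that decides when an individual counts as "not well represented". The two small data sets are made up.

```python
from statistics import mean


def average_absolute_deviation(values, center):
    """Average distance of the individual values from `center`."""
    return sum(abs(v - center) for v in values) / len(values)


def report(values, tolerance):
    """Report the mean together with information about what it loses.

    `tolerance` is a hypothetical cut-off: an individual farther than this
    from the mean is counted as not well represented by it.
    """
    center = mean(values)
    poorly = sum(1 for v in values if abs(v - center) > tolerance)
    return {
        "mean": center,
        "aad": average_absolute_deviation(values, center),
        "share_not_well_represented": poorly / len(values),
    }


# Two made-up groups with the same mean but different individual spread.
tight = [3, 4, 4, 4, 5]
spread = [1, 1, 4, 7, 7]

print(report(tight, tolerance=1))
print(report(spread, tolerance=1))
```

Reporting the last two fields next to the mean tells the consumer of the analysis how much individual variation the single number hides.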
The Three Types of Data Analysis
Essentially, we have methods of representation that are univariate, bivariate, and multivariate. We will learn about all three types in this course. In previous courses you will have seen univariate (mean, median, standard deviation, etc.) statistics and bivariate (Pearson's r, scatter plots, etc.) statistics, but probably not multivariate methods of data analysis. In this course we will learn methods of data representation that represent features of the simultaneous relationships between multiple variables, for example, Principal Components Analysis and Multidimensional Scaling.
The Three Types of Data Analysis: A Complication
There are representational methods of data analysis. These methods represent features of the raw data. These are things like medians, means, standard deviations, the Pearson correlation, the covariance matrix, etc. There are also techniques that model features of the raw data. These techniques fall in the category of model fitting. These are techniques such as simple linear regression and multiple regression. So, for instance...
[Figure: scatter plot of Y (0 to 10) against X (0 to 8)]
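The distinction can be sketched in Python on a small made-up data set (the points below are hypothetical and are not read off the slide's scatter plot). The Pearson r represents a feature of the data, the strength of linear association, while the least-squares line is a model that yields a predicted Y for every X and leaves residuals.

```python
from math import sqrt

# HYPOTHETICAL (x, y) pairs, chosen only to fit the plot's axis ranges.
x = [0, 2, 4, 6, 8]
y = [1, 3, 4, 7, 10]


def centered_sums(xs, ys):
    """Return the means and the sums of squares and cross-products about them."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return mx, my, sxx, syy, sxy


mx, my, sxx, syy, sxy = centered_sums(x, y)

# Representation: one number describing the linear association.
r = sxy / sqrt(sxx * syy)

# Model fitting: simple linear regression of y on x by least squares.
slope = sxy / sxx
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print(f"Pearson r = {r:.3f}")
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals are the part of the raw data the fitted model does not account for, so they play the same role for the model that the lost information plays for a summary statistic.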
Review of Univariate and Bivariate Statistics
I'm going to leave the PowerPoint approach here and ask you to watch a series of mini video lectures on univariate and bivariate statistics. Much of this will be review for many of you. But when Jack Nicklaus, the greatest golfer of all time, visited his life-long teacher Jack Grout at the start of every season, he said, "Jack, teach me how to play golf." Even for the best in the world, it's important to go back to the fundamentals. A sign of maturity and sophistication is not the complexity of what one is capable of doing; it's the depth of understanding one has about the fundamentals.
Review of Univariate and Bivariate Statistics
As Bruce Lee said, "I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times." Remember that; it's more important than you might think. OK, so here is the list of review videos to watch:
i) A video on types of variables: https://youtu.be/14Zf9sHmvkw
ii) A video on the shape of univariate distributions: https://youtu.be/hLwYUyRPqmM
iii) A video about the mean: https://youtu.be/lyiW9zhB5xA
iv) A video about variability: https://youtu.be/Uj_Q4fmJpws
v) A video about the difference between what measures of location represent and what measures of variability represent: https://youtu.be/LR5u-ZPEeNk
Review Videos to Watch Continued
vi) A video on conditional distributions: https://youtu.be/_qRChyz86OE
vii) A video on the covariance: https://youtu.be/qN2cim3UR9Q
viii) A video about the Pearson r: https://youtu.be/mFCWME0FZ1w
Assignment Two
In this assignment, let's use the annual data file we used in the last exercise. This kind of data file is called a dimensional data file. Relational data files, like the SQL databases that underpin most websites, contain data in a number of different dimensional tables that are linked with an identifier. These data files must be converted to dimensional data files in order to be easily analyzed. This data file would be considered very clean (having few errors) except for the large number of missing values. Many subjects (years) do not contain data for many of the variables. In later weeks, we will look more closely at the integrity of data files and learn about auditing, data cleaning, recoding, and dealing with missing values.
Assignment Two
For now, we will just analyze the data we have and ignore issues relating to the quality of the data and what to do about missing values. The instructions are as follows and a video on how to do this assignment is here:
i) Load the data file here into SPSS. Save the file in SPSS; you will be using it later.
ii) Use the recode command to create two groups: 1990-2003 and 2004-2017. Call the variable "period". Assign values of 0 to 1990-2003 and values of 1 to 2004-2017.
iii) Use the SPSS compute command to add the US state and federal inmate populations to get a total inmate population size. Finally, calculate the number of US inmates per million using the US population variable. Call this variable "inmatespercapita".
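The assignment itself is to be done with the SPSS recode and compute commands. Purely to illustrate the logic of steps ii) and iii), here is a Python sketch. Everything in it other than the names `period` and `inmatespercapita` is an assumption: the column names (`year`, `state_inmates`, `federal_inmates`, `us_population`) are hypothetical, the three rows are made-up round numbers rather than real statistics, and leaving years outside both ranges as missing is one possible choice.

```python
# HYPOTHETICAL rows: column names and values are made up for illustration
# and do not come from the course data file.
rows = [
    {"year": 1985, "state_inmates": 450_000, "federal_inmates": 50_000, "us_population": 250_000_000},
    {"year": 1990, "state_inmates": 700_000, "federal_inmates": 50_000, "us_population": 250_000_000},
    {"year": 2004, "state_inmates": 1_300_000, "federal_inmates": 200_000, "us_population": 300_000_000},
]


def recode_period(year):
    """Step ii): 0 for 1990-2003, 1 for 2004-2017, missing (None) otherwise."""
    if 1990 <= year <= 2003:
        return 0
    if 2004 <= year <= 2017:
        return 1
    return None  # assumption: years outside both ranges are left missing


for row in rows:
    row["period"] = recode_period(row["year"])
    # Step iii): total inmates, then inmates per million people.
    total_inmates = row["state_inmates"] + row["federal_inmates"]
    row["inmatespercapita"] = total_inmates * 1_000_000 / row["us_population"]

for row in rows:
    print(row["year"], row["period"], row["inmatespercapita"])
```

The multiplication by one million is done before the division so that the made-up round numbers give exact results.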
Assignment Two Continued
iv) Calculate histograms, means, medians, and AADs for both periods for each of the following variables:
   i) inmatespercapita
   ii) US life expectancy
   iii) CO2 emissions per capita in China
   iv) CO2 emissions per capita in the US
   v) CO2 emissions per capita in Canada
   vi) Electric power consumption per capita in Nigeria
   vii) Electric power consumption per capita in Canada
   viii) Annual per capita income in the US
   ix) Annual per capita income in China
   x) Annual per capita income in Nigeria
   xi) Annual per capita income in Canada
Assignment Two Continued
v) Produce a scatter plot and Pearson r for each of the relationships between:
   i) inmatespercapita and US life expectancy between 1960 and 2017.
   ii) CO2 emissions per capita in Canada and income per capita in Canada between 1960 and 2017.
   iii) CO2 emissions per capita and electric power consumption per capita in Canada between 1960 and 2017.
   iv) CO2 emissions per capita in Canada and CO2 emissions in Nigeria between 1960 and 2017.
   v) Female suicide rate per 100,000 in Canada and income per capita in Canada.
What to Hand In
Go to Assignments in the main menu on Blackboard. Go to Assignment 2. Create a PDF of your assignment and upload it. The PDF should contain 11 pages:
Page 1: Title page with name, student number, and assignment name.
Page 2: The SPSS batch commands you used to compute and recode your data.
Pages 3-6: Numerical answers to question 4.
Pages 7-10: Numerical answers to question 5.
Page 11: A commentary about what you think the data analyses you have done show about the world in which we live.
Marks will be assigned as follows: page 2 - 1, pages 3-6 - 3, pages 7-10 - 3, page 11 - 3. Total 10.