
Effective Question Wording and Data Analysis
Explore the significance of validity and reliability in research, common mistakes in question design, and examples of non-mutually exclusive and non-exhaustive questions. Learn about factors impacting question quality such as being too long, double-barreled, leading, and unreasonable.
Presentation Transcript
Question wording and data analysis
PHC 6716, June 15, 2011
Chris McCarty

Validity and Reliability
Most of what we have dealt with so far has to do with reliability.
Reliability is the extent to which you get the same result when you repeat a measure several times.
Validity is the extent to which you are measuring what you think you are measuring. For example, frequency of jogging is not a valid measure of exercise, because there are many other forms of exercise.
Much of question wording is about validity.

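One common way to put a number on reliability is test-retest correlation. You give the same respondents the same measure twice and correlate the two sets of answers. The sketch below assumes numeric scores and uses hypothetical data. A value near 1 means the measure is repeatable, but it says nothing about validity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same five respondents, measured twice.
first_wave = [3, 5, 2, 4, 1]
second_wave = [3, 4, 2, 5, 1]
print(round(pearson_r(first_wave, second_wave), 2))  # 0.9
```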
Not mutually exclusive
What is your income?
1. 0-$20,000
2. $20,000-$40,000
3. $40,000-$60,000
4. $60,000-$80,000
5. $80,000-$100,000
6. $100,000+
(An income of exactly $20,000, $40,000, $60,000, $80,000, or $100,000 fits two categories.)

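Overlapping boundaries like these are easy to catch mechanically. The hypothetical `check_ranges` helper below reads each category as an inclusive (low, high) pair in whole dollars, with `None` marking an open top category. It reports overlaps and gaps between neighboring categories. It assumes the ranges are sorted and does not check that the lowest category starts at the true minimum. The "fixed" list shows one possible repair, not the only one.

```python
def check_ranges(ranges, unit=1):
    """Report overlaps and gaps between adjacent (low, high) response ranges.

    Bounds are inclusive; high=None means open-ended ("and up").
    `unit` is the smallest reportable step (1 for whole dollars).
    Ranges are assumed to be sorted by their lower bound.
    """
    problems = []
    for (lo1, hi1), (lo2, hi2) in zip(ranges, ranges[1:]):
        if hi1 is None or lo2 <= hi1:
            problems.append(f"overlap: {lo1}-{hi1} and {lo2}-{hi2}")
        elif lo2 > hi1 + unit:
            problems.append(f"gap between {hi1} and {lo2}")
    return problems


# The income categories from the slide, read with inclusive bounds.
slide = [(0, 20000), (20000, 40000), (40000, 60000),
         (60000, 80000), (80000, 100000), (100000, None)]
# One possible repair: end each category one dollar below the next.
fixed = [(0, 19999), (20000, 39999), (40000, 59999),
         (60000, 79999), (80000, 99999), (100000, None)]
print(len(check_ranges(slide)))  # 5 overlaps, one at each shared boundary
print(check_ranges(fixed))       # []
```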
Not exhaustive
Where do you get most of your medical advice?
1. My doctor
2. TV
3. Friends
4. Family members
(There is no category for any other source, or for respondents who get no medical advice.)

Too long and wordy
The next questions ask about YOUR OWN health care. Please DO NOT include care you got when you stayed overnight in a hospital or the times you went for dental care visits. For the purposes of this survey, a PERSONAL DOCTOR OR NURSE is the health provider who knows you best. This can be a general doctor, a specialist doctor, a nurse practitioner, or a physician assistant. When you were enrolled in this program or at any time since then, did you get a NEW personal doctor or nurse?
1. Yes
2. No

Double-barreled
Please rate your satisfaction with the amount and kind of care you received while you were in the hospital.
1. Very satisfied
2. Satisfied
3. Neither satisfied nor dissatisfied
4. Dissatisfied
5. Very dissatisfied
(A respondent could be satisfied with the amount of care but not with the kind, or vice versa.)

Leading
Most doctors believe that exercise is good for you. Do you
1. Strongly agree
2. Agree
3. Neither agree nor disagree
4. Disagree
5. Strongly disagree
(The opening statement signals which answer is expected.)

Unreasonable
How many times in the past year have you eaten out? ________
(Few respondents can recall a full year of meals out with any accuracy.)

Too many categories to choose from (respondents will often choose the first or last option)
Please describe the first page of the web site.
Options 1 through 9 (page elements shown on the slide): QuitPlan; QuitNet; Quote from member; We're helping Minnesotans learn to quit; Create your own QuitPlan; Ask Questions of Expert Counselors; Get support from the QuitNet community; Learn from science-based Quitting Guides; How much lifetime and money has the Nicodemon stolen from you!
10. On an average day, how many cigarettes do you (or did you) smoke?
11. How soon after you wake do you smoke your first cigarette?
12. QUITPLAN has the tools to help you learn to quit
13. Other, specify______________________________

Smoking question: Unreasonable for Interviewer
Can you describe what happens in this advertisement?
INT: DO NOT READ CHOICES
1 They start naming high school clubs and teams that can be joined
2 Boy names the varsity team
3 Girl names the drama club
4 Boy names student government
5 Girl says, but there is only one with the potential to save over 400,000 lives every year
6 Girl says, SWAT
7 Music starts in background, girl says students working against tobacco
8 Boy says, we're athletes
9 Girl says, we're artists
10 Boy says, we're leaders and we are committed to giving Florida's youth a voice in the fight against tobacco
11 Girl says, together we can help to stop the tobacco industry and to save the over 400,000 people who die from tobacco use each year
12 Girl says, but SWAT needs your help
13 Boy says, whoever you are
14 Girl says, whatever you are into.
15 Boy says, wherever you go to school ask about SWAT and how you can do your part in the fight against tobacco
16 Girl says, whatever you do today, can save a life tomorrow
17 Boy and girls talk about how students have to join to fight against tobacco
18 SWAT can fight big tobacco.
19 Anyone can join SWAT and fight tobacco companies
20 Tobacco kills people every year.
21 Don't smoke
22 Other (Please specify)

Miscellaneous points
When repeating surveys, be careful not to change response categories in a way that makes the same response number mean different things in different versions.
Some questionnaire authoring packages allow you to randomize the order of questions and of response categories (Stewart et al).
Alternate questions phrased positively with questions phrased negatively.
Phrase sensitive and controversial questions so that the respondent feels OK about selecting a negative response.
You should typically offer Don't Know and Not Available categories (Krosnick et al).

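Randomizing response order and keeping response numbers stable can both be done. You randomize only the display order and store fixed codes. A minimal sketch, reusing the medical-advice categories from earlier. It adds a hypothetical code 5 for "Other (specify)" and keeps it last. Anchoring "Other" at the end is an assumed design choice, not something the slide prescribes.

```python
import random

# Response categories with fixed codes; the codes are what get stored,
# so they keep the same meaning no matter how the list is displayed.
CATEGORIES = {1: "My doctor", 2: "TV", 3: "Friends", 4: "Family members"}
OTHER = (5, "Other (specify)")  # hypothetical code, always shown last


def display_order(rng):
    """Shuffle the substantive options; keep 'Other' anchored at the end."""
    items = list(CATEGORIES.items())
    rng.shuffle(items)
    return items + [OTHER]


rng = random.Random(2011)  # seeded so the example is reproducible
for code, label in display_order(rng):
    print(code, label)  # the stored answer is `code`, whatever its position
```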
Scales
A scale is a set of questions designed to measure a concept that cannot be adequately represented by a single question.
There are many existing, tested scales for health care (e.g., the Beck Depression Inventory).

How to create a scale
Begin by getting a group of respondents to free-list questions related to the concept until very few new questions come up.
Create a questionnaire using those items.
Give the questionnaire to a sample of respondents.
Analyze the results and remove questions whose answers are overwhelmingly neutral.
Test the scale again on a new sample of respondents.
High and low values should represent the spectrum of your concept.

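The item-screening step can be sketched as a filter on the share of neutral answers per item. The 1-5 response format, the neutral code 3, and the 60% cutoff are all assumptions for illustration; the slide does not give a threshold.

```python
NEUTRAL = 3              # midpoint of a hypothetical 1-5 agree/disagree item
MAX_NEUTRAL_SHARE = 0.6  # assumed cutoff; choose one that fits the study


def screen_items(responses):
    """Split items into (kept, dropped) by their share of neutral answers.

    `responses` maps item name -> list of 1-5 answers from the pilot sample.
    """
    kept, dropped = [], []
    for item, answers in responses.items():
        share = sum(a == NEUTRAL for a in answers) / len(answers)
        (dropped if share > MAX_NEUTRAL_SHARE else kept).append(item)
    return kept, dropped


# Hypothetical pilot answers for three candidate items.
pilot = {
    "q1": [1, 2, 4, 5, 3, 2, 4, 5],
    "q2": [3, 3, 3, 3, 3, 3, 2, 3],
    "q3": [2, 3, 4, 1, 5, 3, 4, 2],
}
print(screen_items(pilot))  # (['q1', 'q3'], ['q2'])
```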
Indices
An index, like a scale, is a measure derived from a set of questions.
The value of an index lies in comparing its values across time. For example, the consumer confidence index is compared to its value in the previous month and to its value at the same time a year before.
Even if some questions no longer make sense, it is often better to leave an index unchanged for the sake of comparability.

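The two comparisons described for the consumer confidence index amount to simple differences against earlier values in a monthly series. A sketch with hypothetical index values, oldest first:

```python
def index_changes(series, month):
    """Change in a monthly index vs. the previous month and vs. a year earlier.

    `series` holds monthly values, oldest first; `month` is a position in it
    with at least 12 earlier months available.
    """
    if month < 12:
        raise ValueError("need at least 12 earlier months for a year-ago comparison")
    current = series[month]
    return {
        "vs_previous_month": current - series[month - 1],
        "vs_year_ago": current - series[month - 12],
    }


# Thirteen months of hypothetical index values, oldest first.
values = [70, 72, 71, 73, 74, 75, 74, 76, 77, 78, 79, 80, 82]
print(index_changes(values, 12))  # {'vs_previous_month': 2, 'vs_year_ago': 12}
```

This comparison only means something if the questions behind the index stayed the same across all thirteen months.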
Four levels of measurement
Nominal (categorical, qualitative)
Ordinal (rank)
Interval
Ratio

Nominal Data - Defined
Data are represented by numbers or letters.
The data are placeholders for response items; the numbers have no numerical meaning.
Response items should be mutually exclusive and exhaustive.
Typically analyzed with frequencies, crosstabulations, and significance tests for crosstabulations such as chi-square.

Nominal - Example
What kind of place do you go to most often when you are sick or need advice about your health?
1 Clinic or health center
2 Doctor's office
3 Hospital emergency room
4 Hospital outpatient department
5 Some other place (Specify)
-7 Don't go to one place most often
-8 Don't know
-9 Refused

Ordinal Data - Defined
Includes the properties of nominal data, with the additional property that the numbers have a rank order.
Often analyzed like nominal data, using frequencies and crosstabulations.
There are crosstab significance tests for ranked data (Kendall's tau-b, Goodman and Kruskal's gamma), but I rarely see them used.
Very often ordinal data are treated as interval data, even though they do not have the attributes to be treated that way.
Some people feel that if ordinal variables work well as predictors, that justifies treating them as interval data.

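For two ordinal variables in a crosstab, Goodman and Kruskal's gamma compares pairs of respondents. A pair is concordant if it is ordered the same way on both variables and discordant if it is ordered oppositely. Tied pairs are ignored. A minimal sketch, assuming both rows and columns are listed in rank order. Tau-b, which adjusts for ties, is not shown.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a crosstab whose rows and columns are both in rank order.

    table[i][j] is the count in row category i, column category j.
    """
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            n = table[i][j]
            # Compare with every cell in a strictly higher row.
            for k in range(i + 1, rows):
                for m in range(cols):
                    if m > j:    # higher on both variables: concordant
                        concordant += n * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        discordant += n * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


print(goodman_kruskal_gamma([[10, 5], [5, 10]]))  # 0.6
```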
Ordinal Data - Example
In the last 6 months, not counting times you needed health care right away, how often did you get an appointment for health care as soon as you wanted?
1 Never
2 Sometimes
3 Usually
4 Always

Interval Data - Defined
Has all the properties of nominal and ordinal data (place-holding, mutually exclusive and exhaustive, rank order).
Has the additional quality that the distance between adjacent numbers is equal.
This allows the calculation of a mean and standard deviation.
Most of the field of statistics is oriented toward data of at least interval level (e.g., ANOVA, regression, t-tests, cluster analysis).
This makes it extremely tempting to treat ordinal data as interval.
There are not many examples of interval data in social science.

Interval Data - Example
What is the temperature outside in Fahrenheit? _______

Ratio Data - Defined
Has all the properties of nominal, ordinal, and interval data (place-holding, mutually exclusive and exhaustive, rank order, equal distance).
Has the additional quality of an absolute zero.
There are not many statistics that take advantage of ratio data.

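The practical difference between interval and ratio data is whether ratios mean anything. Fahrenheit and Celsius have arbitrary zero points, so the ratio of two temperatures changes with the scale. Kelvin starts at absolute zero, so it behaves like ratio data. The temperatures below are illustrative.

```python
def f_to_c(f):
    """Fahrenheit to Celsius (both interval scales)."""
    return (f - 32) * 5 / 9


def f_to_k(f):
    """Fahrenheit to Kelvin (a ratio scale with an absolute zero)."""
    return f_to_c(f) + 273.15


hot, mild = 80.0, 40.0
print(round(hot / mild, 2))                  # 2.0 in Fahrenheit
print(round(f_to_c(hot) / f_to_c(mild), 2))  # 6.0 in Celsius
print(round(f_to_k(hot) / f_to_k(mild), 2))  # 1.08 in Kelvin
```

Only the Kelvin ratio describes a real proportion of thermal energy. Age in years, the ratio example, works the same way: 40 years is genuinely twice 20 years.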
Ratio Data - Example
What is your age in years? _____

Interval versus ordinal
Interval data can inadvertently be made ordinal by using bad ranges.
In the last 6 months (not counting times you went to an emergency room), how many times did you go to a doctor's office or clinic to get care for yourself?
0 None
1 1
2 2
3 3
4 4
5 5 to 9
6 10 or more
You can use the midpoint of each range to make the data interval: 5 to 9 becomes 7, and 10 or more would typically become 10.

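The midpoint recoding can be written as a lookup from response code to an approximate count. The sketch below uses the codes from the question above and hypothetical answers. Setting "10 or more" to 10 understates anyone who made more than 10 visits.

```python
# Codes from the visit-count question mapped to approximate visit counts.
# "5 to 9" (code 5) uses its midpoint, 7; the open-ended top category
# "10 or more" (code 6) is set to its lower bound, 10.
MIDPOINTS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 7, 6: 10}


def mean_visits(codes):
    """Mean number of visits after recoding ranges to midpoints."""
    values = [MIDPOINTS[c] for c in codes]
    return sum(values) / len(values)


answers = [0, 1, 1, 2, 5, 6]  # hypothetical coded responses
print(mean_visits(answers))   # 3.5
```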
Open Ended Questions
Typically used when you are unsure what the response categories should be.
Sometimes used to provide text examples that illustrate points.
Other-Specify is often included as the last of a set of response items to cover unanticipated responses.

Open Ended Question Example 1
Does your child have any special health care needs?
1 Yes
2 No
-8 Don't know
-9 Refused
If Yes: What is the diagnosis? ____________________________

Open Ended Question Example 2
What kind of place do you go to most often when you are sick or need advice about your health?
1 Clinic or health center
2 Doctor's office
3 Hospital emergency room
4 Hospital outpatient department
5 Some other place (Specify)
-7 Don't go to one place most often
-8 Don't know
-9 Refused

Analysis of Open-Ended Questions
Typically the researcher reads through all open-ended responses, decides whether new response categories emerge, and then recodes the open-ended responses into the new categories.
Some researchers use text analysis software (e.g., Atlas.ti, MAXQDA, NVivo).

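Keyword rules can automate a first pass at recoding "Some other place (Specify)" answers. Reading every response is still needed to find the categories and to check the matches. In this sketch the new codes 6 and 7, their labels, and their keywords are hypothetical, and the first matching rule wins.

```python
# Hypothetical new categories found while reading "Other (Specify)" answers,
# each with keywords that signal it. Codes 6 and 7 are assumptions.
NEW_CODES = {
    6: ("Pharmacy", ("pharmacy", "drugstore", "pharmacist")),
    7: ("Urgent care", ("urgent care", "walk-in")),
}
OTHER = 5  # "Some other place (Specify)"


def recode(text):
    """Return a new code if a keyword matches; otherwise keep the Other code."""
    lowered = text.lower()
    for code, (_label, keywords) in NEW_CODES.items():
        if any(k in lowered for k in keywords):
            return code
    return OTHER


for answer in ["The CVS pharmacy", "urgent care center", "my chiropractor"]:
    print(answer, "->", recode(answer))  # 6, 7, and 5 respectively
```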
Figure: Wordle of open-ended responses to alternative race on ten years of CCI (Brener et al)

Question placement of breakoffs
Analysis underway

Question Banks
Pew Research Center http://people-press.org/question-search/
Roper Center http://webapps.ropercenter.uconn.edu/CFIDE/cf/action/catalog/
Inter-University Consortium for Political and Social Research (ICPSR) http://www.icpsr.umich.edu/icpsrweb/ICPSR/
Odum Institute http://arc.irss.unc.edu/dvn/

Frequency table of nominal variable
Respondent's sex
SEX        Frequency   Percent   Cumulative Frequency   Cumulative Percent
1,MALE          1106     42.64                   1106                42.64
2,FEMALE        1488     57.36                   2594               100.00

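The percent and cumulative columns of a frequency table follow directly from the counts. A sketch that reproduces the sex table above from its two frequencies:

```python
def frequency_table(counts):
    """Rows of (value, frequency, percent, cumulative frequency, cumulative percent).

    `counts` maps each response value to its frequency, in display order.
    """
    total = sum(counts.values())
    rows, running = [], 0
    for value, freq in counts.items():
        running += freq
        rows.append((value, freq, round(100 * freq / total, 2),
                     running, round(100 * running / total, 2)))
    return rows


for row in frequency_table({"1,MALE": 1106, "2,FEMALE": 1488}):
    print(*row)
```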
Frequency table of ordinal variable
Current financial condition
CURFIN          Frequency   Percent   Cumulative Frequency   Cumulative Percent
-9,NA                   9      0.35                      9                 0.35
-8,DK                  12      0.46                     21                 0.81
1,BETTER NOW         1053     40.59                   1074                41.40
2,SAME                819     31.57                   1893                72.98
3,WORSE NOW           701     27.02                   2594               100.00

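The missing codes (-9 NA, -8 DK) sit in the same table as the substantive answers, so percentages shift once they are excluded. The sketch below computes valid percentages for CURFIN from the frequencies above. It treats any negative code as missing, following the coding convention used on these slides. With 21 missing cases, the denominator becomes 2,573.

```python
def valid_percents(counts):
    """Percent of each substantive answer, excluding negative (missing) codes.

    Keys are (code, label) pairs; codes below zero (-9 NA, -8 DK, etc.)
    are dropped before the percentages are computed.
    """
    valid = {key: n for key, n in counts.items() if key[0] >= 0}
    total = sum(valid.values())
    return {label: round(100 * n / total, 2) for (_code, label), n in valid.items()}


curfin = {(-9, "NA"): 9, (-8, "DK"): 12, (1, "BETTER NOW"): 1053,
          (2, "SAME"): 819, (3, "WORSE NOW"): 701}
print(valid_percents(curfin))
```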
Crosstabulation
EMPLOY (Are you employed now) by SEX (Respondent's sex)
Each cell: Frequency / Percent / Row Pct / Col Pct
EMPLOY     1,MALE                        2,FEMALE                      Total
-9,NA      5 / 0.19 / 50.00 / 0.45       5 / 0.19 / 50.00 / 0.34       10 / 0.39
-8,DK      6 / 0.23 / 75.00 / 0.54       2 / 0.08 / 25.00 / 0.13       8 / 0.31
1,YES      640 / 24.67 / 47.34 / 57.87   712 / 27.45 / 52.66 / 47.85   1352 / 52.12
2,NO       455 / 17.54 / 37.17 / 41.14   769 / 29.65 / 62.83 / 51.68   1224 / 47.19
Total      1106 / 42.64                  1488 / 57.36                  2594 / 100.00

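Each crosstab cell reports its count as a share of the whole table, of its row, and of its column. A sketch that recomputes those three percentages from the EMPLOY by SEX counts above:

```python
def crosstab_percents(table):
    """(count, percent, row pct, col pct) for every cell of a table of counts.

    table[r][c] is the count for row category r and column category c.
    """
    total = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [
        [(n,
          round(100 * n / total, 2),       # share of the whole table
          round(100 * n / row_tot[i], 2),  # share of its row
          round(100 * n / col_tot[j], 2))  # share of its column
         for j, n in enumerate(row)]
        for i, row in enumerate(table)
    ]


# EMPLOY rows (-9 NA, -8 DK, 1 YES, 2 NO) by SEX columns (MALE, FEMALE).
employ_by_sex = [[5, 5], [6, 2], [640, 712], [455, 769]]
for row in crosstab_percents(employ_by_sex):
    print(row)
```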
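Significance test for a table
A significance test tells you how likely it is that a relationship at least as strong as the one in the table would appear by chance if there were no relationship in the population.
A significance test does NOT tell you whether the relationship is meaningful.
Chi-square is a commonly used significance test for a table.
It is very sensitive to the number of cells: rows with few cases, such as NA and DK, produce small expected counts that can make the test unreliable.
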
Modified crosstabulation
EMPLOY (Are you employed now) by SEX (Respondent's sex), NA and DK excluded
Each cell: Frequency / Percent / Row Pct / Col Pct
EMPLOY     1,MALE                        2,FEMALE                      Total
1,YES      640 / 24.84 / 47.34 / 58.45   712 / 27.64 / 52.66 / 48.08   1352 / 52.48
2,NO       455 / 17.66 / 37.17 / 41.55   769 / 29.85 / 62.83 / 51.92   1224 / 47.52
Total      1095 / 42.51                  1481 / 57.49                  2576 / 100.00
Frequency Missing = 18

Statistic                     DF   Value     Prob
Chi-Square                    1    27.1563   <.0001
Likelihood Ratio Chi-Square   1    27.2376   <.0001
Continuity Adj. Chi-Square    1    26.7420   <.0001
Mantel-Haenszel Chi-Square    1    27.1458   <.0001
Phi Coefficient                    0.1027
Contingency Coefficient            0.1021
Cramer's V                         0.1027

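The Pearson chi-square, continuity-adjusted chi-square, and phi coefficient for a 2x2 table can be reproduced from the four cell counts. The sketch uses only the standard library. Its p-value relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal. The SAS output also includes the likelihood ratio and Mantel-Haenszel statistics, which this sketch does not compute.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistics for the 2x2 table [[a, b], [c, d]].

    Returns (Pearson chi-square, continuity-adjusted chi-square,
    p-value for the Pearson statistic with 1 df, phi coefficient).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    pearson = adjusted = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            gap = abs(observed[i][j] - expected)
            pearson += gap ** 2 / expected
            # Yates continuity correction: shrink each gap by 0.5 (not below 0).
            adjusted += max(gap - 0.5, 0.0) ** 2 / expected
    # With 1 df, chi-square is a squared standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(pearson / 2))
    phi = (a * d - b * c) / math.sqrt(
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1])
    return pearson, adjusted, p_value, phi


# Employed (YES/NO rows) by sex (MALE/FEMALE columns), NA and DK dropped.
pearson, adjusted, p_value, phi = chi_square_2x2(640, 712, 455, 769)
print(round(pearson, 4), round(adjusted, 4), p_value < 0.0001, round(phi, 4))
```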
Measuring differences between two groups: T-test with insignificant difference
Variable  BLDRO       N     Lower CL Mean  Mean     Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err
PCOUNT    1,OWN       1996  2.4964         2.5556   2.6148         1.3088            1.3494   1.3926            0.0302
PCOUNT    2,RENT      432   2.4348         2.588    2.7411         1.5184            1.6197   1.7355            0.0779
PCOUNT    Diff (1-2)        -0.178         -0.032   0.1135         1.3629            1.4013   1.4418            0.0744
T-Tests
Variable  Method         Variances  DF    t Value  Pr > |t|
PCOUNT    Pooled         Equal      2426  -0.44    0.6635
PCOUNT    Satterthwaite  Unequal    567   -0.39    0.6988
Equality of Variances
Variable  Method    Num DF  Den DF  F Value  Pr > F
PCOUNT    Folded F  431     1995    1.44     <.0001

T-test with significant difference
Variable  SEX         N     Lower CL Mean  Mean     Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err
indexus   1,MALE      1106  92.903         95.242   97.582         38.062            39.648   41.373            1.1922
indexus   2,FEMALE    1488  82.522         84.396   86.27          35.575            36.853   38.227            0.9554
indexus   Diff (1-2)        7.8824         10.846   13.81          37.061            38.07    39.135            1.5114
T-Tests
Variable  Method         Variances  DF    t Value  Pr > |t|
indexus   Pooled         Equal      2592  7.18     <.0001
indexus   Satterthwaite  Unequal    2281  7.10     <.0001
Equality of Variances
Variable  Method    Num DF  Den DF  F Value  Pr > F
indexus   Folded F  1105    1487    1.16     0.0090

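The t statistics in these outputs can be recomputed from each group's N, mean, and standard deviation. The pooled test assumes equal variances and the Satterthwaite (Welch) test does not. The folded F divides the larger variance by the smaller. The sketch below uses the indexus-by-sex figures above and stops at the test statistics, because p-values need the t and F distributions.

```python
import math


def t_tests(n1, m1, s1, n2, m2, s2):
    """Pooled t, Satterthwaite (Welch) t, Welch df, and folded F statistic."""
    v1, v2 = s1 ** 2, s2 ** 2
    diff = m1 - m2
    # Pooled test: one common variance estimate for both groups.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Welch test: separate squared standard errors and approximate df.
    q1, q2 = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    folded_f = max(v1, v2) / min(v1, v2)
    return t_pooled, t_welch, df_welch, folded_f


# indexus by sex: N, mean, and std dev for MALE, then FEMALE.
result = t_tests(1106, 95.242, 39.648, 1488, 84.396, 36.853)
print([round(x, 2) for x in result])
```

Passing the PCOUNT figures from the own/rent comparison (1996, 2.5556, 1.3494, 432, 2.588, 1.6197) should give pooled and Welch t values near -0.44 and -0.39. If SciPy is available, `scipy.stats.ttest_ind_from_stats` performs the same calculation and also returns p-values.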
T-test with significant difference
Variable  BLDRO       N     Lower CL Mean  Mean     Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err
indexus   1,OWN       2007  88.335         90.038   91.741         37.734            38.902   40.144            0.8684
indexus   2,RENT      439   81.377         84.912   88.447         35.348            37.687   40.359            1.7987
indexus   Diff (1-2)        1.1291         5.1262   9.1233         37.632            38.687   39.803            2.0384
T-Tests
Variable  Method         Variances  DF    t Value  Pr > |t|
indexus   Pooled         Equal      2444  2.51     0.0120
indexus   Satterthwaite  Unequal    658   2.57     0.0105
Equality of Variances
Variable  Method    Num DF  Den DF  F Value  Pr > F
indexus   Folded F  2006    438     1.07     0.4071

Means of Persons per household by age group
Analysis Variable : PCOUNT Person Count, FL usual residence
Broader age group of respondent   N Obs   N     Mean        Std Dev     Minimum     Maximum
18-24                             161     159   3.2955975   1.5733278   1.0000000   12.0000000
25-34                             276     272   3.1985294   1.5620965   1.0000000   16.0000000
35-44                             392     388   3.3479381   1.4924689   1.0000000   12.0000000
45-54                             511     507   2.7159763   1.2877506   1.0000000   9.0000000
55-64                             479     472   2.1440678   1.0033949   1.0000000   7.0000000
>65                               722     715   1.8293706   1.2040915   1.0000000   20.0000000
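
The sketch below produces a table of means by group, including the gap between N Obs and N that comes from missing answers. The records are hypothetical. `statistics.stdev` gives the sample (n - 1) standard deviation and needs at least two non-missing values per group.

```python
import statistics
from collections import defaultdict


def summarize_by_group(records):
    """N Obs, N, mean, SD, min, and max of a value for each group.

    `records` are (group, value) pairs; a value of None marks a missing
    answer, which is why N can be smaller than N Obs.
    """
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    summary = {}
    for group, values in groups.items():
        present = [v for v in values if v is not None]
        summary[group] = {
            "n_obs": len(values),
            "n": len(present),
            "mean": statistics.mean(present),
            "sd": statistics.stdev(present),  # sample SD; needs 2+ values
            "min": min(present),
            "max": max(present),
        }
    return summary


# Hypothetical (age group, persons in household) records.
sample = [("18-24", 3), ("18-24", 5), ("18-24", None), ("18-24", 1),
          ("55-64", 2), ("55-64", 2), ("55-64", 1)]
for group, stats in summarize_by_group(sample).items():
    print(group, stats)
```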