Predicting Final Outcome and Call Sequence Length in Longitudinal Surveys

using paradata and previous wave information

Explore how paradata and previous wave information can help predict final call sequence length and outcome in longitudinal surveys, aiming to improve efficiency in interviewer scheduling and data collection process.

  • Survey
  • Longitudinal
  • Paradata
  • Prediction
  • Call Sequence

Presentation Transcript


  1. Using Paradata and Previous Wave Information to Predict Final Outcome and Length of Call Sequence in a Longitudinal Survey. Olga Maslovskaya, Gabriele Durrant and Peter W.F. Smith, University of Southampton

  2. Introduction
       • In face-to-face surveys, interviewers make several visits (calls) to households to obtain a response; this produces a call sequence.
       • Setting here: a longitudinal survey.
       • Aim: to identify (and ideally avoid) unsuccessful and long call sequences in the current wave of a longitudinal study.

  3. Introduction
       • A wide range of information is available from current and previous waves for each sample unit, including:
           o call record data from all previous waves and the current wave
           o interviewer observation variables from current and previous waves
           o survey data from previous waves
       • Extension of: Durrant, Maslovskaya and Smith (2015), "Modelling final outcome and length of call sequence to improve efficiency in interviewer call scheduling", Journal of Survey Statistics and Methodology 3(3): 397-424, which used only wave 1 data.

  4. Main Research Questions
       • Can we predict final call sequence length and final outcome early in the data collection process of the current wave, taking into account information from previous waves?
       • In other words: can we predict after, for example, the third call whether a household is going to respond? How many calls will it take to reach the final outcome?

  5. Further Research Questions
       • The ability of classical nonresponse models without call data to predict nonresponse is often limited (R² values well below 10%).
       • How predictive are models that include call record data?
       • Does their predictive ability improve once more call record data are available (e.g. including earlier call information, or information from previous waves in a longitudinal study)?
       • How can these models best be used in adaptive and responsive survey designs?

  6. Data
       • UK Understanding Society survey (Waves 1 and 2), a large-scale longitudinal study.
       • Wave 1: call record data, interviewer observation data, survey data.
       • Wave 2: call record data, interviewer observation data.
       • Analysis sample: 10,630 households (households with 4 or more calls).
       • Maximum number of calls: 30.
       • Methodological difficulty: household IDs differ between waves (there is no such thing as a longitudinal household), so it is difficult to follow households, but it is possible to follow individuals within households.
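The slides do not say how households were matched across waves. As a minimal sketch of one possible approach, the following links each wave 2 household to the wave 1 household that most of its members belonged to. The person and household identifiers, the function name `link_households` and the majority rule itself are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter, defaultdict


def link_households(wave1_members, wave2_members):
    """Link wave 2 households to wave 1 households via shared individuals.

    Both arguments map person_id -> household_id for one wave (hypothetical
    layout). Each wave 2 household is linked to the wave 1 household that
    contributed most of its members, or to None if no member was in wave 1.
    """
    votes = defaultdict(Counter)
    for person_id, hh2 in wave2_members.items():
        votes[hh2]  # register the household even if no member can be linked
        hh1 = wave1_members.get(person_id)
        if hh1 is not None:
            votes[hh2][hh1] += 1

    links = {}
    for hh2, counter in votes.items():
        links[hh2] = counter.most_common(1)[0][0] if counter else None
    return links


# Made-up example: person p4 is new in wave 2, so household Z has no link.
wave1 = {"p1": "A", "p2": "A", "p3": "B"}
wave2 = {"p1": "X", "p2": "X", "p3": "Y", "p4": "Z"}
links = link_households(wave1, wave2)
```

A majority rule is only one option; households that split or merge between waves would need an explicit tie-breaking decision.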

  7. Dependent Variables and Models

  8. Modelling Strategy and Explanatory Variables

  9. Assessment of Models
       • Focus on the ability of the models to predict length and outcome.
       • To compare different models and to assess the quality of model prediction and model fit:
           o pseudo-R² statistic (proportion of variation in the dependent variable that is explained by the model)
           o concepts from epidemiology to assess the accuracy of models (Plewis et al. 2012): discrimination (sensitivity and specificity) and prediction (positive and negative predicted value)
           o AUC (area under the receiver operating characteristic (ROC) curve)
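The discrimination and prediction measures named above can be illustrated for a binary outcome (for example, response versus nonresponse). This is a minimal sketch using the standard definitions; the function name and the example data are made up for illustration and are not taken from the study.

```python
def accuracy_measures(actual, predicted):
    """Return sensitivity, specificity and positive/negative predicted values.

    `actual` and `predicted` are equal-length sequences of 0/1 values,
    where 1 denotes the outcome of interest.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Undefined measures (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # P(predicted 1 | actual 1)
        "specificity": ratio(tn, tn + fp),  # P(predicted 0 | actual 0)
        "ppv": ratio(tp, tp + fp),          # P(actual 1 | predicted 1)
        "npv": ratio(tn, tn + fn),          # P(actual 0 | predicted 0)
    }


# Made-up example data.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
measures = accuracy_measures(actual, predicted)
```

Sensitivity and specificity condition on the true outcome, whereas the predicted values condition on the prediction, which is why the two pairs can move in different directions when the outcome is rare.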

  10. Assessment of Models

  11. Area under the Curve (AUC)
       • A receiver operating characteristic (ROC) curve summarizes predictive power for all possible values of the cut-off c applied to the predicted probability, by plotting sensitivity as a function of (1 - specificity).
       • The higher the sensitivity, the higher the predictive power, given a particular specificity.
       • The greater the area under the curve (AUC), the greater the predictive power.
       • AUC values range from 0.5 (no discrimination) to 1 (perfect discrimination).
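To make the construction concrete, the sketch below builds the ROC points by sweeping the cut-off c over the observed predicted probabilities and then computes the AUC with the trapezoidal rule. The function names and the example scores are assumptions for illustration; in practice a statistics package would normally be used.

```python
def roc_points(labels, scores):
    """Return ROC points (1 - specificity, sensitivity), one per cut-off.

    `labels` are 0/1 outcomes and `scores` are predicted probabilities.
    A case is classified as positive when its score is >= the cut-off c.
    Both outcome classes must be present in `labels`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every score: nothing predicted positive
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= c and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= c and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 8 of the 9 positive/negative pairs are ranked correctly.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(labels, scores))
```

The same value can be read as the proportion of positive/negative pairs in which the positive case receives the higher predicted probability, which is why 0.5 corresponds to no discrimination.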

  12. Results

  13. Results

  14. Results

  15. Results

  16. Results

  17. Results

  18. Results

  19. Results

  20. Results: Sensitivity

  21. Results: Positive Predicted Values
       P(actual category = k | predicted category = k), for k = 1, 2, 3, 4

       Model   Short Successful   Short Unsuccessful   Long Successful   Long Unsuccessful
               (n=5304)           (n=1400)             (n=2216)          (n=1710)
       1       49.9%              0.0%                 0.0%              0.0%
       2       50.5%              66.7%                37.5%             34.6%
       3       50.9%              46.7%                37.2%             37.2%
       4       51.8%              39.3%                32.6%             36.6%
       5       53.3%              43.0%                38.3%             37.2%
       6       53.7%              37.9%                35.0%             39.1%
       7       54.8%              48.3%                36.8%             40.4%
       8       62.1%              48.6%                42.2%             41.1%

       Of the cases predicted to be long unsuccessful, 41.1% are predicted correctly (Model 8).
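For a four-category outcome, the positive predicted value of category k is the share of cases predicted to be in k that actually are in k. The sketch below computes it per category. The numeric coding of the categories and the example data are assumptions, since the slides do not state which k corresponds to which category.

```python
from collections import Counter

# Assumed coding of the four outcome categories (not given in the slides).
CATEGORIES = {
    1: "short successful",
    2: "short unsuccessful",
    3: "long successful",
    4: "long unsuccessful",
}


def positive_predicted_values(actual, predicted):
    """Return P(actual = k | predicted = k) for each category k.

    A category that is never predicted has no defined value and is
    reported as None (one possible convention).
    """
    predicted_counts = Counter(predicted)
    correct_counts = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {
        k: (correct_counts[k] / predicted_counts[k] if predicted_counts[k] else None)
        for k in CATEGORIES
    }


# Made-up example data.
actual = [1, 1, 1, 2, 3, 4, 4, 3]
predicted = [1, 1, 2, 1, 3, 4, 3, 3]
ppv = positive_predicted_values(actual, predicted)
```

Read this way, the final entry of the table says that among households a model flags as long unsuccessful, roughly two in five turn out to be so.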

  22. Summary
       • Basic geographic information is not very predictive.
       • Controlling for survey data from W1 increases R² but not the other assessment indicators (the classical nonresponse model is not very predictive!).
       • Call record data from W1 are highly significant and improve prediction (but not by very much).
       • Time of calls and time between calls are all significant variables, but their impact on prediction is limited.
       • Interviewer observation variables: some are significant in the models and slightly improve the predictive power.

  23. Summary (2)
       • Indicators of change (between W1 and W2): indicators of changes in interviewer observations are significant and improve prediction (but not by very much).
       • The variable "household has changed household composition" is highly significant (but not very predictive).
       • Big increase in prediction once the outcomes of the most recent calls in W2 are included (the more calls, the more useful, as expected).
       • Modelling length and final outcome jointly improves prediction (the variables in the length model and in the outcome model can be different).

  24. Conclusions
       • What is novel: modelling sequence length, modelling length and outcome jointly, and conditioning on all current and previous information on a case (including survey data, call data and interviewer observations).
       • The most recent call outcome is the most predictive: this could indicate that the response process depends more on current circumstances than on previous history and characteristics.
       • The approach can be implemented in survey practice quite easily, using standard methodology.
       • Survey managers may wish to weigh the probability of a successful outcome against sequence length.
