Widening Participation and Impact Evaluation in the Context of COVID-19

Explore the challenges and opportunities of impact evaluation across sectors, with a focus on rethinking randomised controlled trials, their limitations, common concerns and cost considerations. Join the discussion on developing effective evaluation programmes amid the pandemic.

  • Widening Participation
  • Impact Evaluation
  • COVID-19
  • RCTs
  • Challenges


Presentation Transcript


  1. Why Evaluate? Lunchtime Conference Programme: Widening Participation, Evaluation and COVID-19. "Broadening approaches to impact evaluation", Chris Fox. Welcome to today's session. Please keep your microphones muted. There will be a 20-30 minute presentation or panel discussion, followed by 25 minutes for questions and discussion. We may record the presentation/panel discussion, but we will turn recording off for the open discussion. Please contribute to the discussion or ask your questions via Zoom's chat function (at the bottom of your screen); address your chat to everyone and the meeting host will read out your question/comment. #whyevaluate @VilliersPark

  2. Speaker: Professor Chris Fox, Professor of Evaluation and Policy Analysis and Director of the Policy Evaluation and Research Unit (PERU), a multi-disciplinary team of evaluators, criminologists and economists working across criminal justice, youth wellbeing, education and welfare reform. www.mmuperu.co.uk @MMUPolicyEval

  3. Overview: re-thinking Randomised Controlled Trials (mediators and moderators; implementation studies and multi-method RCTs; realist RCTs); developing impact evaluation programmes that address questions about scaling and roll-out; small n impact evaluation for interventions unsuitable for traditional, large n counterfactual impact evaluations.

  4. Randomised Controlled Trials "RCTs work by dividing a population into two or more groups by random lot, giving one intervention to one group, the other to another, and measuring the pre-specified outcome for each group." (Haynes et al. 2012: 9) Units in the study's target population can be individuals, institutions or areas. Diagram from Government Social Research Unit 2007: Figure 7.2.
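
As a minimal illustration of this logic (not part of the original slides), the sketch below simulates a two-arm trial on made-up data: units are allocated by random lot and the pre-specified outcome is compared between groups. The sample size and effect size are hypothetical.

```python
# Illustrative sketch: the basic RCT logic described by Haynes et al. (2012),
# simulated with invented data. Units are randomised to treatment or control
# and the pre-specified outcome is compared across the two groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n = 500                                                  # hypothetical units (individuals, institutions or areas)
treated = rng.permutation(np.repeat([0, 1], n // 2))     # random allocation by "lot"

true_effect = 0.3                                        # assumed effect, in standard-deviation units
outcome = rng.normal(size=n) + true_effect * treated     # pre-specified outcome measure

est_effect = outcome[treated == 1].mean() - outcome[treated == 0].mean()
t, p = stats.ttest_ind(outcome[treated == 1], outcome[treated == 0])
print(f"estimated effect = {est_effect:.2f} (p = {p:.3f})")
```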

  5. Concerns with RCTs Policy utility: failure to open the 'black box' and explain why an intervention worked; difficult to generalise because internal validity often comes at the expense of external validity. Methodological: RCTs often do not collect, or do not look for, unintended consequences of a policy or programme; RCTs generally provide average impact estimates, and these may hide significant variation. Ethical: individuals allocated to the control group are discriminated against because they are not receiving an intervention given to the treatment group; the counter-argument is that until the RCT is completed we do not know that the intervention is beneficial. Cost: often argued to be relatively expensive and time-consuming, although there are many examples of inexpensive RCTs (see for example the BIT evaluation of fine repayments in Haynes et al. 2012).

  6. Concerns with RCTs (as above), set against three evaluation paradigms: post-positivism, scientific realism and constructivism.

  7. RCTs: causal description and causal explanation "The unique strength of experimentation is in describing the consequences attributable to deliberately varying treatments. We call this causal description. In contrast, experiments do less well in clarifying mechanisms through which, and the conditions under which, that causal relationship holds - what we call causal explanation." (Shadish, Cook, & Campbell, 2002: 9)

  8. Complex social interventions Causal explanation is important: different forms of evidence are necessary to understand complex interventions. In addition to RCT findings, Medical Research Council guidance points to three areas of focus for evaluations (Craig et al., 2008): implementation and process evaluation (fidelity); causal processes (mediators or mechanisms); and contextual factors (moderators). These are important for generalising (external validity) and for the portability of results.

  9. Understanding moderators and mediators One option is to incorporate mediator and moderator analysis into trials, but sample sizes in many RCTs render moderator (context) analyses underpowered, and the assumptions required for the valid analysis of mediators (mechanisms/processes) are often highly restrictive. An alternative strategy is to integrate mixed methods and qualitative research into RCTs to explain causal processes or mechanisms. In health services research, process evaluations have been widely integrated into randomised studies (e.g. Medical Research Council guidance: Moore et al., 2015); in education, formal methods of implementation and process evaluation (IPE) have been developed by the EEF (Humphrey et al., 2016).
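
To make the first option concrete, the sketch below (not from the slides) runs a simple moderator analysis via an interaction term and a simple mediation comparison of total and direct effects on simulated data. The variables 'engagement' (mediator) and 'prior_attainment' (moderator) are hypothetical, and real mediation analysis relies on the strong assumptions the slide mentions (e.g. no unmeasured confounding of the mediator-outcome path).

```python
# Illustrative sketch on simulated data: a moderator analysis (interaction term)
# and a crude mediation comparison (total vs direct effect of treatment).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "prior_attainment": rng.normal(size=n),
})
df["engagement"] = 0.5 * df["treated"] + rng.normal(size=n)           # mediator
df["outcome"] = (0.2 * df["treated"] + 0.4 * df["engagement"]
                 + 0.3 * df["treated"] * df["prior_attainment"]       # moderated effect
                 + rng.normal(size=n))

# Moderator analysis: does the treatment effect vary with prior attainment?
print(smf.ols("outcome ~ treated * prior_attainment", df).fit().params)

# Simple mediation: total effect vs direct effect once the mediator is controlled for
print(smf.ols("outcome ~ treated", df).fit().params["treated"])               # total
print(smf.ols("outcome ~ treated + engagement", df).fit().params["treated"])  # direct
```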

  10. Realist RCTs 'Realists' argue that RCTs misunderstand the scientific method, offer only a 'successionist' approach to causation which brackets out the complexity of social causation, and fail to ask which interventions work, for whom and under what circumstances (Pawson and Tilley 1994, 1997). Bonell et al. (2012) argue that RCTs are useful in evaluating social interventions because randomised control groups actually take proper account of, rather than bracket out, the complexity of social causation. Realist RCTs should be oriented towards building and validating 'mid-level' programme theories which set out how interventions interact with context to produce outcomes. They could include: analysing how pathway variables mediate intervention effects; using multiple trials across different contexts to test how intervention effects vary; and drawing on complementary qualitative and quantitative data. Thus, realist RCTs determine the validity of programme theory and 'what works'. Bonell et al. (2012) Realist randomised controlled trials: a new approach to evaluating complex public health interventions.

  11. A realist RCT Jamal et al. (2015) The three stages of building and testing mid-level theories in a realist RCT: a theoretical and methodological case-example.

  12. EMMIE: Effect; Mechanisms/mediators; Moderators/contexts; Implementation; Economic costs (Johnson et al. 2015).

  13. EEF trial stages Pilot studies are conducted in a small number of sites (e.g. 3 or more) where a programme is at an early or exploratory stage of development; they are evaluated through qualitative research to develop and refine the approach and test its feasibility in relevant settings, and initial, indicative data are collected to assess the programme's potential to raise attainment. Efficacy trials test whether an intervention can work under developer-led conditions in a number of sites, usually 50+, with a quantitative impact evaluation to assess the impact of the intervention on outcomes, an implementation and process evaluation to identify the challenges for delivery, and an indicative cost of the intervention. Effectiveness trials test a scalable model of an intervention under everyday conditions (where the developer has limited input) in a large number of sites, usually 100+ across at least three different geographical regions, with a quantitative impact evaluation and an implementation and process evaluation as per the efficacy trial, and the cost of the intervention at this scale calculated. Scale-up is when a programme which has been shown to work when rigorously trialled, and which has the capacity to deliver at scale, is expanded to deliver across a bigger area to a large number of sites; impact evaluation continues, but as a lighter-touch process.
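
As a rough, hypothetical illustration of the sample-size reasoning behind these trial stages (not from the slides), the sketch below approximates the pupils needed per arm to detect a given effect size, and inflates that figure with a simple design effect when whole schools rather than pupils are randomised. The effect size, cluster size and ICC values are illustrative only; real EEF trial design uses more detailed, cluster-aware power calculations.

```python
# Illustrative sketch: approximate two-arm sample size for a standardised effect d,
# with an optional design-effect adjustment for cluster randomisation.
from scipy.stats import norm

def n_per_arm(d, alpha=0.05, power=0.8, cluster_size=None, icc=0.0):
    """Approximate units needed per arm to detect effect size d."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2        # individually randomised case
    if cluster_size:                                 # inflate for cluster randomisation
        n *= 1 + (cluster_size - 1) * icc            # simple design effect
    return int(n) + 1

# e.g. detecting d = 0.2, then the same with 25 pupils per school and ICC = 0.15
print(n_per_arm(0.2))                                # individually randomised
print(n_per_arm(0.2, cluster_size=25, icc=0.15))     # cluster randomised
```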

  14. Developing evaluation programmes: NESTA evidence standards.

  15. Blueprints for healthy youth development Promising programmes: 1 RCT or 2 QED evaluations using valid and reliable measures, with analysis based on intention to treat, baseline equivalence of intervention and control groups established, etc.; evidence of consistent and statistically significant positive impact in a preponderance of studies that meet the evaluation quality criteria. Model programmes: Promising criteria + 2 well-conducted RCTs, or 1 RCT and 1 QED evaluation; + one long-term follow-up (12 months+). Model Plus programmes: Model criteria + at least 1 high-quality study demonstrating desired outcomes, with authorship, data collection and analysis conducted by a researcher who is neither a current nor a past member of the developer's research team and who has no financial interest in the programme. https://www.blueprintsprograms.org
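
A hypothetical encoding of these published tiers is sketched below to make the thresholds explicit. The 'EvidenceSummary' structure and its field names are invented for illustration; the real Blueprints certification process involves detailed quality review of each study, not just counting them.

```python
# Hypothetical sketch: the Blueprints evidence tiers above expressed as simple rules.
from dataclasses import dataclass

@dataclass
class EvidenceSummary:
    quality_rcts: int             # RCTs meeting the evaluation quality criteria
    quality_qeds: int             # quasi-experimental evaluations meeting the criteria
    consistent_positive_impact: bool
    long_term_follow_up: bool     # sustained effect 12+ months post-intervention
    independent_evaluation: bool  # study by a researcher independent of the developer

def blueprints_tier(e: EvidenceSummary) -> str:
    promising = (e.quality_rcts >= 1 or e.quality_qeds >= 2) and e.consistent_positive_impact
    model = (promising
             and (e.quality_rcts >= 2 or (e.quality_rcts >= 1 and e.quality_qeds >= 1))
             and e.long_term_follow_up)
    if model and e.independent_evaluation:
        return "Model Plus"
    if model:
        return "Model"
    if promising:
        return "Promising"
    return "Not certified"
```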

  16. What is small n? "[W]hen data are available for only one or a few units of assignment, with the result that experiments or quasi-experiments in which tests of statistical differences in outcomes between treatment and comparison groups are not possible." (White and Phillips 2012: 6) This goes beyond traditional understandings of case studies (Stern et al. 2012): cases may be policy interventions, institutions, individuals, events or even countries during a particular historical period.

  17. Some key assumptions Causation: a switch from discussing 'attribution' in traditional impact evaluation to what is termed 'contribution', recognising the importance of supporting factors in understanding impact in more complex settings (Stern et al., 2012; Fox and Morris 2019). Quants and quals: breaking down the distinction between quantitative and qualitative methods. Mixed methods: "Different designs may share similar methods and techniques: both experiments and case studies may use interview data, draw on focus groups and analyse statistical data. What holds them together is the fundamental logic of a design not their methods." (Stern et al. 2012: 15) Note: this opens up the possibility that many of the difficulties with mixed methods lie at the level of mixed designs, i.e. underlying assumptions, rather than at a more technical level (ibid).

  18. Different approaches Case-based methods can be broadly typologised as either between-case comparisons (e.g. Qualitative Comparative Analysis) or within-case analysis (e.g. Process Tracing, General Elimination Methodology (Scriven), Contribution Analysis (Mayne)).

  19. Qualitative Comparative Analysis A method that converts qualitative data into Boolean logic, bridging qualitative and quantitative analysis and providing a powerful tool for the analysis of causal complexity (Ragin 1987, 2008). It addresses complexity and the influence of context by looking for patterns across multiple cases. It rests on two assumptions: that change is often the result of different combinations of factors, rather than of any one individual factor; and that different combinations of factors can produce similar changes (Ragin 1987). It is typically used with an intermediate number of cases (e.g. 10-50) where conventional statistical analysis is not possible: some where the outcome of interest is present, some where it isn't. There is no counterfactual in the traditional sense, and it requires in-depth knowledge of the cases. It normally starts with a theory of change, from which conditions (factors) whose presence or absence may contribute to outcomes are defined. Qualitative data for each factor are converted into a score and analysed; in crisp-set QCA this is always 0 or 1.
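
A minimal sketch of the crisp-set logic is shown below (not from the slides): each case is scored 0/1 on each condition and on the outcome, and the truth table groups cases by configuration and reports how consistently each configuration is associated with the outcome. The conditions and cases are hypothetical; full QCA would go on to apply Boolean minimisation (e.g. Quine-McCluskey) to simplify the solution.

```python
# Illustrative crisp-set QCA sketch with invented conditions and cases.
from collections import defaultdict

conditions = ["mentoring", "parental_engagement", "summer_school"]

cases = {  # hypothetical widening-participation projects
    "A": {"mentoring": 1, "parental_engagement": 1, "summer_school": 0, "outcome": 1},
    "B": {"mentoring": 1, "parental_engagement": 0, "summer_school": 1, "outcome": 1},
    "C": {"mentoring": 0, "parental_engagement": 1, "summer_school": 0, "outcome": 0},
    "D": {"mentoring": 1, "parental_engagement": 1, "summer_school": 1, "outcome": 1},
    "E": {"mentoring": 0, "parental_engagement": 0, "summer_school": 1, "outcome": 0},
}

truth_table = defaultdict(list)
for name, case in cases.items():
    config = tuple(case[c] for c in conditions)     # the combination of conditions
    truth_table[config].append(case["outcome"])

for config, outcomes in sorted(truth_table.items()):
    consistency = sum(outcomes) / len(outcomes)     # share of cases showing the outcome
    print(dict(zip(conditions, config)), f"n={len(outcomes)}", f"consistency={consistency:.2f}")
```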

  20. Process Tracing Tests causal links between putative causes and outcomes by identifying the intervening causal processes or mechanisms at work. It starts with a theory of change. The evaluator generates (preferably competing) hypotheses about how the intervention may connect to the outcome and what should be observed if each hypothesis is true or false. In-depth case-study analysis, drawing largely on qualitative data, is used to develop a chronology of events setting out causal links between each stage. The evidence is then used to assess the hypotheses. Four tests assess the strength of alternative hypotheses (Bennett 2010): 1. Straw-in-the-wind tests provide evidence for or against a hypothesis but by themselves cannot confirm or deny it. 2. Hoop tests, if passed, can affirm the relevance of a hypothesis but cannot fully confirm it; if failed, they eliminate a hypothesis. 3. Smoking-gun tests can confirm a hypothesis if passed, or weaken it if failed. 4. Doubly decisive tests, in confirming a given hypothesis, eliminate any others.
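
A common way to summarise these four tests is as a 2x2 of whether passing the test is necessary and/or sufficient to sustain the hypothesis. The sketch below (not from the slides) encodes that textbook summary for illustration.

```python
# Illustrative sketch: Bennett's four process-tracing tests as a necessity/sufficiency 2x2.
def test_type(necessary: bool, sufficient: bool) -> str:
    if necessary and sufficient:
        return "Doubly decisive: passing confirms the hypothesis and eliminates rivals"
    if necessary:
        return "Hoop: failing eliminates the hypothesis; passing only keeps it in play"
    if sufficient:
        return "Smoking gun: passing confirms the hypothesis; failing only weakens it"
    return "Straw in the wind: passing or failing only shifts the balance of evidence slightly"

for nec in (False, True):
    for suf in (False, True):
        print(f"necessary={nec}, sufficient={suf} -> {test_type(nec, suf)}")
```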

  21. Conclusions Debates about the use of experiments (RCTs) and similar designs (quasi-experiments) are moving beyond simple, binary choices. Developing thinking on mediators and moderators, mixed-method designs and realist RCTs suggests that different methodologies are more flexible than simple, binary debates sometimes imply. We should put more of our energy into developing impact evaluation programmes that address questions about intervention scaling and roll-out; this requires more dialogue between researchers, research funders and research consumers (particularly policy makers and practitioners). There are alternative approaches to impact evaluation for interventions where traditional, large n counterfactual impact evaluations are not suitable and/or where the impact questions that need answering include questions about mechanisms, contexts and contributions to delivering outcomes.
