January 28th, 2009 | Posted in PRB News
by Bill Butz, president and CEO
Fifteen years ago, the U.S. Congress, in bipartisan cooperation with Vice President Gore, passed the Government Performance and Results Act. All federal agencies had to announce their quantitative program goals ahead of each budget year and document their successes and failures before the following year. The law’s objective was to give more money to agencies that were succeeding and less to agencies that were failing.
Around five years ago, first at RAND and then here at PRB, I started to notice an impulse among the federal agencies, and especially the private foundations, that funded our work to pay closer attention to whether their dollars were doing any good. The program officers monitoring our projects found themselves on the business end of tough questions: Who is benefiting from the projects you are funding? How efficient are your projects at getting the benefits to them? And toughest and most important of all, how would the world be different if your projects didn’t exist?
Fifteen years ago, almost no one asked this toughest question, because there was no way to answer the counterfactual. How in the world could someone actually attribute a change out there in the real, complicated world—say, an increase in use of contraceptives—to a particular intervention—say, the operations of a family planning program? Might not the increase be due, instead or in addition, to more schooling or jobs for women or to modernization from TV exposure? By now, that excuse has lost its punch. What is the difference between then and now? It’s 15 years of experience applying the Gold Standard of intervention evaluation—randomized clinical trials—to field interventions. That is, taking the design principles for evaluating medical treatments administered to persons, and adapting these principles to evaluating programmatic interventions aimed at large numbers of persons. This exercise has worked out quite well.
Now the Population Program at the William and Flora Hewlett Foundation is giving PRB the opportunity to see just how far the randomized clinical trial model can be extended. Over the next year, this model will be our starting point in designing evaluation methodologies for all our programs—from face-to-face training to our World Population Data Sheet, and from blogs (like this one) to large international conferences. For each type of program, our evaluation protocol will get as close to the Gold Standard as is methodologically and practically possible.
Has enough knowledge and experience accumulated to extend this Gold Standard to the wide range of programs PRB and other organizations like us conduct? By adapting the methodology as far as is reasonable and practical, my colleagues and I aim to provide initial answers. A lot of help toward our answers will come from “When Will We Ever Learn? Improving Lives through Impact Evaluation,” “From the Great Society to Continuous Improvement Government: The Hopes of an Impatient Incrementalist,” and “Improving Effectiveness and Outcomes for the Poor.”
Believe me, your comments and suggestions now as we get going are more than welcome!
Here, adapted from standard clinical trial protocols, are the 10 characteristics of an ideal intervention evaluation that we will take as our Gold Standard:
1. The intervention inputs, processes, and outputs, as well as the expected impacts, are well defined in concept and measurement before the intervention starts and not subject to substantial variation during the intervention-evaluation period.
Without this first characteristic, a successful evaluation is unlikely.
2. The intervention is consistent with available theory and evidence linking the intervention causally to its desired impacts.
3. The intervention has been developed and pretested in small preliminary settings, to ensure language comprehension, cultural acceptability, and other factors essential to a sound intervention.
4. In addition to those who experience the intervention (the “treatment group”), there are others included in the design who do not experience the intervention (the “control group”). The two groups are alike in all characteristics that matter in determining the impacts, except in their exposure to the treatment.
5. In addition to the treatment and control groups, there is also a placebo group, whose members get whatever level of special attention is received by the treatment group, with the exception of the essential treatment posited to make an impact.
Without the fifth characteristic, it is not possible to know whether measured impacts are due to the intervention or to a “Hawthorne effect”: an improvement in outcomes produced by the psychological stimulus of being singled out and made to feel important, independent of the particular change introduced. When this occurs, the improved outcomes disappear once the treatment stops being special.
6. Assignment to these groups is random.
Without the fourth and sixth characteristics, the evaluation must rely entirely on statistical techniques to control for the influence of factors other than the treatment, and no existing data or techniques can do this completely reliably. The most serious resulting flaw is “selectivity bias,” which arises when people who are particularly qualified or motivated select themselves, or are selected by others, to participate in the intervention.
7. Data on baseline, process, outputs, and impacts are collected and compared across the groups. These data include information about characteristics of the participants/subjects that might not have been completely randomized. The seventh characteristic facilitates correction for any departures from random assignment to the groups and yields information about the process, information that can help establish the causal links between treatment and impacts.
8. The evaluation phase is planned to last as long as required to observe an impact, based on available theory and evidence.
Without the eighth characteristic, estimates of the impact will be biased. True impacts may not be evident for months or even years after the intervention, or may require interventions that persist over time.
9. Costs of the intervention—out of pocket costs and opportunity costs, if any, of the persons producing and experiencing the intervention—are measured.
Without the ninth characteristic, the cost-effectiveness of the intervention cannot be compared usefully with other successful interventions, for deciding which to replicate or scale up.
10. Results of the evaluation are related to other evaluations and communicated effectively to appropriate practitioners, researchers, and policy makers.
Without the tenth characteristic, the evaluation loses most of its potential value.
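The logic behind the fourth and sixth characteristics, and why random assignment defuses selectivity bias, can be illustrated with a small simulation. This is a hypothetical sketch with invented numbers, not drawn from any PRB evaluation: imagine an outcome driven partly by participants’ unobserved motivation and partly by a true treatment effect. When the motivated self-select into the program, a naive treatment-minus-control comparison overstates the effect; when a coin flip assigns people instead, the comparison recovers the truth.

```python
import random
from statistics import mean

random.seed(42)

TRUE_EFFECT = 5.0   # by construction, treatment raises the outcome by 5 points
N = 100_000

def outcome(motivation, treated):
    """Outcome driven by unobserved motivation plus any treatment effect."""
    return 50 + 10 * motivation + (TRUE_EFFECT if treated else 0.0)

people = [random.random() for _ in range(N)]   # latent motivation, uniform on [0, 1)

# Scenario A: self-selection -- the most motivated half opts into the program.
self_treated = [outcome(m, True) for m in people if m > 0.5]
self_control = [outcome(m, False) for m in people if m <= 0.5]
naive_estimate = mean(self_treated) - mean(self_control)

# Scenario B: random assignment -- a coin flip decides, independent of motivation.
rct_treated, rct_control = [], []
for m in people:
    if random.random() < 0.5:
        rct_treated.append(outcome(m, True))
    else:
        rct_control.append(outcome(m, False))
rct_estimate = mean(rct_treated) - mean(rct_control)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"self-selected estimate: {naive_estimate:.2f}")  # inflated well above the truth
print(f"randomized estimate:    {rct_estimate:.2f}")    # close to the truth
```

The self-selected comparison conflates motivation with the treatment, while randomization gives both groups the same motivation profile on average, which is exactly what characteristics 4 and 6 are designed to guarantee.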