Social scientists should produce policy-based evidence in their research from the start

February 14, 2024


In a new paper, Professor John List argues that researchers should consider five vital signs to determine whether an idea is scalable in the real world.

By Sarah Steimer

Social scientists aim for their work to change lives, but the labs in which they test their ideas for efficacy do not always parallel the world those ideas must scale to. Ideas frequently experience a “voltage drop”: once scaled outside the lab, the benefit-cost profile depreciates considerably. In a new Nature article, John List, the Kenneth C. Griffin Distinguished Service Professor in Economics, argues that scientists should flip the research model and produce policy-based evidence from the start. This approach, List says, compels scientists to consider up front the conditions necessary to scale their ideas so that they have important real-world effects.

List’s work on the topic began 10 years ago and resulted in the 2022 best-selling book The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale; his latest paper continues his study of what he terms “nearly an economic law.” The voltage effect holds that an idea’s benefit-cost profile will depreciate from the small setting (when the idea is being tested for efficacy in the lab) to the large one (real-world implementation).

“The Nature paper takes this to the policy community,” he says. “It asks questions such as: If you have a policy idea that comes from the petri dish, how confident can you be that it’ll work in the real world at scale? And what types of data or evidence should we generate to be more confident that the idea can scale on its own merits?”

List argues that the best way to obtain high voltage at scale (he notes that 50 to 90 percent of programs lose voltage when scaled) is to produce policy-based evidence, which involves examining all of the potential problems and constraints the idea would face at scale.

“The incentives in the social sciences are not set up to create evidence from the beginning on whether the idea will scale,” List says. “The paper is really about backward-inducting: examine all of the major constraints that you're going to face, and bring it back to the initial research design. You can do your efficacy test, but alongside your efficacy test, do a test of scale, and that's what I call creating policy-based evidence.”

The paper proposes five vital signs to determine whether an idea is scalable. First, check for false positives: Does your idea have voltage to begin with, or did it only look good in the petri dish because of statistical error, human error, or another misstep? Second, is the population representative? Initial experiments sometimes use a convenience sample, or a sample the researchers expect will deliver results they can be confident in, and those results may not replicate in the broader population. Third, is the situation representative? Will the idea work with the situational features that occur at scale? Fourth, check for spillovers: Are there benefits that affect those who aren’t participating in the program? Finally, are there economies of scale? As the idea reaches a larger population, with the implementation costs that entails, will returns diminish?
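
Neither the article nor the paper includes code, but the first two vital signs lend themselves to a quick illustration. The following Python sketch (the effect sizes, sample sizes, and the `pilot_significant` helper are all hypothetical, not taken from List’s paper) simulates how often a small pilot passes a standard significance test when the true effect is strong, weak, or absent, previewing both false positives and the cost of an unrepresentative sample.

```python
# Hypothetical illustration of vital signs 1 and 2: false positives and
# representativeness. All effect sizes and sample sizes are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def pilot_significant(effect, n=30, trials=2000, alpha=0.05):
    """Return the share of small pilots whose A/B test comes out significant."""
    hits = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)      # arm A
        treated = rng.normal(effect, 1.0, n)   # arm B
        if stats.ttest_ind(treated, control).pvalue < alpha:
            hits += 1
    return hits / trials

# A convenience sample where the idea truly works well (effect = 0.8 SD)
# versus a broad population where the true effect is modest (0.1 SD).
print("pilot 'wins' on convenience sample:", pilot_significant(0.8))
print("pilot 'wins' in broad population:  ", pilot_significant(0.1))
# With no real effect at all, roughly alpha of pilots still look good:
print("false positives with zero effect:  ", pilot_significant(0.0))
```

A pilot that clears significance on a favorable sample says little about the broader population, which is exactly the gap the second vital sign asks researchers to test for.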

List offers Jonas Salk’s polio vaccine as an example of a scalable idea that checks the five boxes outlined in the paper. Salk initially tested the vaccine on his own children, then successfully on a wider population. To overcome possible resistance to vaccination (a problem seen with the COVID-19 vaccine, which individuals had to seek out), the polio vaccine was offered as part of regular childhood medical check-ups. Then there are the positive spillover effects of vaccines: an inoculated individual won’t pass the virus along to an unvaccinated person. And lastly, the polio vaccine was affordable to manufacture at scale.

In his paper, List says that asking questions about scalability and generating policy-based evidence require scientists to use what he calls “Option C thinking.”

“Traditionally in science, we do A/B testing: We put people in the control group in arm A and people in the treatment group in arm B. That’s our test, and it’s usually an efficacy test,” List says. “To give our ideas the best shot, we do an A/B test and report it to the world. But I want every scientist, every researcher, to augment that A/B testing approach with Option C thinking: arm C is where you provide policy-based evidence, where you add real-world elements to make sure that, alongside your A/B test, you have empirical evidence that gives you confidence in your idea at scale.”
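
As a rough sketch of that three-arm design (again hypothetical; the numbers below are invented for illustration), arm C delivers the same treatment under scale-like conditions, and the gap between the arm B and arm C estimates previews the voltage drop:

```python
# Hypothetical three-arm design: A = control, B = efficacy treatment,
# C = the same treatment delivered under real-world, at-scale conditions.
import numpy as np

rng = np.random.default_rng(1)
n = 500

arm_a = rng.normal(0.0, 1.0, n)   # control
arm_b = rng.normal(0.5, 1.0, n)   # treatment with ideal "petri dish" delivery
arm_c = rng.normal(0.2, 1.0, n)   # treatment with scale-like constraints

effect_lab = arm_b.mean() - arm_a.mean()
effect_scale = arm_c.mean() - arm_a.mean()

print(f"estimated effect in the lab (B - A): {effect_lab:.2f}")
print(f"estimated effect at scale (C - A):   {effect_scale:.2f}")
print(f"projected voltage drop:              {1 - effect_scale / effect_lab:.0%}")
```

The design choice here is simply that arm C trades experimental purity for realism, which is what lets it generate the policy-based evidence List describes.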

This is a change in thinking, List acknowledges, but it forces both scientists and decision makers to consider the potential flaws of an otherwise good small-scale idea. Because researchers are incentivized to report important insights, they often create a petri dish that produces overly optimistic results. Likewise, decision makers are incentivized to implement such good ideas quickly. Option C thinking, however, adds a policy-based-evidence dimension to the equation that can identify potential problems at scale.

List notes that he doesn’t want this paper to be viewed as a blocker (“We don’t want perfection to be the enemy of good”), but he urges scientists to check as many of the five vital signs as possible for the greatest possible effect at scale.

“You don't change the world unless you do it at scale,” List says. “Innovation is crucial, but diffusion is its perfect complement. In this spirit, there's never been an economics idea that works in a petri dish, and only in a petri dish, that changes the world.”