Education Next is a conservative journal that can be counted on to support education reform in all its manifestations.

However, today it is releasing a new study finding that rating teacher-education programs by the test scores of students taught by their graduates is ineffective. As we have often said, VAM (value-added measurement), beloved by Arne Duncan, is a sham. The now-discredited federal rule was promulgated by the Obama administration.

Ranking teacher-prep programs on value-added is ineffective

New analysis finds program rankings based on graduates’ value-added scores are largely random

Last year Congress repealed a federal rule that would have required states to rank teacher-preparation programs according to their graduates’ impact on student test scores. Yet twenty-one states and D.C. still choose to rank programs in this way. Can student test performance reliably identify more and less effective teacher-preparation programs? In a new article for Education Next, Paul T. von Hippel of the University of Texas at Austin and Laura Bellows of Duke University find that the answer is usually no.

Differences between programs too small to matter. Von Hippel and Bellows find that the differences between teachers from different preparation programs are typically too small to matter. Having a teacher from a good program rather than an average program will, on average, raise a student’s test scores by 1 percentile point or less.

Program rankings largely random. The errors that states make in estimating differences between programs are often larger than the differences states are trying to estimate. Program rankings are so noisy and error-prone that in many cases states might as well rank programs at random.

High chance of false positives. Even when a program appears to stand out from the pack, in most cases it will be a “false positive”—an ordinary program whose ranking is much higher (or lower) than it deserves. Some states do have one or two programs that are truly extraordinary, but published rankings do a poor job of distinguishing these “true positives” from the false ones.
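The two claims above (rankings that are mostly noise, and top-ranked programs that are usually false positives) can be illustrated with a short Monte Carlo sketch. All the numbers here (50 programs, true effects with a standard deviation of 0.5 percentile points, estimation noise with a standard deviation of 2) are illustrative assumptions of mine, not figures from the von Hippel and Bellows study:

```python
import random
import statistics

random.seed(42)

def ranks(xs):
    """Return the rank (0 = lowest) of each value in xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a)
           * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def simulate_rankings(n_programs=50, true_sd=0.5, noise_sd=2.0, n_trials=200):
    """Rank programs whose true effects (in percentile points) are small
    relative to estimation error. Returns (mean rank correlation between
    true and estimated rankings, share of trials in which the top-ranked
    program is truly well above average)."""
    rank_corrs = []
    top_hits = 0
    for _ in range(n_trials):
        true_effects = [random.gauss(0, true_sd) for _ in range(n_programs)]
        # Each state's estimate = true effect + (large) estimation error.
        estimates = [t + random.gauss(0, noise_sd) for t in true_effects]
        rank_corrs.append(corr(ranks(true_effects), ranks(estimates)))
        # Is the program ranked #1 by the estimates truly extraordinary
        # (more than one true-effect SD above average)?
        best = max(range(n_programs), key=lambda i: estimates[i])
        if true_effects[best] > true_sd:
            top_hits += 1
    return statistics.mean(rank_corrs), top_hits / n_trials
```

Under these assumed numbers, the estimated rankings correlate only weakly with the true ones, and in most trials the program ranked first is an ordinary program that got lucky, which is the "false positive" pattern the article describes. Setting `noise_sd=0.0` recovers near-perfect rankings, showing that the randomness comes entirely from the estimation error.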

Consistent results across six locations. Using statistical best practices, von Hippel and Bellows found consistent results across six different locations—Texas, Florida, Louisiana, Missouri, Washington State, and New York City. In every location the true differences between most programs were minuscule, and program rankings consisted mostly of noise. This was true even in states where previous evaluations had suggested larger differences.

When measured in terms of teacher value-added, “the differences between [teacher-preparation] programs are typically too small to matter. And they’re practically impossible to estimate with any reliability,” say von Hippel and Bellows. They consider other ways to monitor program quality and conclude that most are not ready for prime time. But they do endorse reporting the share of a program’s graduates who become teachers and persist in the profession—especially in high-need subjects and high-need schools.

To receive a copy of “Rating Teacher-Preparation Programs: Can value-added make useful distinctions?” please contact Jackie Kerstetter. The article will be available Tuesday, May 8, and will appear in the Summer 2018 issue of Education Next, available in print on May 24, 2018.

About the Authors: Paul T. von Hippel is an associate professor at the University of Texas at Austin and Laura Bellows is a doctoral student in public policy at Duke University.