Since the passage of No Child Left Behind, test scores have been defined by federal law as the goal of education. Schools and teachers that “produce” higher scores are “good”; schools and teachers that don’t are “bad” and likely to suffer termination. The assumption is that higher test scores produce better life outcomes, and that is that.

In late 2016, Jay P. Greene produced a short and brilliant paper that challenged that assumption. I have fallen into the habit of asking myself whether the young people who are superstars in many non-academic fields had high scores, and guessing they did not. Fortunately, it is only in schools that students get branded with numbers, like Jean Valjean of “Les Misérables.” Outside school, they can dazzle the world as athletes, musicians, inventors, or mechanics, without a brand.

Greene writes:

“If increasing test scores is a good indicator of improving later life outcomes, we should see roughly the same direction and magnitude in changes of scores and later outcomes in most rigorously identified studies. We do not. I’m not saying we never see a connection between changing test scores and changing later life outcomes (e.g. Chetty, et al); I’m just saying that we do not regularly see that relationship. For an indicator to be reliable, it should yield accurate predictions nearly all, or at least most, of the time.

“To illustrate the un-reliability of test score changes, I’m going to focus on rigorously identified research on school choice programs where we have later life outcomes. We could find plenty of examples of disconnect from other policy interventions, such as pre-school programs, but I am focusing on school choice because I know this literature best. The fact that we can find a disconnect between test score changes and later life outcomes in any literature, let alone in several, should undermine our confidence in test scores as a reliable indicator.

“I should also emphasize that by looking at rigorous research I am rigging things in favor of test scores. If we explored the most common use of test scores — examining the level of proficiency — there are no credible researchers who believe that is a reliable indicator of school or program quality. Even measures of growth in test scores or VAM are not rigorously identified indicators of school or program quality as they do not reveal what the growth would have been in the absence of that school or program. So, I think almost every credible researcher would agree that the vast majority of ways in which test scores are used by policymakers, regulators, portfolio managers, foundation officials, and other policy elites cannot be reliable indicators of the ability of schools or programs to improve later life outcomes.”

I would add that Chetty et al. did not establish a causal relationship between teacher VAM and later life outcomes, only a correlation. The claim that my fourth-grade teacher “caused” me not to become pregnant a decade later strains credulity. At least mine.

Greene’s essay includes an excellent reading list of studies showing high test scores but no change in high school graduation rate or college attendance.

The Milwaukee and D.C. voucher studies that show a gain in high school graduation rate should note the high attrition rate from these programs, which inflates the graduation rate.

Imagine saying to a governor, “I have a policy intervention that will raise test scores but will have little or no effect on life outcomes.” Would the governor jump at the offer? Based on the political activity of the past 15 years, the answer is yes.

Overall, however, this is a seminal essay from a prominent scholar who supports school choice.