Kevin Welner, director of the National Education Policy Center, wrote this commentary in response to the complaints of teachers who are evaluated by the scores of students they never taught. Few people can understand the complex algorithms underlying VAM scores, and the people who wrote these formulae can't explain them in plain English. Yet teachers are fired or awarded bonuses depending on whether their incomprehensible rating is low or high. Bear in mind that few, if any, states would have adopted these measures without the financial and political pressure exerted by Arne Duncan's Race to the Top program and the Obama administration, which demanded them.

Welner writes:

“As you probably know, Diane, my biggest concerns about high-stakes accountability systems tied to measures of academic growth aren’t technical—they’re about perverse incentives. Yes, the technical problems are very real, but even if they were all somehow overcome, we’d be left with a much poorer system of education that’s narrowly focused on what’s being measured.

“Having said that, I do want to add to your earlier post concerning the Florida VAM. I think the post makes three good points but overlooks the most important one.

“As you point out, the model is nonsense when applied to educators who don’t teach the tested subjects. And as you point out, application of the model results in misclassifications—as do all such models. Finally, as you point out, very few readers can understand the model.

“But that leads to a somewhat different point that I think is very important. Florida’s legislators, its Commissioner of Education, and the members of the State Board of Education almost surely are among those who cannot understand the model. My hunch is that the AIR experts who developed FL’s model have walked through it, possibly multiple times, with these policy makers. But the math is just too complex. (Note that the excerpt you pasted from page 6 of the AIR report is just the general form of the model; if expanded it would be much more overwhelming—see the next 10 pages of the report.)

“This is not a criticism of the model or its developers; simple regression models that could be relatively easily understood have well-documented flaws. But adding vectors capturing the effect of lagged scores, mathematical descriptions of Bayesian estimates, and within-student covariance matrices—while all justified in the report—has the obvious effect of placing policy makers at the mercy of whichever experts they choose to listen to.

“This sort of problem does come up in other contexts; to some extent it’s unavoidable. When Congress votes to fund a NASA mission, the underlying math, physics and engineering are similarly beyond normal understanding. When judges hear expert testimony in a pharmaceutical case, etc., they also must confront their own limitations. But at least in those instances, there’s a procedure in place to take oppositional testimony.

“The best analogy here is probably to the defense industry, which works with people in the defense department to design a new weapons system and then helps to market it to Congress. The result is often something technically sophisticated and, for most members of Congress, well beyond their ability to understand strengths and weaknesses.

“Perhaps that’s why the non-technical evidence is so important. We can all understand the problem when a teacher explains that her evaluation is based on the academic growth of students in areas she doesn’t teach. We can all also, to some extent, understand the problem of unreliable evaluations that result in misclassifications.

“But we should, at the very least, recognize and acknowledge the reality that these policies are being adopted by policy makers who pretty much have no clue what it is that they’re putting in place.”