John Thompson, historian and teacher, takes a closer look at the New Mexico court decision that imposed a preliminary injunction on the use of value-added measurement to evaluate teachers and to punish or reward them. In his judgment, VAM is on the way out.
New Mexico District Judge David K. Thomson granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. As Audrey Amrein-Beardsley explains, the state “can proceed with ‘developing’ and ‘improving’ its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (set for now, for April) that the system is reliable, valid, fair, uniform, and the like.”
This is wonderful news. As the American Federation of Teachers observes, “Superintendents, principals, parents, students and the teachers have spoken out against a system that is so rife with errors that in some districts as many as 60 percent of evaluations were incorrect. It is telling that the judge characterizes New Mexico’s system as a ‘policy experiment’ and says that it seems to be a ‘Beta test where teachers bear the burden for its uneven and inconsistent application.’”
A close reading of the ruling makes it clear that this case is an even greater victory over the misuse of test-driven accountability than the jubilant headlines suggest. It shows that Judge Thomson made the right ruling on the key issues for the right reasons, and he seems to be predicting that other judges will be following his legal logic. Litigation over value-added teacher evaluations is being conducted in 14 states, and the legal battleground is shifting to the place where corporate reformers are weakest. No longer are teachers being forced to prove that there is no rational basis for defending the constitutionality of value-added evaluations. Now, the battleground is shifting to the actual implementation of those evaluations and how they violate state laws.
Judge Thomson concludes that the state’s evaluation systems don’t “resemble at all the theory” they were based on. He agreed with the district superintendent who compared it to the Wizard of Oz, where “the guy is behind the curtain and pulling levers and it is loud.” Some may say that the Wizard’s behavior is “understandable,” but that is not the judge’s concern. The Court must determine whether the consequences are assessed by a system that is “objective and uniform.” Clearly, it has been impossible in New Mexico and elsewhere for reformers to meet the requirements they mandated, and that is the legal terrain where VAM proponents must now fight.
The judge thus concludes, “New Mexico’s evaluation system is less like a [sound] model than a cafeteria-style evaluation system where the combination of factors, data, and elements are not easily determined and the variance from school district to school district creates conflicts with the [state] statutory mandate.”
The state of New Mexico counters by citing cases in Florida and Tennessee as precedents. But Judge Thomson writes that those cases ruled against early challenges based on equal protection or constitutional issues, even as they also cited practical concerns in implementation. He writes of the Florida (Cook) case, “The language in the Cook case could be lifted from the Court findings in this case.” That state’s judge decided “‘The unfairness of this system is not lost on this Court.’” Judge Thomson also argues, “The (Florida) Court in fact seemed to predict the type of legal challenge that could result …‘The individual plaintiffs have a separate remedy to challenge an evaluation on procedural due process grounds if an evaluation is actually used to deprive the teacher of an evaluation right.’”
The question in Florida and Tennessee had been whether there was “a conceivable rational basis” for proceeding with the teacher evaluation policy experiment. Below are some of the more irrational results of those evaluations. The facts in the New Mexico case may be somewhat more absurd than those in other places that have implemented VAMs but, given the inherent flaws in those evaluations, I doubt they are qualitatively worse. In fact, Audrey Amrein-Beardsley testified about a similar outcome in Houston, which was as awful as the New Mexico travesties and led to about one-fourth of the teachers subject to those evaluations being placed on “growth plans.”
As has become common across the nation, New Mexico teachers have been evaluated on students who aren’t in the teachers’ classrooms. They have been held accountable for test results from subjects that the teacher didn’t teach. Science teachers might be evaluated on a student taught in 2011, based on how that student scored in 2013.
The judge cited testimony regarding a case where 50% of the teachers rated Minimally Effective had missing data due to reassignment to a wrong group. One year, a district questioned the state’s data, and immediately it saw an unexplained 11% increase in effective teachers. The next year, also without explanation, the state’s original numbers on effectiveness were reduced by 6%.
One teacher taught 160 students but was evaluated on scores of 73 of them and was then placed on a plan for improvement. Because of the need to quantify the effectiveness of teachers in Group B and Group C, who aren’t subject to state End of Instruction tests, there are 63 different tests being used in one district to generate high-stakes data. And, when changing tests to the Common Core PARCC test, the state has to violate scientific protocol, and mix and match test score results in an indefensible manner. Perhaps just as bad, in 2014-15, 76% of teachers were still being evaluated on less than three years of data.
The Albuquerque situation seems exceptionally important because it serves 25% of the state’s students, and it is the type of high-poverty system where value-added evaluations are likely to be most unreliable and invalid. It had 1728 queries about data and 28% of its teachers ranked below the Effective level. The judge noted that if you teach a core subject, you are twice as likely as a French teacher to be judged Ineffective. But, that was not the most shocking statistic. In Albuquerque, Group A elementary teachers (where VAMs play a larger role) are five times more likely to be rated below Effective than their colleagues in Group B. In Roswell, Group B teachers are three times more likely to be rated below Effective than Group C teachers.
Curiously, VAM advocate Tom Kane testified, but he did so in a way that made it unclear whether he saw himself as a witness for the defense or the plaintiffs. When asked about Amrein-Beardsley’s criticism of using tests that weren’t designed for evaluating teachers, Kane countered that the Gates Foundation MET study used random samples and concluded that differing tests could be used in a way that was “useful in evaluating teachers” and valid predictors of student achievement. Kane also replied that he could estimate the state’s error rate “on average,” but he couldn’t estimate error rates for individual teachers. He did not address the judge’s real concern about whether New Mexico’s use of VAMs was uniform and objective.
I am not a lawyer but I have years of experience as a legal historian. Although I have long been disappointed that the legal profession did not condemn value-added evaluations as a violation of our democracy’s fundamental principles, I also knew that the first wave of lawsuits challenging VAMs would face an uphill battle. Using teachers as guinea pigs in a risky experiment, where non-educators imposed their untested opinions on public schools, was always bad policy. Along with their other sins, value-added evaluations would mean collective punishment of some teachers merely for teaching in schools and classes where it is harder to meet dubious test score growth targets. But, many officers of the court might decide that they did not have the grounds to overrule new teacher evaluation laws. They might have to hold their noses while ruling in favor of laws that make a mockery of our tenets of fairness in a constitutional democracy.
During the last few years, rather than force those who would destroy the hard-earned legal rights of teachers to meet the legal standard of “strict scrutiny,” those who would fire teachers without proving that their data was reliable and valid have mostly had to show that their policies were not irrational. Now that their policies are being implemented, reformers must defend the ways that their VAMs are actually being used. Corporate reformers and the Duncan administration were able to coerce almost all of the states into writing laws requiring quantitative components in teacher evaluations. Not surprisingly, it has often proven impossible to implement their schemes in a rational manner.
In theory, corporate reformers could have won if they required the high-stakes use of flawed metrics while maintaining the message discipline that they are famous for. School administrators could have been trained to say that they were merely enforcing the law when they assessed consequences based on metrics. Their job would have been to recite the standard soundbite when firing teachers – saying that their metrics may or may not reflect the actual performance of the teacher in question – but the law required that practice. Life’s not fair, they could have said, and whether or not the individual teacher was being unfairly sacrificed, the administrators who enforced the law were just following orders. It was the will of the lawmakers that the firing of the teachers with the lowest VAMs – regardless of whether the metric reflected actual effectiveness – would make schools more like corporations, so practitioners would have to accept it. But this is one more case where reformers ignored the real world, did not play out the education policy and legal chess game, and failed to anticipate that rulings such as Judge Thomson’s would soon be coming.
In the real world, VAM advocates had to claim that its results represented the actual effectiveness of teachers and that, somehow, their scheme would someday improve schools. This liberated teachers and administrators to fight back in the courts. Moreover, top-down reformers set out to impose the same basic system on every teacher, in every type of class and school, in our diverse nation. When this top-down micromanaging met reality, proponents of test-driven evaluations had to play so many statistical games, create so many made-up metrics, and improvise in so many bizarre ways, that the resulting mess would be legally indefensible.
And that is why the cases in Florida and Tennessee might soon be seen as the end of the beginning of the nation’s repudiation of value-added evaluations. The New Mexico case, along with the renewal of the federal ESEA and the departure of Arne Duncan, is clearly the beginning of the end. Had VAM proponents objectively briefed attorneys on the strengths and weaknesses of their theories, they could have thought through the inevitable legal process. On the other hand, I doubt that Kane and his fellow economists knew enough about education to be able to anticipate the inevitable, unintended results of their theories on schools. In numerous conversations with VAM true believers, rarely have I met one who seemed to know enough about the nuts and bolts of schools to be able to brief legal advisors, much less anticipate the inevitable results that would eventually have to be defended in court.

In New York, however, the Gov (Cuomo) makes it clear that recommendations to improve education reforms generated by his “Task Force” are intended to implement the current APPR system, which relies heavily on VAM.
According to Cuomo’s statement announcing the results of the Task Force: “The Task Force…recommends that current Common Core aligned tests should not count for students or teachers until the start of 2019-2020 school year to ensure the system is implemented completely and properly to avoid the errors caused by the prior flawed implementation.”
So Cuomo supports a 4-year moratorium on the use of high stakes tests and VAM for evaluation purposes, not to find a suitable replacement, but to fix the current APPR system–in the hope that parents will accept it. Students will continue to take tests during the 4-year moratorium because in four years these tests will again be used in the reconstituted APPR system that rests on VAM.
Let’s hope the Lederman case will result in a decision that will directly challenge the validity of VAM. Even if the ruling applies only to Sheri Lederman, it may serve to help the position of future litigation. Otherwise, I predict Cuomo and company will be spending lots of time in court.
Or, he’s just kicking the can down the road because it’s impossible for him to admit he was wrong and so many influential people backed him on it that it’s now impossible for any of them to admit they were wrong or they lose credibility. Sunk costs. They bet big on VAM and now they have so much invested it’s too late to reconsider. If VAM quietly and slowly disappears no one has to be accountable for the decision to pursue it so aggressively.
Or that.
New Mexico District Judge David K. Thomson made the right move when he granted the preliminary injunction on the use of the state’s evaluation data. But the New Mexico Public Education Department, headed by Secretary of Education Skandera, boldly stated that things are NOT going to change and that things will continue as planned. In other words, business as usual and to hell with the injunction. This has been Skandera’s model from her first day on the job. She listens to nobody, except Martinez, aka the Governor, and Jeb Bush, her previous employer. The advice from Superintendents, Teachers and other Administrators means nothing to Skandera. The use of the Florida educational model is all that counts to her.
“the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (set for now, for April) that the system is reliable, valid, fair, uniform, and the like.”
Evidence can be and is being manufactured to order all the time. Most if not all of the oligarchs fund their own research and media that will publish reports they fund that will provide results they order. In their war against Global Warming, for instance, the Koch brothers do it all the time through an empire of so-called research foundations that has sprouted like mushrooms.
I’m sure the Walton family does the same thing, and we know that Bill Gates does it by having his foundation fund predetermined research results.
Lloyd.
We seem to have reached the point as a country where evidence (indeed reality itself) is whatever billionaires and politicians want it to be.
It’s very bizarre and certainly not a good basis for sound policy.
And there is no escape from the oligarchs because we all live on the same ball. In addition, oligarchs are taking over space travel. The first colonies on the Moon and Mars will probably belong to a billionaire and his corporation, reminding me of what happened in the Alien film franchise and Avatar, where the chase for profit creates horrible nightmares no different than what the Spanish, British and French colonial empires (and a few other European colonial empires) did to native populations around the world in the 16th, 17th, 18th, and 19th centuries, all in the name of profit and wealth acquisition.
With centuries of history as our teacher, we now know that the profit motive is great for a few and horrible for the many.
For instance, “Historians such as John S. Milloy and Roland Chrisjohn have published documented evidence that discussion of how diseases were spread was concealed by colonialists to conceal actual origins of how indigenous populations were infected with these new diseases.” … Disease did break out among the Indians, but the exact effectiveness of the British attempts at infecting Native Americans is unknown. Letters and journals show that British authorities discussed and agreed to the deliberate distribution of blankets infected with smallpox among Indian tribes in 1763, and an incident involving William Trent and Captain Ecuyer has been regarded as one of the first instances of the use of smallpox as a biological weapon in the history of warfare. … Some contemporary Europeans voiced the perspective towards native deaths from contagious disease that it was divine providence; Governor Winthrop of colonial Massachusetts declared, “God hath therefore cleared our title to this place.”
Today’s oligarchs are yesterday’s colonialist leaders. They are no different.
If John Thompson’s recap of Tom Kane’s testimony is accurate, then it underscores one of the critical advantages of taking VAManiacs to court: the putative experts designing, producing, explaining, defending and selling VAM products have to answer specific questions under oath. In other words, no touting the theoretical and possible and hoped-for and aspirational merits of their wares but pointed queries that probe their actual benefits and drawbacks under real world conditions.
No surprise that, like every other rheephorm “innovation” put to the test, VAM falls so far short of its promised virtues that it reveals itself to be a scam.
Charles Ponzi would be so so proud of his latter-day ideological and moral descendants.
Although perhaps the leading enforcers and enablers and bean counters of self-described “education reform” might be wise to ponder what their font of inspiration once said:
“I went looking for trouble, and I found it.”
Just sayin’…
😎
“The judge thus concludes, ‘New Mexico’s evaluation system is less like a [sound] model than a cafeteria-style evaluation system where the combination of factors, data, and elements are not easily determined and the variance from school district to school district creates conflicts with the [state] statutory mandate.’”
What concerns me is the judge’s statement right before that sentence:
“As the Court will outline in its findings and conclusions, the PED has shown Value Added Models generally have a sound policy and statistical foundation.”
VAM has a “sound . . . statistical foundation”??? I haven’t read it all yet, but how could he come to that conclusion? I’d say the plaintiffs should be looking to debunk that statement, but then again I’m not a lawyer. Seems like a huge loophole/contradiction in the judgment that could be exploited by the defense.
Especially since major statistical organizations openly disagree.
He agreed with the district superintendent who compared it to the Wizard of Oz, where “the guy is behind the curtain and pulling levers and it is loud.”
“The Wizard of Odds”
The “Wizard of Odds” is VAM
And odds are pretty high
That teacher gets the can
Although her students fly