John Thompson, teacher and historian, writes here about one of the most controversial education issues of our time: mandated systems of test-based teacher evaluation. This was a central aspect of Race to the Top, and it was hated by large numbers of teachers.
Thompson writes:

“The obituaries for the idea that value-added teacher evaluations can improve teaching and learning are pouring in. The most important of these studies are probably those conducted by well-known proponents of data-driven accountability for individuals.


“Before summarizing the meager possible benefits and the huge potential downsides of value-added evaluations, let’s recall that these incredibly expensive systems were promoted as a way to improve student outcomes by .50 standard deviations (sd) by removing the bottom-ranked teachers! In Washington D.C., for instance, a $65 million grant that kicked off the controversial IMPACT system was supposed to raise test scores by 10% per year! Of course, that raises the question of why pro-IMPACT scholars don’t mention its $140 million budget for just the first five years.


“As reported by Education Week’s Holly Yettick, a study funded by the Gates Foundation and authored by Morgan Polikoff and Andrew Porter “found no association between value-added results and other widely accepted measures of teaching quality.” Polikoff and Porter applied the Gates Measures of Effective Teaching (MET) methodology to a sample of students in the Gates experiment. Yettick adds, “Nor did the study find associations between ‘multiple measure’ ratings, which combine value-added measures with observations and other factors.”


“Polikoff, a vocal advocate for corporate reform, acknowledged, “the study’s findings could represent something like the worst-case scenario for the correlation between value-added scores and other measures of instructional quality. … ‘In some places, [value-added measures] and observational scores will be correlated, and in some places they won’t.’”


“Before moving on to another study by pro-VAM scholars that calls such a system into question, we should note other studies reviewed by Yettick that help explain why the value-added evaluation experiment was so wrong-headed. Yettick cites two studies in the American Educational Research Journal. The first, by Noelle A. Paufler and Audrey Amrein-Beardsley, concludes, “elementary school students are not randomly distributed into classrooms. That finding is significant because random distribution of students is a technical assumption underlying some value-added models.” In the second AERJ article, Douglas Harris concludes, “Overall, however, the principals’ ratings and the value-added ratings were only weakly correlated.”


“Moreover, Yettick reports that “Brigham Young University researchers, led by assistant professor Scott Condie, drew on reading and math scores from more than 1.3 million students who were 4th and 5th graders in North Carolina schools between 1998 and 2004” and they “found that between 15 percent and 25 percent of teachers were misranked by typical value-added assessments.”


“Finally, Marianne P. Bitler and her colleagues made a hilarious presentation to the Society for Research on Educational Effectiveness, reporting that “teachers’ one-year ‘effects’ on student height were nearly as large as their effects on reading and math. While they found that the reading and math results were more consistent from one year to the next than the height outcomes, they advised caution on using value-added measures to quantify teachers’ impact.”


“Bitler’s study should produce belly laughs as she makes the point, “Taken together, our results provide a cautionary tale for the interpretation and use of teacher VAM estimates in practice.” Watching other advocates for test-driven accountability twisting themselves into pretzels in order to avoid confronting the facts about Washington D.C.’s IMPACT should at least prompt grins.


“Getting back to the way that pro-VAM researchers are now documenting its flaws, Melinda Adnot, Thomas Dee, Veronica Katz, and James Wyckoff spin their NBER paper as if it doesn’t argue against D.C.’s IMPACT evaluation system. Despite the prepublication public relations effort to soften the blow, their “Teacher Turnover, Teacher Quality, and Student Achievement” admits that the benefits of the teacher turnover incentivized by IMPACT are less than “significant.”


“The key results are revealed on page 18 and afterwards. Adnot and her co-authors conclude, “We find that the overall effect of teacher turnover in DCPS conservatively had no effect on achievement.” But they add that “under reasonable assumptions,” it might have increased achievement. (As will be addressed later, I doubt many teachers would accept as reasonable the assumptions that must be made in order to claim that IMPACT improved student achievement.)


“The paper’s abstract and opening (most-read) pages twist the findings before admitting, “To be clear, this paper should not be viewed as an evaluation of IMPACT.” It then characterizes the study as making “an important contribution by examining the effects of teacher turnover under a unique policy regime.”


“In fact, the paper notes, “IMPACT targets the exit of low-performing teachers,” and “virtually all low-performing teacher turnover [prompted by it] is concentrated in high-poverty schools.” That, of course, suggests either that an exited teacher with a low value-added score might actually have been ineffective, or that the teacher was punished for a value-added estimate rendered inaccurate by circumstances beyond his or her control.


“Their estimates show that exiting those low value-added teachers improves student achievement in high-poverty schools by .20 sd in math, and that the resulting exit of 46% of low-performing teachers “creates substantial opportunity to improve achievement in the classrooms of low-performing teachers.” The bottom line, however, is: “We estimate that the overall effect of turnover on student achievement in high-poverty schools is 0.084 [in math] and 0.052 in reading.” Both estimates may be “statistically distinguishable from zero,” but they are only “significant at the 10 percent level.”


“So, why were the total gains so negligible?


“The NBER study concludes that IMPACT contributed to the increase in the attrition rate of Highly Effective teachers to 14%. It admits that some high-performing teachers find IMPACT to be “demotivating or stressful” and that the loss of top teachers hurts student performance. It acknowledges, “This negative effect reflects the difficulty of replacing a high-performing teacher.”


“The study doesn’t address the biggest elephant in the room: the effect of value-added evaluations on the instructional effectiveness of the vast majority of D.C. teachers. If high-performing teachers leave because of the “stress and uncertainty of these working conditions,” wouldn’t other teachers be “dissatisfied with IMPACT and the human capital strategies in DCPS writ large?” If the attrition rate of top teachers in higher-poverty schools is 40% higher than that of their counterparts in lower-poverty schools, does that indicate that the harm done by the evaluations is also greater in high-challenge schools? And the NBER paper finds that “teachers exiting at the end of our study window were noticeably more effective than those exiting after IMPACT’s first year.” Shouldn’t that prompt an investigation into whether the stress of IMPACT is wearing teachers down?


“Adnot, Dee, Katz, and Wyckoff thus continue the tradition of reformers showcasing small gains linked to value-added evaluations and IMPACT-style systems while brushing aside the harm. On the other hand, they admit that IMPACT had advantages that similar regimes in many other districts don’t have. D.C. had the money to recruit outsiders, and 55% of replacement teachers came from outside the district. Few other districts have the ability to dispose of teachers as if we were tissue paper.


“Even with all of those advantages provided by corporate reformers in D.C. and other districts with the Gates-funded “teacher quality” roll of the dice, an incredible amount of stress has been dumped on educators as they and their students became lab rats in an expensive and risky experiment. The reformers’ most unreasonable assumption was that these evaluations would not promote teach-to-the-test instructional malpractice. They further assumed that the imposition of an accountability system that is biased against high-challenge schools would not drive too much teaching talent out of the inner city. They never seem to ask whether they themselves would tackle the additional challenges of teaching in a low-performing school when there is a 15 to 25% chance PER YEAR of being misevaluated.


“Now that these hurried, top-down mandates are being retrospectively studied, even pro-VAM scholars have found minimal or no benefits, offset by some obvious downsides. I wonder whether they will tackle the real research question, actually evaluating IMPACT and similar regimes, and thus address the biggest danger they pose. In an effort to exit the bottom 5% or so of teachers, did the test and punish crowd undermine the effectiveness of the vast majority of educators?”