Audrey Amrein Beardsley, a national authority on teacher evaluation, reviews here the latest scholarly research on “value-added measurement,” or VAM. This is the practice of evaluating teachers by changes in their students’ test scores. It was made into a national issue by Race to the Top, which required states to make VAM a significant part of teacher evaluation. In granting waivers to states from the draconian sanctions of NCLB, Arne Duncan required states to adopt VAM.

The research keeps building against the usefulness of VAM. The latest study concludes that VAM is highly unreliable. Children are not randomly assigned, and teachers face widely varying challenges.

Beardsley writes:

“In a recent paper published in the peer-reviewed journal Education Finance and Policy, coauthors Cassandra Guarino (Indiana University – Bloomington), Mark Reckase (Michigan State University), and Jeffrey Wooldridge (Michigan State University) ask and then answer the following question: “Can Value-Added Measures of Teacher Performance Be Trusted?”…

“From the abstract, the authors “investigate whether commonly used value-added estimation strategies produce accurate estimates of teacher effects under a variety of scenarios. [They] estimate teacher effects [using] simulated student achievement data sets that mimic plausible types of student grouping and teacher assignment scenarios. [They] find that no one method accurately captures true teacher effects in all scenarios, and the potential for misclassifying teachers as high- or low-performing can be substantial.”

She adds:

“They found…

“No one [value-added] estimator performs well under all plausible circumstances, but some are more robust than others…[some] fare better than expected…[and] some of the most popular methods are neither the most robust nor ideal.” In other words, calculating value-added regardless of the sophistication of the statistical specifications and controls used is messy, and this messiness can seriously throw off the validity of the inferences to be drawn about teachers, even given the fanciest models and methodological approaches we currently have going (i.e., those models and model specifications being advanced via policy).

“[S]ubstantial proportions of teachers can be misclassified as ‘below average’ or ‘above average’ as well as in the bottom and top quintiles of the teacher quality distribution, even in [these] best-case scenarios.” This means that the misclassification errors we are seeing with real-world data, we are also seeing with simulated data. This leads us to more concern about whether VAMs will ever be able to get it right, or in this case, counter the effects of the nonrandom assignment of students to classrooms and teachers to the same.

“Researchers found that “even in the best scenarios and under the simplistic and idealized conditions imposed by [their] data-generating process, the potential for misclassifying above-average teachers as below average or for misidentifying the “worst” or “best” teachers remains nontrivial, particularly if teacher effects are relatively small. Applying the [most] commonly used [value-added approaches] results in misclassification rates that range from at least 7 percent to more than 60 percent, depending upon the estimator and scenario.” So even with a pretty perfect dataset, or a dataset much cleaner than those that come from actual children and their test scores in real schools, misclassification errors can impact teachers upwards of 60% of the time….

“In sum, researchers conclude that while certain VAMs hold more promise than others, they may not be capable of overcoming the many obstacles presented by the non-random assignment of students to teachers (and teachers to classrooms).

“In their own words, “it is clear that every estimator has an Achilles heel (or more than one area of potential weakness)” that can distort teacher-level output in highly consequential ways. Hence, “[t]he degree of error in [VAM] estimates…may make them less trustworthy for the specific purpose of evaluating individual teachers” than we might think.”
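
For readers who want to see the basic logic of such a simulation, here is a rough toy sketch in Python. It is not the study's model or any of its estimators. Every name and number in it is an assumption made up for illustration: the function names `misclassification_rate` and `average_rate`, the teacher-effect spread of 0.2, the weight of 0.5 on unobserved student ability, the class size of 25, and the "naive" estimator that simply averages each classroom's score gains. It gives each simulated teacher a known "true" effect, assigns students either at random or grouped by ability, and counts how often the naive estimate puts a teacher on the wrong side of the median.

```python
import random
import statistics


def misclassification_rate(nonrandom, n_teachers=200, class_size=25,
                           teacher_sd=0.2, seed=0):
    """Share of teachers a naive estimator puts on the wrong side of the median."""
    rng = random.Random(seed)
    # Hypothetical "true" teacher effects, known only because we simulate them.
    true_effects = [rng.gauss(0.0, teacher_sd) for _ in range(n_teachers)]
    # Unobserved student ability that also drives score gains (an assumption).
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_teachers * class_size)]
    if nonrandom:
        # Tracking: sorting groups students into classrooms by ability.
        abilities.sort()
    estimates = []
    for i, effect in enumerate(true_effects):
        classroom = abilities[i * class_size:(i + 1) * class_size]
        gains = [effect + 0.5 * a + rng.gauss(0.0, 1.0) for a in classroom]
        # Naive estimator: the classroom's mean gain, with no controls at all.
        estimates.append(statistics.mean(gains))
    true_median = statistics.median(true_effects)
    est_median = statistics.median(estimates)
    wrong = sum((est > est_median) != (true > true_median)
                for est, true in zip(estimates, true_effects))
    return wrong / n_teachers


def average_rate(nonrandom, n_runs=10):
    """Average the misclassification rate over several simulated school systems."""
    return statistics.mean(
        misclassification_rate(nonrandom, seed=s) for s in range(n_runs)
    )


print("random assignment:   ", round(average_rate(False), 3))
print("grouped by ability:  ", round(average_rate(True), 3))
```

In this toy setup the error rate is well above zero even when students are assigned at random, and it tends to be higher when classrooms are grouped by ability, because the naive estimate credits or blames the teacher for the students he or she was given. Real value-added models add controls that a simple classroom average lacks, so the numbers printed here say nothing about any actual model's error rate.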