Archives for category: VAM

Rachel E. Gabriel and Sarah L. Woulfin of the University of Connecticut ask a simple but very important question: Isn’t it time to redesign teacher evaluation? Most states are stuck with laws they wrote to apply for Race to the Top funding. Nearly a decade has passed. We now know that test-based evaluation has failed. Why are so many states and districts holding on to a failed strategy for evaluating teachers? Is it inertia? Apathy?

The model in use is obsolete. It failed. It is time to move on.

“Under RTT, teacher-evaluation policies were designed using economic theories of motivation and compensation and statistical growth tools such as value-added measurement. Evaluation policies based on principles of economics and corporate management have failed to take into account the complex and personalized work of educating students.
While evaluation aims to address teacher performance and quality, what we don’t see is acknowledgement of teacher voice and choice in how policies affect their work. We need to create learning-focused evaluation policies for teachers that enable both students’ and teachers’ growth and align with the needs of schools, students, and communities.

“It’s clear to most educators that the current crop of teacher-evaluation systems is flawed, overwrought, and sometimes just plain broken. Detailed case studies demonstrate that some states now spend millions of dollars on contracts with data-management companies and statistical consulting firms. Many states and districts make similar investments despite the fact that researchers and policymakers question the wisdom of value-added measurement within high-stakes teacher evaluations.

“There is now an entire industry devoted to the evaluation of teaching and the management of student data. There are online professional-development video databases and classroom-walkthrough apps for school leaders—which have not demonstrated a positive effect on instruction. But all of them have inflated the edu-business marketplace…

“A learning-focused teacher-evaluation policy would create the organizational and social conditions teachers need to thrive. During goal-setting with administrators, teachers would work together to write challenging, yet attainable, goals for themselves and their students. They would also have professional-development opportunities to learn about different types of student-progress measurement tools to refine what works best. And in feedback meetings with school leaders, teachers would have space to reflect upon areas of their success and weakness. In turn, principals would devote time and energy to framing evaluation as an opportunity to learn about—rather than judge—teaching.

“To begin the transition toward this kind of evaluation, state and district administrators must shift the balance of resources away from measuring and sorting teachers into categories. School leaders must focus on subject-specific questions about teaching and learning, rather than applying a generic set of indicators. And instead of boiling teachers’ work down to a rating, leaders must share observations that help teachers extend what they do well and identify where they can grow.

“Only when we involve teachers in the process of evaluation policymaking will we come up with a system that supports and develops the teaching expertise students deserve.”

New York and other states continue to be saddled with the toxic gift bestowed (i.e., imposed) as part of Arne Duncan and Barack Obama’s Race to the Top. When New York applied for Race to the Top funding, it agreed to pass a law making test scores a “significant” part of teacher evaluations. It did. The law has been a source of ongoing controversy. It is also completely ineffectual: every year, 95-97% of teachers are rated either Highly Effective or Effective. Parents rebelled because their children were put in the awkward position of determining their teachers’ ratings, and many objected to the pressure. The result was the Opt Out Movement. Andrew Cuomo was gung-ho for evaluating teachers by test scores, assuming that doing so would identify the “bad teachers” who should be terminated, and he insisted that test scores count for 50%, no less, of a teacher’s rating. When the Opt Out movement claimed 20% of all eligible students in grades 3-8, Cuomo appointed a commission to study the issues and asked for a four-year moratorium on using the scores to evaluate teachers. The moratorium ends next year.

This is an excellent analysis of the mess in New York by Gary Stern, a first-rate reporter for Lohud (Lower Hudson Valley) News.

He writes:

New York state’s teacher evaluation system is a lot like Frankenstein’s monster.

It was a high-minded experiment that turned out ghastly in 2011, scaring the heck out of teachers and their bosses. The monster was repeatedly cut up and sewed back together in search of something better, but just got nastier. Many parents, fearing for the well-being of teachers, rebelled with the educational equivalent of pitchforks and torches: opting their kids out of state tests.

As a result, a moratorium was put in place in 2015, through the 2019-20 school year, on the most controversial part of the system — the attempted use of standardized test scores to measure the impact of individual teachers on student progress.

The monster was tranquilized, and things quieted down.

Now a bill in Albany, which looks likely to pass, is being hailed by NYSUT and legislators as the answer to putting Frankenstein out of his misery. The bill (A.10475/S.08301) would eliminate the mandatory use of state test scores in teacher (and principal) evaluations, referring to math and ELA tests for grades 3-8 and high school Regents exams. School districts that choose alternative student assessments for use in evaluating teachers would have to do so through collective bargaining with unions.

But the evaluation monster would still live, perhaps in a semi-vegetative state, seemingly hooked up to wires in the basement of the state Education Department.

NYSUT, which represents 600,000 teachers and others, likes this deal. But groups representing school boards and superintendents are antsy. They don’t want teachers unions involved in choosing student assessments. And they say that the bill could lead to more testing, since students will still have to take the 3-8 tests and Regents exams.

Untangling this mess is not for the faint of heart. Even Dr. Frankenstein might look away…

The evaluation system was devised at the height of the “reform” era, when federal and state officials wanted to show that public schools were failing. Gov. Andrew Cuomo prized the evaluation system as a way to drive out crummy teachers. But the whole thing fell flat. As one principal told me, “If I don’t like a teacher, should I root for their students to do poorly on the state tests?”…

As the system is currently stitched together, about half a teacher’s evaluation is based on how students do on various assessments. Most teachers don’t have students who take state tests, so their evaluations are based on a hodgepodge of student measurements. A recent study of 656 district plans across New York, by Joseph Dragone of Capital Region BOCES, found more than 500 different combinations of student assessments in use.

To game the system, more and more districts are applying common measurements of student progress to teachers across grades or schools or even districtwide. Get this: Dragone found that 28 percent of districts use high school Regents exams, in part, to evaluate K-2 teachers.

What’s the value of all this? Primarily, to comply with state requirements for a failed system.

He concludes that no one knows how to fix this mess. It is not enough to stitch up Frankenstein one more time.

But there is an answer.

Repeal the entire system created in response to Race to the Top demands. It failed. Race to the Top failed. Why prop up or revise a failed system?

Let districts decide how to evaluate their teachers. Why does the state need to prescribe teacher evaluation? What does the Legislature know about teacher evaluation? Nothing. Districts don’t want “bad” teachers. Let Arne’s Frankenstein go to its deserved grave.

The Ohio State Senate wants to drop changes in test scores from teacher evaluations. However, the Cleveland district objects because the superintendent clings stubbornly to standardized tests of students as a reasonable measure of teacher quality. The fact that value-added measurement has flopped nationally doesn’t matter to him.

“District CEO Eric Gordon isn’t happy about the change and still wants to use test scores as a major part of teacher ratings. He looks at student scores, particularly the ‘value added’ measure of how much students learn in a year, as an important part of gauging whether teachers are doing well or not.”

Maybe no one told him that VAM is a sham.

Kevin Welner of the National Education Policy Center has written a thoughtful (and optimistic) commentary on the Gates Foundation’s latest big bet on reforming education. The new initiative will invest $1.7 billion in networks of schools in big cities, in the hope that they can work together to solve common problems.

Welner, K. (2017). Might the New Gates Education Initiative Close Opportunity Gaps? Boulder, CO: National Education Policy Center. Retrieved [date] from

Welner notes that the Gates Foundation’s previous big initiatives failed, although he believes that Gates was too quick to pull the plug in 2008 on the small schools initiative, into which Gates had poured $2 billion. Gates bet another $2 billion on the Common Core, which was sunk by backlash from right and left and, in any case, has made no notable difference. Gates poured untold millions into his plan for teacher evaluation, the Measures of Effective Teaching (MET) project, but it failed because it relied too much on test scores.

Welner says that Bill Gates and the foundation he owns suffer from certain blind spots: First, he believes in free markets and choice, and he ends up pouring hundreds of millions into charters with little to show for it; second, he believes in data, and that belief has been costly without producing better schools; third, he believes in the transformative power of technology, forgetting that technology is only a tool, whose value is determined by how wisely it is used.

Last, Welner worries that Gates does not pay enough attention to the out-of-school factors, including poverty and racism, that have a far greater impact on student learning than teachers and schools. These are the factors that mediate opportunity to learn. Without addressing them, none of the other efforts will make much difference.

Welner is cautiously optimistic that the new initiative might pay more attention to opportunity to learn issues than any of Gates’ other investments.

But he notes with concern that Gates continues to fund charters, data, technology, and testing. He continues to believe that somewhere over the rainbow is a magical key to innovation. He continues to believe in standardization.

It seems to me that Kevin Welner bends over backwards to give Gates the benefit of the doubt. Given Gates’ well-established track record of failure, it is hard to believe he has learned anything. But let’s keep hoping for the best.

Gary Rubinstein knows Michael Johnston from his days in Teach for America. He wishes Mike would stop telling tall tales about the school he briefly ran.

Mike said that the school he ran had a 100% graduation rate and college acceptance rate. Gary points out that 44 seniors graduated and got accepted to college, but there were 73 students in tenth grade two years earlier. That’s a 60% graduation rate, not 100%.
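Rubinstein’s arithmetic is easy to check: a cohort-based graduation rate divides the number of graduating seniors by the number of students in the starting class. The short Python sketch below assumes a simple, unadjusted cohort rate (no correction for students who transferred in or out, which the source does not address); the function name is hypothetical, and the figures 44 and 73 are the ones Rubinstein cites.

```python
def cohort_graduation_rate(graduates: int, starting_cohort: int) -> float:
    """Return the share of a starting cohort that graduated, as a percentage.

    Assumption: a simple, unadjusted cohort rate with no correction
    for students who transferred in or out of the school.
    """
    if starting_cohort <= 0:
        raise ValueError("starting cohort must be positive")
    return 100.0 * graduates / starting_cohort


# Figures cited by Gary Rubinstein: 44 graduating seniors,
# 73 students in tenth grade two years earlier.
rate = cohort_graduation_rate(44, 73)
print(f"{rate:.1f}%")  # prints 60.3%
```

The sketch only reproduces the division; a 100% figure is possible only if the denominator is the graduating class itself rather than the cohort that started.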

Now Mike Johnston is running for Governor of Colorado. He has built a reputation in the state as an education “reformer.” After graduating from Yale, he taught in Mississippi as a member of Teach for America, earned a degree at the Harvard Graduate School of Education, then a law degree, then was principal of a small school in Colorado where he claimed the school had a graduation rate of 100% and that all its graduates were accepted into college. Based on this record, he ran for and was elected to the State Senate at the age of 35.

I met Mike Johnston in 2010, when I visited Denver to talk about my then-new book “The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education.” I was scheduled to debate Johnston at a luncheon before about 100 of Denver’s civic leaders. At the very moment I was in Denver, the Legislature was debating Johnston’s legislation to evaluate teachers and principals by the test scores of their students. Johnston called his law, SB 10-191, the “Great Teachers, Great Principals Act.” It required that test scores count for 50% of every teacher’s and principal’s evaluation.

On the day we were to debate, Johnston was late. I spoke. Minutes later, Johnston arrived, not having heard anything I said about choice and testing. He spoke with great excitement about how his new legislation would weed out all the bad and ineffective teachers in the state and would lead to a new era of great teachers, great principals, and great schools.

Johnston, as Gary Rubinstein points out, is very much an Obama Democrat. Arne Duncan, whose Race to the Top squandered $5 billion, has endorsed Johnston’s candidacy for governor.

Seven years later, even Colorado reformers acknowledge that Mike Johnston’s grandiose promises fell flat. In an article in Education Week, Colorado reformer Van Schoales admitted that the punitive SB 10-191 didn’t have much, if any, effect.

He wrote:

“Implementation did not live up to the promises.

“Colorado Department of Education data released in February show that the distribution of teacher effectiveness in the state looks much as it did before passage of the bill. Eighty-eight percent of Colorado teachers were rated effective or highly effective, 4 percent were partially effective, 7.8 percent of teachers were not rated, and less than 1 percent were deemed ineffective. In other words, we leveraged everything we could and not only didn’t advance teacher effectiveness, we created a massive bureaucracy and alienated many in the field.

“What happened?

“It was wrong to force everyone in a state to have one ‘best’ evaluation system.”

“First, the data. We built a policy on growth data that only partially existed. The majority of teachers teach in states’ untested subject areas. This meant processes for measuring student growth outside of literacy or math were often thoughtlessly slapped together to meet the new evaluation law. For example, some elementary school art-teacher evaluations were linked to student performance on multiple-choice district art tests, while Spanish-teacher evaluations were tied to how the school did on the state’s math and literacy tests. Even for those who teach the grades and subjects with state tests, some debate remains on how much growth should be weighted for high-stakes decisions on teacher ratings. And we knew that few teachers accepted having their evaluations heavily weighted on student growth.

“Second, there has been little embrace of the state’s new teacher-evaluation system even from administrators frustrated with the former system. There were exceptions, namely the districts of Denver and Harrison, which had far fewer highly effective teachers than elsewhere in the state. Both districts invested time and resources in the development of a system that more accurately reflects a teacher’s impact on student learning. Yet most Colorado districts were forced to create new evaluation systems in alignment with the new law or adopt the state system, and most did the latter. This meant that these districts focused on compliance (and checking off evaluation boxes), rather than using the law to support teacher improvement.

“Third, we continue to have a leadership problem. Research shows that teacher evaluators are still not likely to give direct and honest feedback to teachers. A Brown University study on teacher evaluators in these new systems shows that the evaluators are three times more likely to rate teachers higher than they should be rated. This is a problem of school and district culture, not a fault with the evaluation rubric.

“Fourth, all of Colorado’s 238 charter schools waived out of the system.

“We wanted a new system to help professionalize teaching and address the real disparities in teacher quality. Instead, we got an 18-page state rubric and 345-page user guide for teacher evaluation.

“We didn’t understand how most school systems would respond to these teacher-evaluation laws. We failed to track implementation and didn’t check our assumptions along the way.”

Unfortunately, when the time came to change the law, Sen. Mike Johnston joined with five Republicans on the State Senate Education Committee to defeat a proposal to fix his failed law.

The rejected proposal, “originally introduced with bipartisan sponsorship, would have allowed school districts to drop the use of student academic growth data in teacher evaluations. It also would have eliminated the annual evaluation requirement for effective and highly effective teachers.”

But Johnston preferred to keep his law in place, despite its failure. It remains today the most regressive teacher evaluation law in the nation, and it has had seven years to demonstrate its ineffectiveness.

Gary Rubinstein calls on Mike Johnston to stop making the false claim in his campaign literature that his high school’s graduation rate was 100%.

I call on him to renounce and denounce SB 10-191.

Make a clean break of it, Mike. Set things right. Show you are man enough to admit you were wrong.

After a long court fight in Houston, the school district agreed not to use value-added scores to evaluate teachers, because it was unable to explain what the algorithms for evaluating teacher performance meant or how they were calculated. The district also agreed to pay the lawyers’ fees for the Texas AFT, which fought the use of VAM.

What is the purpose of unions? To fight for the rights of teachers. No individual teacher (unless married to a lawyer) could have pursued this remedy on his or her own. The union had the resources to protect teachers from an unfair, nonsensical, illegitimate way of evaluating their teaching.

By the way, the courts in Houston were a lot wiser than the courts in Florida, which upheld the practice of evaluating teachers based on the test scores of students they do not teach in subjects they do not teach. The court in Florida said it was “unfair,” but constitutional. How can it be constitutional to have your teaching license depend on the work that others do, in which you have no part at all?

For Immediate Release
October 10, 2017

Zeph Capo

Janet Bass

Federal Suit Settlement: End of Value-Added Measures for Teacher Termination in Houston

HOUSTON—In a huge victory for the right of teachers to be fairly evaluated, the Houston Independent School District agreed, in a settlement of a federal lawsuit brought by seven Houston teachers and the Houston Federation of Teachers, not to use value-added scores to terminate a teacher as long as the teacher is unable to independently test or challenge the score.

The value-added measure for teacher evaluation used in Houston, called the Education Value-Added Assessment System, or EVAAS, is a statistical method that uses a student’s performance on prior standardized tests to predict academic growth in the current year. This methodology, derided as deeply flawed, unfair and incomprehensible, was used to make decisions about teacher evaluation, bonuses and termination. It relies on a secret computer program built around an inexplicable algorithm.
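The EVAAS model itself is not reproduced here; teachers in Houston could not inspect it, which was the point of the lawsuit. The general idea the release describes (predict each student’s current score from prior scores, then credit or charge the teacher for the gap) can be illustrated with a deliberately simple Python sketch. Everything in it is hypothetical: a one-predictor least-squares fit, made-up scores for two made-up teachers, and a teacher “effect” defined as the average residual of that teacher’s students. Real value-added models are far more elaborate.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def toy_value_added(records):
    """Hypothetical toy model, NOT EVAAS.

    records: list of (teacher, prior_score, current_score).
    Returns each teacher's mean residual: how far that teacher's students
    landed above (+) or below (-) the score predicted from prior scores.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = a + b * prior
        residuals[teacher].append(current - predicted)
    return {teacher: mean(rs) for teacher, rs in residuals.items()}


# Made-up scores for illustration only.
records = [
    ("Teacher A", 60, 66), ("Teacher A", 70, 75), ("Teacher A", 80, 86),
    ("Teacher B", 60, 62), ("Teacher B", 70, 73), ("Teacher B", 80, 80),
]
print(toy_value_added(records))
```

Even in this toy version, the teacher’s “effect” is simply whatever is left over after the prediction, so anything the model leaves out, such as family income or which students were assigned to which classroom, is attributed to the teacher.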

In May 2014, seven Houston teachers and the Houston Federation of Teachers brought an unprecedented federal lawsuit to end the policy, saying it reduced education to a test score, didn’t help improve teaching or learning, and ruined teachers’ careers when they were incorrectly terminated. Neither HISD nor its contractor allowed teachers access to the data or computer algorithms so that they could test or challenge the legitimacy of the scores, creating a “black box.” In May 2017, the federal district court in Houston issued a decision stating that, “HISD teachers have no meaningful way to ensure correct calculation of their EVAAS scores, and as a result are unfairly subject to mistaken deprivation of constitutionally protected property interests in their jobs.”

HFT President Zeph Capo said: “This victory should mark the end of a destructive era that put tests and a broken evaluation system over making sure our students leave school well prepared for college, career and life. As a practical matter, this ends the use of value-added to terminate teachers in HISD because the district does not have a contractor that is willing or able to meet the constitutional due process standards spelled out by the court.”

Daniel Santos, one of the plaintiffs and an award-winning sixth-grade teacher at Navarro Middle School who was rated ineffective by the flawed EVAAS method, was elated with the settlement.

“I have always been devoted to my students and proud of my teaching skills. Houston needs a well-developed system that properly evaluates teachers, provides good feedback and ensures that educators will receive continuous, targeted professional development to improve their performance,” Santos said.

American Federation of Teachers President Randi Weingarten said the agreement not to use value-added measures for this purpose is the latest nail in the coffin of using tests as a punitive tool. The Every Student Succeeds Act, the federal education law that replaced the No Child Left Behind Act, eliminated the emphasis on test scores.

“Testing and EVAAS don’t measure critical or analytical thinking skills, don’t allow for engaging learning, and certainly don’t improve or create joy in teaching or learning. Instead of value-added methods, let’s value what kids really need: attention to their well-being, engaging and powerful learning, a collaborative school environment, and opportunities for teachers to build their skills throughout their careers,” Weingarten said.

In addition to agreeing to restrict its use of value-added measures, including EVAAS scores, the school district agreed to create an instructional consultation panel—with representatives from the district and the faculty—to discuss and make recommendations on the district’s teacher appraisal process. The settlement also requires HISD to pay Texas AFT $237,000 for attorney’s fees and expenses related to the lawsuit.

Here is the amended summary judgment opinion.

Perhaps our blog poet was thinking of Leonard Cohen’s great song “Anthem” when he wrote this:

“The Fall of the House of Reform”

A crack, a crack in outer wall
The House of Reform, about to fall
A house of test and house of VAM
Of fake “Success” and charter scam
A house of standards built on sand
With Core arranged by Coleman hand
A house infused with sickly air
The flatulence of billionaire
The House was doomed from very start
An empty place without a heart
Expanding crack, lets in the light
As daylight breaks the longest night

It is called VAM: value-added measurement, or value-added modeling. It means measuring the effectiveness of teachers by the rise or fall of their students’ test scores.

Rachel M. Cohen, writing in The American Prospect, documents the slow but steady retreat from evaluating teachers by the test scores of their students. Only a few years ago, VAM was lauded by Secretary of Education Arne Duncan as the ultimate way to determine which teachers were succeeding and which were failing; Duncan made it a condition of competing for Race to the Top billions, and more than 40 states agreed to adopt it; Bill Gates spent hundreds of millions of dollars promoting it; a team of economists led by Raj Chetty of Harvard claimed that the actions of a teacher in elementary school predicted teen pregnancy, adult earnings, and other momentous life consequences, and earned front-page status in the New York Times; and thousands of teachers and principals were fired because of it.

But time is the test, and time has not been kind to VAM.

Cohen reviews the role of the courts, with some refusing to get involved, and others agreeing that VAM is arbitrary and capricious. She credits Duncan and Gates for their role in creating this monstrous and invalid way of evaluating teachers. The grand idea, having cut down many good teachers, is nearing its end. But not soon enough.

We now know that William Sanders, the creator of Value-Added Measurement, which measures teacher quality with the methods of an agricultural statistician, is revered in the think tank world of D.C.

But Peter Greene, who teaches in a Pennsylvania high school, doesn’t think much of VAM.

He recounts how Sanders was inspired to think about measuring teachers based on his studies of radioactivity in cows that were downwind from a nuclear explosion (really!).

Greene writes:

Oh, let’s tell the truth. VAM systems have also been limited by the fact that they’re junk, taking bad data from test scores, massaging them through an opaque and improbable mathematical model to arrive at conclusions that are volatile and inconsistent and which a myriad educators have looked at and responded, “Well, this can’t possibly be right.”

You’ll never find me arguing against any accountability; taxpayers (and I am one) have the right to know how their money is spent. But Sanders’ work ultimately wasted a lot of time and money and produced a system about as effective as checking toad warts under a full moon; worse, because it looked all number and sciencey and so lots of suckers believed in it. Carey can be the apologist crafting it all into a charming and earnest tale, but the bottom line is that VAM has done plenty of damage, and we’d all be better off if Sanders had stuck to his radioactive cows.

The New York Times published a tribute by Kevin Carey of the New America Foundation to William Sanders, “the little-known statistician who taught us to measure teachers.”

One hates to speak ill of the dead, but accuracy requires that we note that Sanders’ statistical model for “measuring” teachers was flawed and inaccurate, and that his obscure algorithms damaged the lives of thousands of teachers. Sanders was an agricultural statistician before he found a goldmine in education. Measuring teacher quality really is not akin to measuring cattle or crops. Every analysis of the influences on students’ test performance gives far more weight to family income and education than to the teachers who see a student for an hour or five hours a day. Sanders tried to remove human judgment from the equation and ended up creating a profitable business that distorted teaching and learning into a struggle for higher test scores. If the tests themselves are invalid, then any accountability measures based on them will be invalid.

No one knows William Sanders’ work and its flaws better than Audrey Amrein-Beardsley. She has studied Sanders’ value-added measures for years and testified against them in court. She comments on the New York Times article here. Amrein-Beardsley points out that Sanders’ methods have been faring poorly in court because it is unfair to judge a teacher by a mysterious algorithm that no one can understand or explain.

In my book Reign of Error, I wrote about the fallacy behind Sanders’ reasoning by quoting a song from “The Fantasticks.” I paid $1200 for the right to reprint the lyrics. It is the one that goes “Plant a radish, get a radish, not a sauerkraut./That’s why I love vegetables, they know what they’re about.”

No one can say the same about children. Children from the same parents are different, even when their upbringing is as identical as those parents can make it. They look different, they act different, they have different interests, they have different goals.

Sanders never understood that.