Archives for category: VAM (value-added modeling)

The test-based teacher evaluation that was a hallmark of the Obama administration’s Race to the Top is slowly sinking into the ocean (or the desert).

 

Not only did New York teacher Sheri Lederman have her rating overturned by a judge who said the state’s evaluation system was “arbitrary and capricious” (it was designed and defended by State Commissioner John King, now Secretary of Education), but Hawaii just eliminated test-based teacher evaluation. Hawaii won a Race to the Top grant and was required by the rules of the competition to adopt a test-based teacher evaluation system. They did, it never worked, it angered teachers, and it is gone.

 

The state Board of Education unanimously approved recommendations Tuesday effectively removing standardized test scores as a requirement in the measurement of teacher performance, according to a press release from the state Department of Education.

 

 

The recommendations, which were subsequently approved by Superintendent Kathryn Matayoshi, will offer more flexibility to incorporate and weigh different components of teacher performance evaluation, although the option to use test scores in performance evaluation remains.

 

 

The recommendations originated from members of a joint committee between the Hawaii State Teachers Association and DOE, established by the most recent collective bargaining agreement in 2013. Vice Chairperson of the BOE Brian De Lima said that since then, the committee has conducted ongoing reviews and improvements to the evaluation system.

 

 

“There was a continuous evolution to make things better so teachers don’t spend all their time involved in the evaluation process, particularly when they’ve already been (rated) highly effective or effective,” De Lima said. “And the teachers being mentored who may need additional work, they’re getting the attention and the support so they stay interested in remaining in the profession — the most important profession.”

 

 

Formerly, teachers in Hawaii were beholden to curriculum and standards developed with little or none of their input by entities HSTA Secretary-Treasurer Amy Perruso described as “corporate philanthropists.” These entities, namely the Smarter Balanced Assessment Consortium, have had sway in setting teacher performance standards, developing tests for those standards, and profiting from the system, she said.

 

 

Teaching effectiveness, then, was rated on student understanding of curriculum teachers themselves didn’t develop but were forced by the administration to implement. Performance of teachers was also rated on aggregated test scores of every student participant — the majority of whom individual teachers never had in their own classrooms.

 

 

“The teacher evaluation system served as a control mechanism,” said Perruso, who also teaches social studies at Mililani High School on Oahu. “If you don’t follow the guidelines, you won’t be rated as ‘effective.’ That’s why what happened (Tuesday) was so critical. It gives teachers back a modicum of power. We’re no longer completely held under the thumb of principals because they can’t use test scores against us anymore.”

Reader Alice responds to the court victory of VirginiaSGP, who succeeded by lawsuit in getting the ratings of Virginia teachers released and plans to post them on his Facebook page. VirginiaSGP is Brian Davison, apparently an engineer, who believes that these test-based ratings are true measures of teachers’ worth.

 

Alice comments:

 

“While I obviously cannot psychoanalyze Brian, I recognize a lot of a STEM ego in Brian’s diatribes. As a recovering STEM-a-phile, I recognize the inability to recognize that not everything that matters can be numerically measured.

 

“Dealing with humans rather than machines, or in my case, neutrons, is very different and more complicated. Neutrons follow the laws of quantum physics. Neutrons make no decisions. They are consistent. Humans follow no laws of the physical world. Humans make decisions every moment of every day and those decisions are based on a myriad of factors that are not limited to whether they have eaten that day or gotten enough sleep. None of those factors can be measured and put into a VAM or SGP model.

 

“Coupled with the STEM ego is a denigration and misunderstanding of social science research. I have done both types of research. STEM research is cleaner. It is elegant and mathematically beautiful. This is the Gates MET study that Brian consistently quotes. Social science research is messy and depends highly on the assumptions made and the model used because all of these focus on a different aspect of humanity. It cannot be anything else and be of any use to educators. But it looks less “rigorous” than STEM research. But those of us in the field know the rigor.”

 

Reader Chiara has a great idea. It will turn professors against VAM:

 

She writes:

 

“I’m wondering if there’s ever been any discussion of ranking teachers by student test scores in higher ed.

 

“It would obviously be more difficult to do, but one could use the tests that are used for graduate schools, right? How much “value” did undergrad professors add?

 

“If you back this approach in K-12 why wouldn’t you back it in higher ed?”

 

 

Let us see the VAM scores for Raj Chetty, Jonah Rockoff, John Friedman, Thomas Kane, and all the other professors who endorse VAM. Be sure their ratings are posted in public. And while we are at it, all professors who testified against teacher tenure should give up their own tenure, on principle.

 

Goose and gander.

John Thompson, teacher and historian, writes here about one of the most controversial education issues of our time: mandated systems of test-based teacher evaluation. This was a central aspect of Race to the Top, and it was hated by large numbers of teachers.
Thompson writes:

“The obituaries for the idea that value-added teacher evaluations can improve teaching and learning are pouring in. The most important of those studies, probably, are those that are conducted by well-known proponents of data-driven accountability for individuals.

 

“Before summarizing the meager, possible benefits and the huge potential downsides of value-added evaluations, let’s recall that these incredibly expensive systems were promoted as a way to improve student outcomes by .50 standard deviations (sd) by removing the bottom-ranked teachers! In Washington D.C., for instance, a $65 million grant which kicked off the controversial IMPACT system was supposed to raise test scores by 10% per year! Of course, that raises the question of why pro-IMPACT scholars don’t mention its $140 million budget for just the first five years.

 

“As reported by Education Week’s Holly Yettick, a study funded by the Gates Foundation and authored by Morgan Polikoff and Andrew Porter “found no association between value-added results and other widely accepted measures of teaching quality.” Polikoff and Porter applied the Gates Measures of Effective Teaching (MET) methodology to a sample of students in the Gates experiment, and found, “Nor did the study find associations between ‘multiple measure’ ratings, which combine value-added measures with observations and other factors.”

 

“Polikoff, a vocal advocate for corporate reform, acknowledged, “the study’s findings could represent something like the worst-case scenario for the correlation between value-added scores and other measures of instructional quality. … ‘In some places, [value-added measures] and observational scores will be correlated, and in some places they won’t.’”

 

“Before moving on to another study by pro-VAM scholars which calls such a system into question, we should note other studies reviewed by Yettick that help explain why the value-added evaluation experiment was so wrong-headed. Yettick cites two studies in the American Educational Research Journal. First, a study by Noelle A. Paufler and Audrey Amrein-Beardsley concludes, “elementary school students are not randomly distributed into classrooms. That finding is significant because random distribution of students is a technical assumption underlying some value-added models.” In the second AERJ article, Douglas Harris concludes, “Overall, however, the principals’ ratings and the value-added ratings were only weakly correlated.”

 

“Moreover, Yettick reports that “Brigham Young University researchers, led by assistant professor Scott Condie, drew on reading and math scores from more than 1.3 million students who were 4th and 5th graders in North Carolina schools between 1998 and 2004” and they “found that between 15 percent and 25 percent of teachers were misranked by typical value-added assessments.”

 

“Finally, Marianne P. Bitler and her colleagues made a hilarious presentation to The Society for Research on Educational Effectiveness showing that “teachers’ one-year ‘effects’ on student height were nearly as large as their effects on reading and math. While they found that the reading and math results were more consistent from one year to the next than the height outcomes, they advised caution on using value-added measures to quantify teachers’ impact.”

 

“Bitler’s study should produce belly laughs as she makes the point, “Taken together, our results provide a cautionary tale for the interpretation and use of teacher VAM estimates in practice.” Watching other advocates for test-driven accountability twisting themselves into pretzels in order to avoid confronting the facts about Washington D.C.’s IMPACT should at least prompt grins.

 

“Getting back to the way that pro-VAM researchers are now documenting its flaws, Melinda Adnot, Thomas Dee, Veronica Katz, and James Wyckoff spin their NBER paper as if it doesn’t argue against D.C.’s IMPACT evaluation system. Despite the prepublication public relations effort to soften the blow, their “Teacher Turnover, Teacher Quality, and Student Achievement” admits that the benefits of the teacher turnover incentivized by IMPACT are less than “significant.”

 

“The key results are revealed on page 18 and afterwards. Adnot et al. conclude, “We find that the overall effect of teacher turnover in DCPS conservatively had no effect on achievement.” But they add that “under reasonable assumptions,” it might have increased achievement. (As will be addressed later, I doubt many teachers would accept as reasonable the assumptions that have to be made in order to claim that IMPACT improved student achievement.)

 

 

“The paper’s abstract and opening (most read) pages twist the findings before admitting “To be clear, this paper should not be viewed as an evaluation of IMPACT.” It then characterizes the study as making “an important contribution by examining the effects of teacher turnover under a unique policy regime.”

 

“In fact, the paper notes, “IMPACT targets the exit of low-performing teachers,” and “virtually all low-performing teacher turnover [prompted by it] is concentrated in high-poverty schools.” That, of course, suggests that an exited teacher with a low value-added might actually be ineffective, or that the teacher was punished for a value-added that might be an inaccurate estimate caused by circumstances beyond his or her control.

 

“Their estimates show that exiting those low value-added teachers improves student achievement in high-poverty schools by .20 sd in math, and that the resulting exit of 46% of low-performing teachers “creates substantial opportunity to improve achievement in the classrooms of low-performing teachers.” The bottom line, however, is: “We estimate that the overall effect of turnover on student achievement in high-poverty schools is 0.084 [in math] and 0.052 in reading.” Both estimates may be “statistically distinguishable from zero” but they would only be “significant at the 10 percent level.”

 

“So, why were the total gains so negligible?

 

“The NBER study concludes that IMPACT contributed to the increase in the attrition rate of Highly Effective teachers to 14%. It admits that some high-performing teachers find IMPACT to be “demotivating or stressful” and that the loss of top teachers hurts student performance. It acknowledges, “This negative effect reflects the difficulty of replacing a high-performing teacher.”

 

“The study doesn’t address the biggest elephant in the room – the effect of value-added evaluations on the instructional effectiveness of the vast majority of D.C. teachers. If high-performing teachers leave because of the “stress and uncertainty of these working conditions,” wouldn’t other teachers be “dissatisfied with IMPACT and the human capital strategies in DCPS writ large?” If the attrition rate of the top teachers in higher-poverty schools increases to 40% more than their counterparts in lower-poverty schools, does that indicate that the harm done by the evaluations is also greater in high-challenge schools? And, the NBER paper finds that “teachers exiting at the end of our study window were noticeably more effective than those exiting after IMPACT’s first year.” Shouldn’t that prompt an investigation as to whether the stress of IMPACT is wearing teachers down?

 

“Adnot, Dee, Katz, and Wyckoff thus continue the tradition of reformers showcasing small gains linked to value-added evaluations and IMPACT-style systems, but brushing aside the harm. On the other hand, they admit that IMPACT had advantages that similar regimes don’t have in many other districts. D.C. had the money to recruit outsiders, and 55% of replacement teachers came from outside of the district. Few other districts have the ability to dispose of teachers as if we are tissue paper.

 

“Even with all of those advantages provided by corporate reformers in D.C. and other districts with the Gates-funded “teacher quality” roll of the dice, an incredible amount of stress has been dumped on educators as they and students became lab rats in an expensive and risky experiment. The reformers’ most unreasonable assumption was that these evaluations would not promote teach-to-the-test instructional malpractice. They further assume that the imposition of an accountability system that is biased against high-challenge schools will not drive too much teaching talent out of the inner city. They never seem to ask whether they would tackle the additional challenges of teaching in a low-performing school when there is a 15 to 25% chance PER YEAR of being misevaluated.

 

“Now that these hurried, top-down mandates are being retrospectively studied, even pro-VAM scholars have found minimal or no benefits, offset by some obvious downsides. I wonder if they will try to tackle the real research question, try to evaluate IMPACT and similar regimes, and thus address the biggest danger they pose. In an effort to exit the bottom 5% or so of teachers, did the test and punish crowd undermine the effectiveness of the vast majority of educators?”
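
A note on Thompson’s “15 to 25% chance PER YEAR” figure: a yearly risk compounds over a career. The sketch below is only back-of-the-envelope arithmetic, not anything taken from the studies cited. It assumes, purely for illustration, that each year’s rating is an independent draw with the same misranking probability; real VAM errors are probably correlated from year to year, so treat the output as a rough illustration. The function name is hypothetical.

```python
def chance_of_any_misrating(p_per_year: float, years: int) -> float:
    """Probability of at least one misrating over `years` evaluations.

    ASSUMPTION (not from the cited research): each year is an independent
    event with the same misrating probability `p_per_year`.
    """
    if not 0.0 <= p_per_year <= 1.0:
        raise ValueError("p_per_year must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Complement rule: 1 minus the chance of being rated correctly every year.
    return 1.0 - (1.0 - p_per_year) ** years


if __name__ == "__main__":
    # The 15% and 25% yearly rates are the range quoted in the post.
    for p in (0.15, 0.25):
        for years in (1, 3, 5):
            risk = chance_of_any_misrating(p, years)
            print(f"yearly rate {p:.0%}, {years} year(s): {risk:.1%}")
```

Under that independence assumption, five years of evaluations would put the chance of at least one misrating at roughly 56% for the low end of the range and roughly 76% for the high end.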

Today, the US Senate voted to confirm John King as Secretary of Education by a vote of 49-40.

 

The only Democrat to vote no was New York Senator Gillibrand.

 

King was opposed by many New York parent groups because of his unwillingness to listen, his unyielding devotion to the Common Core, test-based teacher evaluation, high stakes testing for children, and the corporate reform agenda.

Rick Hess writes about a new study of teacher evaluation systems in 19 states by Matthew Kraft and Allison Gilmour. It shows that the new systems have made little difference. Instead of 99% of teachers rated effective, 97% are rated effective.

 

This was Arne Duncan’s Big Idea. It was an essential element of Race to the Top. The assumption behind it was that if kids got low test scores, their teachers must be ineffective.

 

It failed, despite the hundreds of millions, perhaps billions, devoted to creating these new systems to grade teachers. Think of how that money might have been used to help children and schools directly!

 

Hess writes:

 

“Emboldened by a remarkable confidence in noble intentions and technocratic expertise, advocates have tended to act as if these policies would be self-fulfilling. They can protest this characterization all they want, but one reason we’ve heard so much about pre-K in the past few years is that, as far as many reformers were concerned, the big and interesting fights on teacher evaluation had already been won. They had moved on.

 

“There’s a telling irony here. Back in the 1990s, there was a sense that reforms failed when advocates got bogged down in efforts to change “professional practice” while ignoring the role of policy. Reformers learned the lesson, but they may have learned it too well. While past reformers tried to change educational culture without changing policy, today’s frequently seem intent on changing policy without changing culture. The resulting policies are overmatched by the incentives embedded in professional and political culture, and the fact that most school leaders and district officials are neither inclined nor equipped to translate these policy dictates into practice.

 

“And it’s not like policymakers have helped with any of this by reducing the paper burden associated with harsh evaluations or giving principals tools for dealing with now-embittered teachers. If anything, these evaluation systems have ramped up the paperwork and procedural burdens on school leaders—ultimately encouraging them to go through the motions and undercut the whole point of these systems.”

 

 

 

 

As I posted yesterday, the New Mexico trial over teacher evaluation based on test scores has been delayed.

 

Audrey Amrein-Beardsley explains the delay here. 

 

The good news is that the preliminary injunction on use of VAM remains in place.

Audrey Amrein-Beardsley and her graduate students analyzed the results of Houston ISD’s hefty investment in value-added measurement of teachers. Houston spends a cool $500,000 a year to implement VAM.

 

Is it working?

 

No.

In 2010, I was in Denver the day that the Legislature was debating S. 10-191, a bill sponsored by young Senator Michael Johnston. It was a bill to base 50% of teachers’ evaluation on test scores, a new, untried, and very controversial idea. Teachers were strongly opposed, and the Legislature was deeply divided, but the bill passed. I was supposed to debate Johnston at a lunch in downtown Denver, but the debate didn’t work as planned. There were about 60 civic leaders in the room, and we waited patiently for Johnston. We finished lunch and still no Johnston. So I got up and gave my talk and explained why it was wrong to evaluate teachers and principals by test scores (at that time, I was working with Richard Rothstein on a statement against test-based evaluation that was signed by a bevy of testing experts). No sooner did I finish than, presto-change-o, young Senator Johnston strode through the doors in the back of the room. He had carefully managed not to hear anything I said.

 

He then proceeded to talk for 20 minutes or more about the glories of using test scores to judge teachers, principals, and schools. He predicted that the passage of his bill would bring about miraculous improvements in education across the state of Colorado. He praised his legislation as the dawn of a new day. Michael Johnston is an alum of Teach for America (were you surprised to hear that?). The title of his bill was something grandiose and completely fraudulent, something like “Great Schools and Great Teachers Act of 2010.” Gosh, it is six years later, and almost everyone except Michael Johnston knows that test-based accountability flopped. It flopped in Colorado and it flopped everywhere else, despite the billions pumped into it by the federal government, the Gates Foundation, states and local districts.

 

Just in the past few days, both John Merrow and the team of Checker Finn and Michael Petrilli independently agreed that teacher evaluation by test scores was Arne Duncan’s worst mistake. John Merrow said, “Tying teacher evaluations to testing was a mistake, probably Arne Duncan’s biggest mistake.” Petrilli and Finn said that the federal mandate for teacher evaluations was “politically poisonous.” But not in Colorado, it seems.

 

A group of legislators proposed revising his bill to eliminate evaluation by test scores, and it appeared to have the support it needed. But at the last minute, two of the Republicans changed their minds about dropping the teacher evaluation by test scores, and Michael Johnston’s failed idea survived by a vote of 6-3. So Johnston and five Republican Senators managed to preserve this program, which has not worked in Colorado or anywhere else in America. Six years after passage, there is not a whit of evidence that it improves teaching and learning.

 

Do you think Michael Johnston read the statement by the American Statistical Association in 2014 warning that using test scores to evaluate individual teachers is not a reasonable idea, because teachers account for between 1% and 14% of the variation in student test scores? I don’t think so. Do you think he saw the statement by the American Educational Research Association last fall against the use of this method? I don’t think so. Do you think he read the statement by Edward Haertel, the Stanford University testing expert, on the flaws of value-added assessment? Do you think he knows that it has been dropped by district after district because it costs millions and it has failed everywhere to identify the best or the worst teachers? Apparently not.

 

Michael Johnston doesn’t know what he doesn’t know. With this last-ditch effort to preserve the bad idea he sponsored, he has proved that he neither reads nor thinks.

 

Message to Colorado parents: Opt out. Resist. Do not let the state impose bad policies on your children or their teachers.

The NY BATS are not happy about President Obama’s selection of John King as Secretary of Education. Say this for King: his arrogant indifference to parents set off the largest testing opt out in history. Maybe he can do the same for the nation.

BATS write:

“WE GOOFED BUT TRUST US TO FIX IT: Now headed for Senate confirmation hearings, Obama’s Acting Education Secretary John King admits in a new video that standardized testing has been harmful and wasteful, yet will continue federal tinkering to find a better balance between subjecting kids to non-stop testing hell and collecting data to improve instruction.

Reading stiffly from cue cards, King continues his “apology tour” after alienating teachers with corporate reform policies straight out of ALEC’s basement. Yet the Secretary continues to pretend outraged teacher and parent groups do not see right through to the heart of the problem – the corporate revolving door and the influence of money in politics.

Obama had always mailed in his education policy, straight from the boardrooms of Center for American Progress, the Gates Foundation and social engineers like Joanna Weiss. The policies were also favored by Wall Street and billionaires like the Waltons and Broads, yet were met with whimpers by the heads of the large teacher unions.

This untested market-based approach to changing schools exploded in opt-outs and gave Republicans an issue with great traction. Now Obama is backpedaling, but only in rhetoric as his actions only cement his commitment to upending classrooms through continuous, invasive measuring. His promises to help underperforming schools remain broken, as support for addressing actual learning obstacles flows instead into the hands of testing contractors and armies of consultants.

In essence, Obama is saying to America “yes we goofed” but let’s have a “fresh start”, beginning with the nomination of King, a darling of privatizers and dark money PACs that rain campaign cash onto your state legislators. This is not only tone-deaf and a thumb in the eye, it’s doubling down on corporate reform and federal centralization.

As a short-lived teacher and charter network director, King lacked the experience the education community was looking for, so his PR handlers instead launched an all-points media blitz based on his personal narrative, which credited NYC public schools for changing the trajectory of his life. Strange then, that he would pick a career in charter schools, which require pro-active completion of lottery applications, thereby leaving behind the most needy children whose parents are not as involved.

Today, the hope of students, parents and teachers across the political spectrum is that local control of schools can be restored by downsizing almost everything the megalithic USDOE does, abandoning NCLB’s federally mandated test requirements and concentrating on supporting the research-based recommendations of actual educators instead of mandating ham-handed “fixes” after meeting with lobbyists.

In short, Obama’s record on education is widely considered even “worse than Bush”, but the way forward now is no longer manufacturing fake crises and endlessly patching up failed (and unconstitutional) federal testing policies, it’s folding up shop and giving tax dollars back to districts so teachers can teach.”