Archives for category: VAM (value-added modeling)

Audrey Amrein-Beardsley writes on her blog VAMboozled that New York teacher Sheri Lederman rejected a settlement offer from the state.

Lederman, a veteran teacher on Long Island, is suing the state to challenge the validity of VAM. Although she has long been recognized as a superstar teacher, she received a low rating. Her husband, Bruce, is a lawyer and is litigating on her behalf.

The state offered to raise her rating if she would abandon the lawsuit. The state said that the teacher evaluation process will be changed, in some fashion, but the Ledermans rejected the offer because there is no certainty that VAM will disappear.

Amrein-Beardsley explains the situation and adds useful links.

A few weeks ago, I posted a video of David Berliner’s speech in Australia, in which he explained why teachers and teacher education programs should not be evaluated by standardized test scores. This, as you know, is the policy that was the centerpiece of the failed Race to the Top. Its main effect has been to create teacher shortages; many experienced teachers have left the profession, and enrollments in teacher education programs have sharply declined since the introduction of “value-added modeling” (VAM).

 

Audrey Amrein-Beardsley has done all of us a favor by transcribing Berliner’s speech. You can find it here.


Here are a few (not all) of his reasons:


“When using standardized achievement tests as the basis for inferences about the quality of teachers, and the institutions from which they came, it is easy to confuse the effects of sociological variables on standardized test scores” with the effects teachers have on those same scores. Sociological variables (e.g., chronic absenteeism) continue to distort even the best attempts to disentangle them from the instructional variables of interest. These biasing variables should not be dismissed as if they had been statistically “controlled for.”
In law, we do not hold people accountable for the actions of others; for example, when a child kills another child, the parents are not charged as guilty. Hence, “[t]he logic of holding [teachers and] schools of education responsible for student achievement does not fit into our system of law or into the moral code subscribed to by most western nations.” Relatedly, should medical schools, or doctors for that matter, be held accountable for the health of their patients? One of the best parts of his talk, in fact, is about the medical field and the parallels Berliner draws between doctors and medical schools, and teachers and colleges of education, respectively (around the 19-25 minute mark of his video presentation).
Professionals are often held harmless for their lower success rates with clients who have observable difficulties in meeting the demands and the expectations of the professionals who attend to them. In medicine again, for example, when working with impoverished patients, “[t]here is precedent for holding [doctors] harmless for their lowest success rates with clients who have observable difficulties in meeting the demands and expectations of the [doctors] who attend to them, but the dispensation we offer to physicians is not offered to teachers.”
There are other quite acceptable sources of data, besides tests, for judging the efficacy of teachers and teacher education programs. “People accept the fact that treatment and medicine may not result in the cure of a disease. Practicing good medicine is the goal, whether or not the patient gets better or lives. It is equally true that competent teaching can occur independent of student learning or of the achievement test scores that serve as proxies for said learning. A teacher can literally “save lives” and not move the metrics used to measure teacher effectiveness.”
Reliance on standardized achievement test scores as the source of data about teacher quality will inevitably promote confusion between “successful” instruction and “good” instruction. “Successful” instruction gets test scores up. “Good” instruction leaves lasting impressions, fosters further interest by the students, makes them feel competent in the area, etc. Good instruction is hard to measure, but remains the goal of our finest teachers.
Relatedly, teachers affect individual students greatly, but affect standardized achievement test scores very little. All can think of how their own teachers impacted their lives in ways that cannot be captured on a standardized achievement test. Standardized achievement test scores are much more related to home, neighborhood and cohort than they are to teachers’ instructional capabilities. In more contemporary terms, this is also due to the fact that large-scale standardized tests have (still) never been validated to measure student growth over time, nor have they been validated to attribute that growth to teachers. “Teachers have huge effects, it’s just that the tests are not sensitive to them.”


Parents Across America has called on states to take advantage of the Every Student Succeeds Act and abolish VAM.

 

PAA has been critical of high-stakes testing.

 

PAA also produced a one-page fact sheet to demonstrate the failure of value-added measurement.

In New Mexico, District Judge David K. Thomson issued a preliminary injunction against the use of the state’s teacher evaluation system, which tied consequences for teachers to student test scores. Unfortunately for the state, the research, the facts, and the evidence were not on its side.

 

Audrey Amrein-Beardsley was the expert witness against the New Mexico Public Education Department’s value-added teacher evaluation system, and she explains here what happened in court. Her account includes a link to the judge’s full ruling.

 

She writes:

 

Late yesterday [Tuesday], state District Judge David K. Thomson, who presided over the ongoing teacher-evaluation lawsuit in New Mexico, granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. More specifically, Judge Thomson ruled that the state can proceed with “developing” and “improving” its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (set for now, for April) that the system is reliable, valid, fair, uniform, and the like.

 

As you all likely recall, the American Federation of Teachers (AFT), joined by the Albuquerque Teachers Federation (ATF), last year, filed a “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System.” Plaintiffs charged that the state’s teacher evaluation system, imposed on the state in 2012 by the state’s current Public Education Department (PED) Secretary Hanna Skandera (with value-added counting for 50% of teachers’ evaluation scores), is unfair, error-ridden, spurious, harming teachers, and depriving students of high-quality educators, among other claims (see the actual lawsuit here).

 

Thereafter, one scheduled day of testimonies turned into five, in Santa Fe, that ran from the end of September through the beginning of October (each of which I covered here, here, here, here, and here). I served as the expert witness for the plaintiff’s side, along with other witnesses including lawmakers (e.g., a state senator) and educators (e.g., teachers, superintendents) who made various (and very articulate) claims about the state’s teacher evaluation system on the stand. Thomas Kane served as the expert witness for the defendant’s side, along with other witnesses including lawmakers and educators who made counter claims about the state’s teacher evaluation system, some of which backfired unfortunately for the defense, primarily during cross-examination. [Kane, an economist, has been the chief research advisor to the Gates Foundation on teacher evaluation.]

 

Open the post to see her many links, her analysis of the decision, and the many local articles about it.

 

The state, not surprisingly, called the decision “frivolous” and “a legal PR stunt.” It claimed that it would continue doing what the judge said it was not allowed to do. I think a judge’s order trumps the will of the New Mexico PED.


John Thompson, historian and teacher, says that the Gates Foundation is fighting a losing battle to justify value-added assessment. At its root, he says, is an assault on public education, facilitated by a worship of data and a belief in the value of teacher churn.

 

He writes:
One of the Gates Foundation’s star value-added scholars, Dan Goldhaber, has voiced “concerns about the use of VAM estimates at the high school level for the evaluation of individual teachers.” Two years ago, he asked and answered “yes” to the question of whether reformers would have placed less emphasis on the value-added evaluations of individual teachers if research had focused on high schools rather than elementary schools.
I once saw Goldhaber’s statement as “a hopeful sign that research by non-educators may become more reality-based.”
As the use of estimates of test score growth in evaluations becomes even more discredited, Goldhaber is not alone in making statements such as, “The early evidence on states and localities using value added as a portion of more comprehensive evaluation systems suggests that it may not be differentiating teachers to the degree that was envisioned (Anderson, 2013).”
So, what is now happening in the aftermath of the latest warning against value-added evaluations? This time, the American Educational Research Association (AERA) Council “cautions against VAM being used to have a high-stakes, dispositive weight in evaluations.”
The logic used by the nation’s largest education research professional association is very similar to what I thought Goldhaber meant when he warned against using various tests and models that produce so many different estimates of the effectiveness of high school teachers. The point seems obvious. If VAMs are imposed on all types of schools and teachers with all types of tests and students, then they must work properly in that wide range of situations. It’s not good enough to say we should fire inner city high school teachers because some researchers believe that VAMs can measure the quality of teaching with random samples of low-poverty elementary students.
Goldhaber now notes, “AERA’s statement adds to the cacophony of voices urging either restraint or outright prohibition of VAMs for evaluating educators or institutions. Doubtless, these stakeholders are genuinely concerned about potential unintended consequences of adopting these performance measures.”
However, Goldhaber and other supporters of corporate reform still twist themselves into pretzels in arguing that we should remain on their value-added path. Ignoring the effects of sorting as one of the factors that make VAMs invalid and unreliable for evaluating individuals, Goldhaber counters the AERA by illogically citing a couple of studies that use random samples to defend the claim that they can be causally linked to a teacher’s performance.

 

In other words, Goldhaber grasps at any straws to claim that it might not have been a mistake to mandate the risky value-added experiment before studying its likely negative effects. His bottom line is that VAMs might not be worse than many other inaccurate education metrics. And, yes, many things in education, as in all other sectors of society, don’t work. But, even if VAMs were reliable and valid for evaluating individuals, most people who understand school systems would reject the inclusion of test scores in evaluations because of the predictable and destructive policies it would encourage.


Moreover, Goldhaber is attacking a straw man. The AERA and corporate reform opponents aren’t urging a multi-billion dollar investment to scale up failed policies! My classroom’s windows and ceiling leaked, even as I taught effectively. But, that doesn’t mean we should punch holes in roofs across the nation so that all schools have huge puddles of water on the floor!
For reasons that escape me if the goal was improving schools as opposed to defeating unions, Goldhaber also testified in the infamous Vergara case, which would wipe out all California laws protecting teachers’ rights. He chronicled the negative sides of seniority, but not the benefits of that legally-negotiated provision. One would have thought that a court would have sought evidence on both sides of the issue, and Goldhaber only explored one side.
Goldhaber estimated the harm that could be done through “a strict adherence” to the seniority provision of “Last In, First Out” (“LIFO”). I’m sure it occasionally happens, but I’ve never witnessed such a process where the union refused to engage in a give and take in regard to lay-offs. More importantly, it once would have been easy to adopt the old union proposal that LIFO rights not be extended to teachers who have earned an “Unsatisfactory” evaluation. An agreement on that issue could have propelled a collaborative effort to make teacher evaluations more rigorous (especially if they included peer review.)
Reformers like Goldhaber ignore the reasons why we must periodically mend, but not end seniority. His work did not address the enormous social and civil rights benefits of seniority. It is the teacher’s First Amendment. Without it, the jobs of leaders who resist nonstop teach-to-the test will be endangered. Systems will have a green light to fire veteran teachers merely to get rid of their higher salaries and benefits. Without LIFO, corporate reformers will mandate even more mass closures of urban schools. Test scores will remain the ammunition in a war to the death against teachers unions. The poorest children of color will continue to be the prime collateral damage.
Even though he did not do so before testifying in Vergara, I hoped that Goldhaber would subsequently update his methodology in order to study both sides – both the costs and the benefits to students – of seniority protections. He has not done so, even though his new research tackles some other issues. In fact, I would have once been cautiously optimistic when reading Are There Hidden Costs Associated with Lay-offs? Goldhaber, Katherine Strunk, David Knight, and Nate Brown focus on the stress created by layoffs. They conclude, “teachers laid off and hired back to teach in the next school year have significantly lower value added in their return year than they had in years unthreatened by layoffs.” They find that the stress of receiving a lay-off notice undermines instructional quality and contributes to the teacher “churn” that especially hurts children in the poorest schools.
In a rational world, such a finding would argue for reform of the education budgeting process that distresses educators – not for punitive measures against teachers who were blameless in this matter. In an even more rational world, Goldhaber et al.’s research would be used as an argument for more funding so that systems don’t have to cut it so close, and to provide support to teachers and students in stressful high-challenge schools.
By the way, I once faced such a layoff. It wouldn’t make my list as one of the thousands of the most stressful events of my career. The transparency of the process mitigated the uncertainty, minimized the chance of losing my job, and eliminated the chance that I would lose my career in an unfair manner. If Goldhaber and Strunk are really curious about the causes of teacher churn, they should visit the inner city and take a look at the real world that their metrics are supposed to represent. But, that is unlikely. Corporate reform worships at the idol of teacher churn. It is the cornerstone of the test, sort, reward, and punish policies that VAMs are a part of.
Goldhaber still seems to be sticking with the party line: Teacher churn is bad, except when it is good. We must punish teachers by undermining their legal rights in order to address the failings of the entire society. We must fight the stress fostered by generational poverty by imposing more stress on teachers and students in poor schools.
Once I believed that Gates-funded quantitative researchers were merely ignorant of the realities in schools. Maybe they simply did not know how to connect the dots and see how the policies they were advocating would interact with other anti-teacher, anti-union campaigns. Maybe I was naïve in believing that. But, at a time when the Broad Foundation is trying to replace half of Los Angeles’s schools with charters, we must remember the real danger of mandates for VAMs and against seniority in a competition-driven reform era where test scores are a matter of life and death for individual schools, as well as the careers of individual educators.
Every single rushed policy defended by Goldhaber may be a mere mistake. But, whether he understands it or not, the real danger comes from combining those policies in a top-down assault on public education.

The American Educational Research Association issued a warning against the use of value added measures for high-stakes decisions regarding educators and teacher preparation programs. The cardinal rule of assessment is that tests should be used only for the purpose for which they were created. A measure of fourth grade reading measures the student, not the teacher, the principal, or the school.


AERA Issues Statement on the Use of Value-Added Models in Evaluation of Educators and Educator Preparation Programs
WASHINGTON, D.C., November 11—In a statement released today, the American Educational Research Association (AERA) advises those using or considering use of value-added models (VAM) about the scientific and technical limitations of these measures for evaluating educators and programs that prepare teachers. The statement, approved by AERA Council, cautions against the use of VAM for high-stakes decisions regarding educators.

 

In recent years, many states and districts have attempted to use VAM to determine the contributions of educators, or the programs in which they were trained, to student learning outcomes, as captured by standardized student tests. The AERA statement speaks to the formidable statistical and methodological issues involved in isolating either the effects of educators or teacher preparation programs from a complex set of factors that shape student performance.

 

“This statement draws on the leading testing, statistical, and methodological expertise in the field of education research and related sciences, and on the highest standards that guide education research and its applications in policy and practice,” said AERA Executive Director Felice J. Levine.

 

The statement addresses the challenges facing the validity of inferences from VAM, as well as specifies eight technical requirements that must be met for the use of VAM to be accurate, reliable, and valid. It cautions that these requirements cannot be met in most evaluative contexts.

 

The statement notes that, while VAM may be superior to some other models of measuring teacher impacts on student learning outcomes, “it does not mean that they are ready for use in educator or program evaluation. There are potentially serious negative consequences in the context of evaluation that can result from the use of VAM based on incomplete or flawed data, as well as from the misinterpretation or misuse of the VAM results.”

 

The statement also notes that there are promising alternatives to VAM currently in use in the United States that merit attention, including the use of teacher observation data and peer assistance and review models that provide formative and summative assessments of teaching and honor teachers’ due process rights.

 

The statement concludes: “The value of high-quality, research-based evidence cannot be over-emphasized. Ultimately, only rigorously supported inferences about the quality and effectiveness of teachers, educational leaders, and preparation programs can contribute to improved student learning.” Thus, the statement also calls for substantial investment in research on VAM and on alternative methods and models of educator and educator preparation program evaluation.

 

Related AERA Resource:
Special Issue of Educational Researcher (March 2015)—
Value Added Meets the Schools: The Effects of Using Test-Based Teacher Evaluation on the Work of Teachers and Leaders

Celia Oyler is a teacher educator at Teachers College, Columbia University.

In this post, she explains that the Chancellor of the New York Board of Regents, Merryl Tisch, does not understand how the teacher evaluation plan she and 10 other Regents just approved works.

Tisch thinks she solved the problem of VAM mistakes by permitting teachers like Sheri Lederman to appeal ratings that are clearly wrong.

Oyler says that Sheri Lederman’s rating, egregiously wrong, was not an “aberration.” The whole system is flawed.

“What is extremely important for all New York State educators and families to understand is that the Chancellor of the Board of Regents does not understand a very basic aspect of a policy she has foisted upon us.”

Dr. Jim Arnold, superintendent of the Pelham City schools, explains why Georgia has a teaching shortage. The answer can be summed up in a few words: Governor Nathan Deal and ALEC, and one very long sentence:

Is it any wonder that many teachers have finally reached the point where they are fed up with scripted teaching requirements and phony evaluations that include junk science VAM and furlough days and increased testing that reduces valuable teaching time and no pay raises and constant curriculum changes and repeated attacks on their profession from people that have no teaching experience and the constant attempts to legislate excellence and cut teacher salaries and reduce teacher benefits and monkey with teacher retirement and SLO’s for non-tested subjects and state and federal policies that require more and more paperwork and less and less teaching and tighter and tighter budgets that mean doing more and more with less and less and longer school days and larger classes with higher and higher expectations and a political agenda that actively encourages blaming teachers for societal issues and the denigration of public education and market based solutions and legislators bought and paid for by ALEC and a continued reliance upon standardized test scores as an accurate depiction of student learning and achievement with no substantive research to support such a position and top-down management from people that wouldn’t know good teaching if it spit on their shoes and slapped them in the face? No wonder teachers are discouraged. No wonder teacher morale is at an all- time low. No wonder more and more teachers are retiring.

Please read the rest to find out what should be done about Governor Nathan Deal’s embrace of ALEC’s agenda to get rid of public education.

One of the affidavits at the trial of the Lederman v. King case was filed by psychologist Brad Lindell.

His full affidavit is included in this post, which contains all the affidavits.

He sent the following note to me to explain his view of VAM in layman’s terms:

I am Dr. Brad Lindell, one of the affiants in the Sheri Lederman case who was present at the oral arguments on Wednesday. It was truly something to observe. You got the feeling that good was going to come from the great work of Sheri and Bruce Lederman and from the experts’ opinions insofar as changing this broken VAM system. You got the sense that the judge was listening to the science about VAM and not just to the political rhetoric.

I just want to fill you in on something presented in my affidavit, modified here to give a clear and understandable example of the effects of poor reliability on a full-scale WISC intelligence test. If the test-retest reliability of teachers’ yearly VAM scores (.40) were applied to the WISC full-scale score to determine a 90% confidence interval, the range would be ridiculously large.

Examples

If a student scored a full-scale IQ of 100 (average), then the 90% confidence interval would be 81 to 119. This indicates a wide range within which scores from repeated administrations of the WISC would be expected to fall for this student. One could not have confidence in the validity of an intelligence test with such low reliability. Without adequate reliability, there cannot be validity. The same holds true for VAM scores, whose reliabilities have been found to be notoriously low.

The reliability of the WISC is generally in the .80 to .90 range, and its 90% confidence intervals are generally about ±6 points. So this same person with a full-scale IQ of 100 would have a 90% confidence range of 94 to 106 – a far smaller range.

This is why reliability is so important; it has repeatedly been shown to be low (around .2 to .4) for year-to-year VAM scores. This is also why teachers’ year-to-year VAM scores vary so considerably, as in the case of Sheri Lederman. Without reliability there cannot be adequate validity.
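The arithmetic behind these intervals can be sketched with the classical test-theory standard error of measurement, SEM = SD × √(1 − r), where SD = 15 for IQ scales and r is the reliability; multiplying by z = 1.645 gives a 90% band. This reproduces the roughly ±19 interval Lindell describes for r = .40. (The WISC manual’s published ~±6 bands come from a somewhat different estimated-true-score construction, so the high-reliability figure below is illustrative rather than an exact match to the manual.)

```python
import math

def ci_halfwidth(sd: float, reliability: float, z: float = 1.645) -> float:
    """Half-width of a confidence interval around an observed score,
    using the classical standard error of measurement:
        SEM = SD * sqrt(1 - reliability)
    z = 1.645 corresponds to a 90% two-sided interval."""
    sem = sd * math.sqrt(1.0 - reliability)
    return z * sem

SD = 15  # IQ scores are scaled to mean 100, standard deviation 15

# VAM-level reliability (r = .40) applied to an IQ scale:
low_r = ci_halfwidth(SD, 0.40)
print(f"r = .40 -> 100 \u00b1 {low_r:.0f}")   # roughly 81 to 119

# Typical WISC full-scale reliability (r = .90):
high_r = ci_halfwidth(SD, 0.90)
print(f"r = .90 -> 100 \u00b1 {high_r:.0f}")  # a much narrower band
```

The comparison makes the point plainly: the same observed score of 100 is compatible with "true" scores anywhere from borderline to high-average when reliability is .40, which is why a single year's VAM score tells you so little about a teacher.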

A number of teachers from the Bad Ass Teachers Association drove to Albany to witness the trial of New York’s teacher evaluation system. Here are excerpts from some of their reports.

BAT 1:

“In responding to the Ledermans’ suit, the state’s representatives – Ira Schwartz, Assistant Commissioner of Accountability, and (?) Sherman, a “Quality Control” official – did not provide affidavits from independent experts; rather, they asserted that the Ledermans misunderstood the meaning of “growth,” providing language from promotional brochures.

“But the response also conceded that the policy was derived in pursuit of federal Race to the Top funding (another case of “Thanks Obama?”)
The judge teased this out of the state’s lawyer by asking: so the students can perform well and the teacher be deemed ineffective? The lawyer eventually answered “yes!” but tried to remind the court that the formula is only used as part of the overall evaluation. This caused the judge to ask aloud why “discordant” and “inappropriate” results would be used in any percentage.”

BAT 2:

“In comparison to Lederman’s pointed and well-constructed argument, the assistant attorney general offered a minimal response, defending the definition and use of the growth model. The judge asked over and over how a teacher could go from a 14 one year to a 1 the following year. No real answer was given, except for a poor attempt to explain the model’s comparison to other students.

“Quote of the day – from Lederman – went something like this: Are we living in a science fiction world where Hal the Computer gets to make decisions and there is no opportunity for human input or appeal?…

“I am in awe of the Ledermans, true heroes for all of the downtrodden teachers being judged by a flawed and unfair system of measurement. If they win, all teachers and all students win.”

BAT 3:

“Another thing that stood out was just how deeply flawed the system is. When a teacher like Sheri goes from 14 points to 1, yet her students are doing very well and meeting proficiency, it’s easy to see that something is deeply wrong. I believe everyone in the courtroom saw that today. The argument that it’s just a portion of an overall score doesn’t matter to me. If any part of it is flawed, the whole thing should not be used. Let’s hope the judge agrees.

“Lastly what stood out was how much Bruce shined, and the state faltered. Prepared and eloquent, Bruce laid out the arguments point by point and handled challenging questions with thorough and thoughtful explanations. The same could not be said about the state’s representation and it was wonderfully obvious.”

BAT 4:

“It was exciting for me to witness this hearing, and I feel that the outcome of this case could be very historic in our fight to save public education.”

BAT 5:

“The state continued to argue the rating computer system was valid. NYSED admitted that you can have a teacher whose kids are successful yet the teacher will still get a low rating. Overall, they seemed okay with that reality. The state explained the evaluation system will let them get rid of ‘outliers’.

“After the proceedings I was moved to tears when the Ledermans shared their motivation. Mrs. Lederman’s evaluation had left her distraught; she was ready to quit and leave the profession that she loves. After many late-night discussions, she and her husband decided to challenge this unjust system. They certainly know they have a world of educators and parents on their side.”

BAT 6:

“* It was encouraging to have the public see for themselves that even the state could not explain the impact of test scores on teachers.

* The common sense scenarios as presented by the judge made it clear to everyone in the courtroom just how ridiculous VAM is.”

BAT 7:

“I am biased, for sure, but I have to say Bruce and Sheri Lederman have done an amazing job laying out the faults of this system. They have lined up the best in the field to validate their claims. My overall feeling was, that as gruff as this judge seemed to be, he got it. He understood that this system is not transparent, is not valid, and should not be used to judge teachers. We hope that his findings, which will be rendered in about 6 weeks, will state just that.”