Search results for: "Value-Added"

When this statement first appeared in 2014, I said at the time that it should be on the bulletin board of every public school.

The American Statistical Association explains here why the evaluations of individual teachers should not be based on their students’ test scores.

Here is an excerpt. Read the whole statement, which is only 8 pages long:

It is unknown how full implementation of an accountability system incorporating test-based indicators, such as those derived from VAMs, will affect the actions and dispositions of teachers, principals and other educators. Perceptions of transparency, fairness and credibility will be crucial in determining the degree of success of the system as a whole in achieving its goals of improving the quality of teaching. Given the unpredictability of such complex interacting forces, it is difficult to anticipate how the education system as a whole will be affected and how the educator labor market will respond. We know from experience with other quality improvement undertakings that changes in evaluation strategy have unintended consequences. A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Overreliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.

Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences.
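A rough way to see how a small share of variance can coexist with real teacher effects is to work through made-up numbers. The sketch below assumes teacher effects have a standard deviation of 0.3 (measured in student-level test-score standard deviations) and that all other influences combined have a standard deviation of 0.9 and are independent of the teacher. The function name and both values are hypothetical and are not taken from the ASA statement.

```python
def teacher_variance_share(teacher_sd, other_sd):
    """Share of total score variance attributable to teachers,
    assuming teacher effects and other influences are independent."""
    teacher_var = teacher_sd ** 2
    other_var = other_sd ** 2
    return teacher_var / (teacher_var + other_var)

# Assumed, illustrative values (not taken from the ASA statement).
share = teacher_variance_share(teacher_sd=0.3, other_sd=0.9)
print(f"Teacher share of variance: {share:.0%}")
```

Under these assumptions the share comes out to about 10%, inside the 1% to 14% range quoted above, even though a teacher one standard deviation above average would still raise expected scores by 0.3 standard deviations.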

The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling. Combining VAMs across multiple years decreases the standard error of VAM scores. Multiple years of data, however, do not help problems caused when a model systematically undervalues teachers who work in specific contexts or with specific types of students, since that systematic undervaluation would be present in every year of data.
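The distinction between random error and systematic undervaluation can be sketched with simple arithmetic. The example below assumes each year's score carries independent noise with the same standard deviation, so averaging over n years shrinks the standard error by a factor of the square root of n, while a fixed bias passes through the average unchanged. The function name and the numbers (noise of 0.2, bias of -0.1) are hypothetical, chosen only for illustration.

```python
import math

def averaged_vam_error(noise_sd, bias, years):
    """Return (standard_error, bias) for a score averaged over `years` years.

    Assumes independent noise each year with standard deviation `noise_sd`
    and a constant systematic bias that is the same in every year.
    """
    standard_error = noise_sd / math.sqrt(years)
    # Averaging identical yearly biases leaves the bias unchanged.
    return standard_error, bias

# Assumed, illustrative values (not estimates from any study).
for years in (1, 2, 4):
    se, bias = averaged_vam_error(noise_sd=0.2, bias=-0.1, years=years)
    print(f"{years} year(s): standard error {se:.3f}, systematic bias {bias:+.3f}")
```

With four years of data the standard error is halved, but the teacher who is systematically undervalued by 0.1 is still undervalued by 0.1.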

Despite the warning from ASA, which has no special interest and does not represent teachers or public school administrators, many states continue to use this method (called VAM, or value-added measurement or value-added modeling).

States were coerced into adopting this unproven method by the U.S. Department of Education, which said that states had to adopt it if they wanted to be eligible to compete for nearly $5 billion in federal funds in 2009, as every state was undergoing a budget crisis caused by the economic meltdown of fall 2008.

Many states adopted it, and it has not had positive effects in any state.

In Colorado and New York, among others, VAM scores count for as much as 50% of a teacher’s evaluation.

A state court in New York ruled this method “arbitrary and capricious” when challenged by fourth grade teacher Sheri Lederman and her lawyer-husband Bruce Lederman.

Some states assign VAM scores to teachers based on students they never taught in subjects they don’t teach.

This is an example of federal and state policy that has no basis in evidence and that has harmed the lives of many teachers. It very likely has caused teachers to leave the profession and contributed to teacher shortages.

John Thompson, historian and teacher, says that the Gates Foundation is fighting a losing battle to justify value-added assessment. At its root, he says, is an assault on public education, facilitated by a worship of data and a belief in the value of teacher churn.


He writes:
One of the Gates Foundation’s star value-added scholars, Dan Goldhaber, has voiced “concerns about the use of VAM estimates at the high school level for the evaluation of individual teachers.” Two years ago, he asked and answered “yes” to the question of whether reformers would have placed less emphasis on the value-added evaluations of individual teachers if research had focused on high schools rather than elementary schools.
I once saw Goldhaber’s statement as “a hopeful sign that research by non-educators may become more reality-based.”
As the use of estimates of test score growth in evaluations becomes even more discredited, Goldhaber is not alone in making statements such as, “The early evidence on states and localities using value added as a portion of more comprehensive evaluation systems suggests that it may not be differentiating teachers to the degree that was envisioned (Anderson, 2013).”
So, what is now happening in the aftermath of the latest warning against value-added evaluations? This time, the American Educational Research Association (AERA) Council “cautions against VAM being used to have a high-stakes, dispositive weight in evaluations.”
The logic used by the nation’s largest education research professional association is very similar to what I thought Goldhaber meant when he warned against using various tests and models that produce so many different estimates of the effectiveness of high school teachers. The point seems obvious. If VAMs are imposed on all types of schools and teachers with all types of tests and students, then they must work properly in that wide range of situations. It’s not good enough to say we should fire inner city high school teachers because some researchers believe that VAMs can measure the quality of teaching with random samples of low-poverty elementary students.
Goldhaber now notes, “AERA’s statement adds to the cacophony of voices urging either restraint or outright prohibition of VAMs for evaluating educators or institutions. Doubtless, these stakeholders are genuinely concerned about potential unintended consequences of adopting these performance measures.”
However, Goldhaber and other supporters of corporate reform still twist themselves into pretzels in arguing that we should remain on their value-added path. Ignoring the effects of sorting as one of the factors that make VAMs invalid and unreliable for evaluating individuals, Goldhaber counters the AERA by illogically citing a couple of studies that use random samples to defend the claim that they can be causally linked to a teacher’s performance.


In other words, Goldhaber grasps at any straws to claim that it might not have been a mistake to mandate the risky value-added experiment before studying its likely negative effects. His bottom line is that VAMs might not be worse than many other inaccurate education metrics. And, yes, many things in education, as in all other sectors of society, don’t work. But, even if VAMs were reliable and valid for evaluating individuals, most people who understand school systems would reject the inclusion of test scores in evaluations because of the predictable and destructive policies it would encourage.



Moreover, Goldhaber is attacking a straw man. The AERA and corporate reform opponents aren’t urging a multi-billion dollar investment to scale up failed policies! My classroom’s windows and ceiling leaked, even as I taught effectively. But, that doesn’t mean we should punch holes in roofs across the nation so that all schools have huge puddles of water on the floor!
For reasons that escape me if the goal was improving schools as opposed to defeating unions, Goldhaber also testified in the infamous Vergara case, which would wipe out all California laws protecting teachers’ rights. He chronicled the negative sides of seniority, but not the benefits of that legally-negotiated provision. One would have thought that a court would have sought evidence on both sides of the issue, and Goldhaber only explored one side.
Goldhaber estimated the harm that could be done through “a strict adherence” to the seniority provision of “Last In, First Out” (“LIFO”). I’m sure it occasionally happens, but I’ve never witnessed such a process where the union refused to engage in a give and take in regard to lay-offs. More importantly, it once would have been easy to adopt the old union proposal that LIFO rights not be extended to teachers who have earned an “Unsatisfactory” evaluation. An agreement on that issue could have propelled a collaborative effort to make teacher evaluations more rigorous (especially if they included peer review).
Reformers like Goldhaber ignore the reasons why we must periodically mend, but not end seniority. His work did not address the enormous social and civil rights benefits of seniority. It is the teacher’s First Amendment. Without it, the jobs of leaders who resist nonstop teach-to-the test will be endangered. Systems will have a green light to fire veteran teachers merely to get rid of their higher salaries and benefits. Without LIFO, corporate reformers will mandate even more mass closures of urban schools. Test scores will remain the ammunition in a war to the death against teachers unions. The poorest children of color will continue to be the prime collateral damage.
Even though he did not do so before testifying in Vergara, I hoped that Goldhaber would subsequently update his methodology in order to study both sides – both the costs and the benefits to students – of seniority protections. He has not done so, even though his new research tackles some other issues. In fact, I would have once been cautiously optimistic when reading Are There Hidden Costs Associated with Lay-offs? Goldhaber, Katherine Strunk, David Knight, and Nate Brown focus on the stress created by layoffs. They conclude, “teachers laid off and hired back to teach in the next school year have significantly lower value added in their return year than they had in years unthreatened by layoffs.” They find that the stress of receiving a lay-off notice undermines instructional quality and contributes to the teacher “churn” that especially hurts children in the poorest schools.
In a rational world, such a finding would argue for the reform of the education budgeting process that distresses educators – not for punitive measures against teachers who were blameless in this matter. In an even more rational world, Goldhaber et al.’s research would be used as an argument for more funding so that systems don’t have to cut it so close, and to provide support to teachers and students in stressful high-challenge schools.
By the way, I once faced such a layoff. It wouldn’t make my list as one of the thousands of the most stressful events of my career. The transparency of the process mitigated the uncertainty, minimized the chance of losing my job, and eliminated the chance that I would lose my career in an unfair manner. If Goldhaber and Strunk are really curious about the causes of teacher churn, they should visit the inner city and take a look at the real world that their metrics are supposed to represent. But, that is unlikely. Corporate reform worships at the idol of teacher churn. It is the cornerstone of the test, sort, reward, and punish policies that VAMs are a part of.
Goldhaber still seems to be sticking with the party line: Teacher churn is bad, except when it is good. We must punish teachers by undermining their legal rights in order to address the failings of the entire society. We must fight the stress fostered by generational poverty by imposing more stress on teachers and students in poor schools.
Once I believed that Gates-funded quantitative researchers were merely ignorant of the realities in schools. Maybe they simply did not know how to connect the dots and see how the policies they were advocating would interact with other anti-teacher, anti-union campaigns. Maybe I was naïve in believing that. But, at a time when the Broad Foundation is trying to replace half of Los Angeles’s schools with charters, we must remember the real danger of mandates for VAMs and against seniority in a competition-driven reform era where test scores are a matter of life and death for individual schools, as well as the careers of individual educators.
Every single rushed policy defended by Goldhaber may be a mere mistake. But, whether he understands it or not, the real danger comes from combining those policies in a top-down assault on public education.

Hah! This is what we have been waiting for! Economists are now borrowing from the education research literature to develop value-added metrics for physicians. Next, I hope, will be the development of VAMs for lawyers and soon you will hear the screams of outrage not only from the American Medical Association but the American Bar Association. With the economists figuring out metrics to measure these politically powerful professions, teachers won’t be alone in their battle against obsessive compulsive metrical disorder. If only someone would come up with VAM for elected officials! Better yet, how about a VAM for economists? For example, how often do their predictions about the economy come true?


Here is how you measure the value-added of physicians according to the link above from the National Bureau of Economic Research:


“Despite increasing calls for value-based payments, existing methodologies for determining physicians’ “value added” to patient health outcomes have important limitations. We incorporate methods from the value added literature in education research into a health care setting to present the first value added estimates of health care providers in the literature. Like teacher value added measures that calculate student test score gains, we estimate physician value added based on changes in health status during the course of a hospitalization. We then tie our measures of physician value added to patient outcomes, including length of hospital stay, total charges, health status at discharge, and readmission. The estimated value added varied substantially across physicians and was highly stable for individual physicians. Patients of physicians in the 75th versus 25th percentile of value added had, on average, shorter length of stay (4.76 vs 5.08 days), lower total costs ($17,811 vs $19,822) and higher discharge health status (8% of a standard deviation). Our findings provide evidence to support a new method of determining physician value added in the context of inpatient care that could have wide applicability across health care setting and in estimating value added of other health care providers (nurses, staff, etc).”

Audrey Amrein-Beardsley has updated her reading lists on value-added assessment. Most of the studies cited show that it is inaccurate, unstable, and unreliable. The error rate is high. Students are not randomly assigned to teachers. Ratings fluctuate from year-to-year. About 70% of teachers do not teach tested courses. Perhaps that is why other nations do not judge teachers by the rise or fall of the test scores of their students. Unfortunately in this country, at this time, we have a cult worship of standardized testing, which is used to evaluate students, teachers, principals, and schools. People’s lives hang on the right answer. In a just world this practice would be recognized for what it is: Junk science.

Here are her top 15 studies. Open the link to find the top 25, along with links to all of these readings. With Beardsley’s help, you too can be an expert.

American Statistical Association (2014). ASA statement on using value-added models for educational assessment. Alexandria, VA.

Amrein-Beardsley, A. (2008). Methodological concerns about the Education Value-Added Assessment System (EVAAS). Educational Researcher, 37(2), 65-75. doi: 10.3102/0013189X08316420.

Amrein-Beardsley, A., & Collins, C. (2012). The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District (HISD): Intended and unintended consequences. Education Policy Analysis Archives, 20(12), 1-36.

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., Ravitch, D., Rothstein, R., Shavelson, R. J., & Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. Washington, D.C.: Economic Policy Institute.

Baker, B. D., Oluwole, J. O., & Green, P. C. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the Race-to-the-Top era. Education Policy Analysis Archives, 21(5), 1-71.

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012). Evaluating teacher evaluation. Phi Delta Kappan, 93(6), 8-15.

Fryer, R. G. (2013). Teacher incentives and student achievement: Evidence from New York City Public Schools. Journal of Labor Economics, 31(2), 373-407.

Haertel, E. H. (2013). Reliability and validity of inferences about teachers based on student test scores. Princeton, NJ: Educational Testing Service.

Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794-831. doi:10.3102/0002831210387916

Jackson, C. K. (2012). Teacher quality at the high-school level: The importance of accounting for tracks. Cambridge, MA: The National Bureau of Economic Research.

Newton, X., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value-added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives, 18(23), 1-27.

Papay, J. P. (2010). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163-193. doi:10.3102/0002831210362589

Paufler, N. A. & Amrein-Beardsley, A. (2014). The random assignment of students into elementary classrooms: Implications for value-added analyses and interpretations. American Educational Research Journal, 51(2), 328-362. doi: 10.3102/0002831213508299

Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537-571. doi:

Schochet, P. Z. & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington DC: U.S. Department of Education.

The studies of value-added measurement keep on coming, and the findings usually show what an utterly absurd idea it is to think that teacher quality can be judged by student test scores. In a just world, Arne Duncan would be held accountable for the stupid and harmful theories he has imposed on the nation’s public schools. The U.S. Department of Education has become a malignant force in American education. I cannot think of any time in our nation’s history when public schools and teachers were literally endangered by the mandates coming from Washington, D.C., where the leadership is wholly ignorant of federalism.

This story in Education Week summarizes the latest batch of studies of VAM. Some researchers, having made this their area of specialization, continue to prod in hopes of good news.

But look at this:

“In a study that appears in the current issue of the American Educational Research Journal, Noelle A. Paufler and Audrey Amrein-Beardsley, a doctoral candidate and an associate professor at Arizona State University, respectively, conclude that elementary school students are not randomly distributed into classrooms. That finding is significant because random distribution of students is a technical assumption that underlies some value-added models.

“Even when value-added models do account for nonrandom classroom assignment, they typically fail to consider behavior, personality, and other factors that profoundly influenced the classroom-assignment decisions of the 378 Arizona principals surveyed. That, too, can bias value-added results.

“Perhaps most provocative of all are the preliminary results of a study that uses value-added modeling to assess teacher effects on a trait they could not plausibly change, namely, their students’ heights. The results of that study, led by Marianne P. Bitler, an economics professor at the University of California, Irvine, have been presented at multiple academic conferences this year.
The authors found that teachers’ one-year “effects” on student height were nearly as large as their effects upon reading and math. The researchers did not find any correlation between the “value” that teachers “added” to height and the value they added to reading and math. In addition, unlike the reading and math results, which demonstrated some consistency from one year to the next, the height outcomes were not stable over time. The authors suggested that the different properties of the two models offered “some comfort.” Nevertheless, they advised caution.”

So, let’s get this right: teachers’ effects on students’ height were nearly as large as their effect on reading and math.

Perhaps Arne can just arrange to have all teachers fired (except for TFA), close every school (except “no-excuses” charter schools), and turn around the whole country.

Paul Thomas follows Anthony Cody’s previously cited post by describing the unrelenting attack on teachers, which has intensified with the use of statistically inappropriate measures.

He writes:

“As Cody notes above, however, simultaneously political leaders, the media, and the public claim that teachers are the most valuable part of any student’s learning (a factually untrue claim), but that high-poverty and minority students can be taught by those without any degree or experience in education (Teach for America) and that career teachers no longer deserve their profession—no tenure, no professional wages, no autonomy, no voice in what or how they teach.

And while the media and political leaders maintain these contradictory narratives and support these contradictory policies, value-added methods (VAM) of evaluating and compensating U.S. public teachers are being adopted, again simultaneously, as the research base repeatedly reveals that VAM is yet another flawed use of high-stake accountability and testing.”

Thomas cites review after review to demonstrate that VAM is inaccurate and deeply flawed. Yet the evidence is ignored and VAM is being used as a political weapon by the odd bedfellows of the Obama administration and right-wing governors as well as some Democratic governors, like Andrew Cuomo of New York and Dannel Malloy of Connecticut, to attack teachers. President Obama made a point of praising the Chetty study in his 2012 State of the Union address, not waiting for the many reviews that showed the error of measuring teacher quality by test scores.

Thomas writes:

“The rhetoric about valuing teachers rings hollow more and more as teaching continues to be dismantled and teachers continue to be devalued by misguided commitments to VAM and other efforts to reduce teaching to a service industry.

“VAM as reform policy, like NCLB, is sham-science being used to serve a corporate need for cheap and interchangeable labor. VAM, ironically, proves that evidence does not matter in education policy.”

Race to the Top placed a $4.45 billion bet that the way to improve schools was to tie teachers’ evaluations to their students’ test scores.

As it happens, the state of Tennessee has been using value-added assessment for 20 years, though the stakes have not been as high as they are now.

What can we learn from the Tennessee experience? According to Andy Spears of the Tennessee Education Report, well, gosh, sorry: nothing.

Spears has a list of lessons learned. Here are the key takeaways:

“4. Tennessee has actually lost ground in terms of student achievement relative to other states since the implementation of TVAAS.

Tennessee received a D on K-12 achievement when compared to other states based on NAEP achievement levels and gains, poverty gaps, graduation rates, and Advanced Placement test scores (Quality Counts 2011, p. 46). Educational progress made in other states on NAEP [from 1992 to 2011] lowered Tennessee’s rankings:

• from 36th/42 to 46th/52 in the nation in fourth-grade math[2]

• from 29th/42 to 42nd/52 in fourth-grade reading[3]

• from 35th/42 to 46th/52 in eighth-grade math

• from 25th/38 (1998) to 42nd/52 in eighth-grade reading.

5. TVAAS tells us almost nothing about teacher effectiveness.

While other states are making gains, Tennessee has remained stagnant or lost ground since 1992 — despite an increasingly heavy use of TVAAS data.

So, if TVAAS isn’t helping kids, it must be because Tennessee hasn’t been using it right, right? Wrong. While education policy makers in Tennessee continue to push the use of TVAAS for items such as teacher evaluation, teacher pay, and teacher license renewal, there is little evidence that value-added data effectively differentiates between the most and least effective teachers.

In fact, this analysis demonstrates that the difference between a value-added identified “great” teacher and a value-added identified “average” teacher is about $300 in earnings per year per student. So, not that much at all. Statistically speaking, we’d call that insignificant. That’s not to say that teachers don’t impact students. It IS to say that TVAAS data tells us very little about HOW teachers impact students.”

Read the whole article.

It is one of the best, most sensible things you will read on value-added assessment. It is a shame that Tennessee has wasted more than $300 million in search of the magic metric that identifies the “best” teachers. It is ridiculous that Congress and the U.S. Department of Education wasted nearly $5 billion to do the same thing, absent any evidence at all. Just think how many libraries they might have kept open, how many health clinics they could have started, how many early childhood programs initiated, how many class sizes reduced for needy kids.

But let’s not confuse the DOE with actual evidence when they have hunches to go on.

E.D. Hirsch, Jr., the founder of the Core Knowledge curriculum, wrote an article opposing value-added teacher evaluation, especially in reading. Hirsch supports the Common Core but thinks it may be jeopardized by the rush to test it and tie the scores to teacher evaluations. He knows this will encourage teaching to the test and other negative consequences.

Hirsch believes that if teachers teach strong subject matter, their students will do well on the reading tests. But he sees the downside of tying test scores to salary and jobs.

He writes:

“The first thing I’d want to do if I were younger would be to launch an effective court challenge to value-added teacher evaluations on the basis of test scores in reading comprehension. The value-added approach to teacher evaluation in reading is unsound both technically and in its curriculum-narrowing effects. The connection between job ratings and tests in ELA has been a disaster for education.”

He is right. Will the so-called reformers who recently became Hirschians listen?

Data hounds continue to search for a measuring stick to identify teacher quality.

They can’t believe they are on a fruitless hunt, like trying to find a barometer or yardstick to say which piece of art is best, which doctor is best, which… as though human judgment means nothing.

Here is Matt Di Carlo summarizing the research on the instability of VAM, meaning that the best teacher this year might be only average next year, or vice versa.

A little known group called Educators for Shared Accountability designed a rubric for evaluating Secretaries of Education. It incorporates multiple measures.

By its metric, Richard Riley was our best national leader.

Check out Secretary Duncan’s value added rating.