Audrey Amrein-Beardsley invited Stanford Professor Emeritus Edward Haertel to explain why a video called “The Oak Tree Analogy” is flawed.
Apparently there are districts that use this video to try to explain teacher evaluations based on growth or decline of test scores.
Whether you are talking about oak trees or corn or teachers, VAM is Junk Science.
And if you want to know more, read Haertel’s excellent review of the research on VAM here.
Here’s an interesting post from Cathy O’Neil, aka the Mathbabe: http://mathbabe.org/2014/03/07/an-attempt-to-foil-request-the-source-code-of-the-value-added-model/
Seems the Mathbabe, who’s a college professor, was seeking the math that NYC used to calculate VAM and had to resort to a FOIL request to get it… and was stymied by the NYC DOE. Thought you and your readers might be interested.
Dear Diane,
Again I URGE you and your readers to cease referring to VAM as “junk science” and to save your considerably effective ammunition for junk policies that abuse the science and the good scientists who are working to refine the statistical methods and assure their accuracy.
I have extracted a few relevant quotations from Professor Haertel’s paper that you cite as evidence, and note that he does not trash VAM; rather he explains it and urges appropriate caution.
Thanks.
David Cooper
Elon University
Direct quotations from Haertel’s paper:
like most statistical tools, these models are good for some purposes and not for others (p.4).
VAMs that reach back further in time, including test scores from 2 years earlier as well as from the previous year, are considerably more accurate (p.10).
There is certainly solid evidence that VAMs can detect real differences in teacher effectiveness, but as noted, some of the strongest evidence (e.g., Chetty et al., 2011; the MET Project, 2010, 2012) has come from studies in which student test scores were not high stakes for teachers (p.24).
Are teacher-level VAM scores good for anything, then? Yes, absolutely. But, for some purposes, they must be used with considerable caution. To take perhaps the easiest case first, for researchers comparing large groups of teachers to investigate the effects of teacher training approaches or educational policies, or simply to investigate the size and importance of long-term teacher effects, it is clear that value-added scores are far superior to unadjusted end-of-year student test scores. Averaging value-added scores across many teachers will damp down the random noise in these estimates and could also help with some of the systematic biases, although that is not guaranteed. So, for research purposes, VAM estimates definitely have a place. This is also one of the safest applications of VAM scores because the policy researchers applying these models are likely to have the training and expertise to respect their limitations (p.24).
A considerably riskier use, but one I would cautiously endorse, would be providing individual teachers’ VAM estimates to the teachers themselves and to their principals, provided all of the following critically important conditions are met:
• Scores based on sound, appropriate student tests
• Comparisons limited to homogeneous teacher groups
• No fixed weight — flexibility to interpret VAM scores in context for each individual case
• Users well trained to interpret scores
• Clear and accurate information about uncertainty (e.g., margin of error) (p.25).
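Haertel’s point that averaging damps random noise can be illustrated with a small simulation. This is a hypothetical sketch, not from his paper: the true effect, noise level, and group size below are made-up but plausible values, with each teacher’s VAM estimate modeled as a true effect plus independent error.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.10   # assumed average teacher effect (test-score SD units)
NOISE_SD = 0.25      # assumed noise in a single-teacher, single-year estimate
N_TEACHERS = 100     # teachers averaged together in a research comparison
N_TRIALS = 2000

def vam_estimate():
    """One noisy single-teacher VAM estimate: true effect plus random error."""
    return TRUE_EFFECT + random.gauss(0, NOISE_SD)

# Spread of single-teacher estimates vs. spread of 100-teacher averages.
single = [vam_estimate() for _ in range(N_TRIALS)]
averaged = [statistics.mean(vam_estimate() for _ in range(N_TEACHERS))
            for _ in range(N_TRIALS)]

print(f"SD of single-teacher estimates: {statistics.stdev(single):.3f}")
print(f"SD of {N_TEACHERS}-teacher averages:     {statistics.stdev(averaged):.3f}")
# The averaged spread shrinks roughly as NOISE_SD / sqrt(N_TEACHERS),
# which is why VAM is safer for research on groups of teachers than
# for judging any individual teacher.
```

The square-root shrinkage is the whole statistical case for the “research use” Haertel endorses: it does nothing to rescue a single teacher’s noisy score.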
David, please note that Ed Haertel says that VAM scores should have “no fixed weight.” Most states do assign a fixed weight to VAM scores, which is why they are Junk Science. When a teacher of gifted children is rated ineffective because her students have hit the ceiling, that is Junk Science. When nearly half of Florida’s teachers of the year are rated ineffective, that’s Junk Science. When a teacher gets a bonus one year, and fired the next, that’s Junk Science. There is so much more to the art and craft of teaching than standardized tests reveal. If you doubt the validity of the measure, then using it to give people ratings and to fire them is wrong.
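The “bonus one year, fired the next” pattern is exactly what the statistics predict when the year-to-year noise in a VAM estimate is comparable to the true spread in teacher effects. A hypothetical sketch, with made-up but plausible parameters, shows how many teachers flagged “ineffective” in one year escape the flag the next year on noise alone:

```python
import random

random.seed(1)

TRUE_SD = 0.10    # assumed spread of true teacher effects (score SD units)
NOISE_SD = 0.15   # assumed noise in one year's VAM estimate
N = 1000          # simulated teachers

def bottom_quintile(scores):
    """Indices of teachers in the bottom 20% of a score list."""
    cutoff = sorted(scores)[int(0.2 * len(scores))]
    return {i for i, s in enumerate(scores) if s < cutoff}

# Each teacher keeps the same true effect; only the noise changes by year.
true_effects = [random.gauss(0, TRUE_SD) for _ in range(N)]
year1 = [t + random.gauss(0, NOISE_SD) for t in true_effects]
year2 = [t + random.gauss(0, NOISE_SD) for t in true_effects]

flagged1 = bottom_quintile(year1)
flagged2 = bottom_quintile(year2)

# Fraction flagged "ineffective" in year 1 who are not flagged in year 2.
escaped = len(flagged1 - flagged2) / len(flagged1)
print(f"Flagged in year 1 but not in year 2: {escaped:.0%}")
```

With these assumed numbers, a large share of the bottom quintile churns from one year to the next even though no teacher’s true effectiveness changed at all.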
Dear Diane,
In the hands of a teacher who understands their value, (1) projections of students’ test scores (the prospective use of VAM) and (2) the careful analysis of actual scores relative to projections (the retrospective use of VAM) make substantial contributions to planning and evaluating instruction. To dismiss all VAM as junk is to deny its demonstrated benefit to real children.
Every abuse you have listed in your response to me today is the result of bad policy, not bad science.
The statisticians who produce the VAM scores for teachers are not the ones deciding how much weight to assign these indices; those decisions are typically made by school boards and state-department regulators. Junk policy, not junk science.
When tests are permitted to yield ceiling effects because of improper scaling, it is not the VAM analysts who allowed such flawed measurements; rather, the administrators of the testing programs failed to comply with very well-established test-scoring procedures. Junk policy, not junk science.
When teachers’ pay is set as a result of scores produced by VAM methods, don’t blame the VAM analysts; blame those who ignored the guidance provided by the literature on merit pay. Junk policy, not junk science.
The assault on science from the right is bad enough; we on the left need not collaborate in such recklessness.
Thanks.
David Cooper
Elon University
dianeravitch: This is presumptuous of me, but here is a cautionary piece of advice for when Hades freezes over and public discussions with Michelle Rhee, David Coleman, et al. finally occur, regarding how NOT to argue for one’s position.
Let’s make a wild foray into the miraculously improbable. If I assert that VAM is perfectly OK except for some totally (I mean, gag me with a spoon!) minor details, like what actually happens in reality when the people who buy and use it employ it for purposes for which it is demonstrably unsuited and misleading, then I’ve, er, cut my whole line of argument off at the knees.
Trust me, it is painfully hard for arguments to get up and around when they’ve got no knees.
Know what I mean?
Feel free to disregard. Or not.
Really!
Not Rheeally…
😎
The scientific community has a responsibility to ensure the research is properly used. I do not hear a loud enough voice questioning the misapplication of VAM. There is a certain hubris in statistics. While the math is sound, the assumptions and models are the reasons for the “junk science” label. Statisticians suffer from an overconfidence bias and need to leave the ivory towers and listen to the humans being measured.
I agree. The assumptions are not questioned often enough, nor are the inferential leaps about teacher “effectiveness.” The resources being sucked from schools and districts to feed the production of VAM scores are out of proportion to the validity of the tests (the source of the data) and the statistical procedures.
VAM: The Scarlet Letter
For use on Twitter: just copy, paste, and then retweet every chance you get. The short link was created at Bitly (https://bitly.com/shorten/) to make room for more content in the tweet, and the link goes directly to Edward Haertel’s explanation of why VAM is junk science and fails as a final judgment tool for teachers and public schools.
VAMs are called junk science, but what they really are is a tool that falsely judges teachers, who are then burned at the stake of public opinion, much as heretics were during the Inquisition (remember the horrors of that era, which began in the 12th century). The beneficiaries are billionaire oligarchs who use these judgments to further their own political and religious agendas and to reinvent the United States into what they want it to be, regardless of the Declaration of Independence, the US Constitution, and the Bill of Rights.
Influence of teachers on standardized test scores—9%
Influence from out-of-school-factors—60%
Why punish teachers
$$$
http://bit.ly/1lIaNsa
It’s awfully hard to put VAM into a commonsensical context, but I’ll try. It’s part of a business model, no? I worked in business for many years (engineering and construction). The typical annual evaluation was couched in the context of career path and career goals. Such goals were not mandated. If you saw your career path remaining at the technical level and increasing your technical knowledge, your goal was to become a technical consultant, and your annual achievement was measured accordingly. If your goal was advancement in the hierarchy, then what was measured was supervisory ability: the capacity to nurture subordinates and to manage them in support of project goals.
None of that relates to teacher VAM. Teacher VAM seems related to cut-throat fields (advertising comes to mind), where the only thing that counts is your individual addition to the bottom line (percent profit), regardless of economic trends, client whims, or whatever.