Audrey Amrein-Beardsley reports here on new research by Steven Klees of the University of Maryland, which concludes that the contribution of individual teachers to student learning cannot be isolated or quantified as “value-added modeling” claims to do.
Accumulating evidence continues to demonstrate that the teacher evaluation systems imposed by Arne Duncan in the Race to the Top are invalid, inaccurate and unreliable. How many teachers and principals have been fired because of these flawed metrics?
Open the article for her many links.
She writes:
The Educational Researcher (ER) journal is the highly esteemed, flagship journal of the American Educational Research Association. It may sound familiar in that what I view to be many of the best research articles published about value-added models (VAMs) were published in ER (see my full reading list on this topic here), but as more specific to this post, the recent “AERA Statement on Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs” was also published in this journal (see also a prior post about this position statement here).
After this position statement was published, however, many critiqued AERA and the authors of this piece for going too easy on VAMs, as well as VAM proponents and users, and for not taking a firmer stance against VAMs given the current research. The lightest of the critiques, for example, as authored by Brookings Institution affiliate Michael Hansen and University of Washington Bothell’s Dan Goldhaber was highlighted here, after which Boston College’s Dr. Henry Braun responded also here. Some even believed this response to also be too, let’s say, collegial or symbiotic.
Just this month, however, ER released a critique of this same position statement, as authored by Steven Klees, a Professor at the University of Maryland. Klees wrote, essentially, that the AERA Statement “only alludes to the principal problem with [VAMs]…misspecification.” To isolate the contributions of teachers to student learning is not only “very difficult,” but “it is impossible—even if all the technical requirements in the [AERA] Statement [see here] are met.”
Rather, Klees wrote, “[f]or proper specification of any form of regression analysis…All confounding variables must be in the equation, all must be measured correctly, and the correct functional form must be used. As the 40-year literature on input-output functions that use student test scores as the dependent variable make clear, we never even come close to meeting these conditions…[Hence, simply] adding relevant variables to the model, changing how you measure them, or using alternative functional forms will always yield significant differences in the rank ordering of teachers’…contributions.”
Therefore, Klees argues “that with any VAM process that made its data available to competent researchers, those researchers would find that reasonable alternative specifications would yield major differences in rank ordering. Misclassification is not simply a ‘significant risk’— major misclassification is rampant and inherent in the use of VAM.”
Klees concludes: “The bottom line is that regardless of technical sophistication, the use of VAM is never [and, perhaps never will be] ‘accurate, reliable, and valid’ and will never yield ‘rigorously supported inferences’” as expected and desired.
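To see the misspecification point in miniature, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three teachers (A, B, C), the student gains, and the single binary covariate `x` are invented toy data, and neither function is Klees's method or any real VAM. It only compares two specifications, one that ignores the covariate and one that includes it, and checks what happens to the rank ordering.

```python
# Toy illustration (all data invented): how omitting one confounder
# can reverse a teacher ranking. This is not any real VAM implementation.

# Each record: (teacher, confounder x, test-score gain y).
# Gains were constructed as: teacher effect (A=5, B=6, C=7) minus 6 * x.
STUDENTS = [
    ("A", 0, 5), ("A", 0, 5), ("A", 0, 5), ("A", 1, -1),
    ("B", 0, 6), ("B", 0, 6), ("B", 1, 0), ("B", 1, 0),
    ("C", 1, 1), ("C", 1, 1), ("C", 1, 1), ("C", 0, 7),
]


def group_by_teacher(students):
    """Collect (x, y) pairs under each teacher label."""
    groups = {}
    for teacher, x, y in students:
        groups.setdefault(teacher, []).append((x, y))
    return groups


def mean(values):
    return sum(values) / len(values)


def naive_effects(students):
    """Specification 1: teacher effect = mean gain, confounder ignored."""
    return {t: mean([y for _, y in rows])
            for t, rows in group_by_teacher(students).items()}


def adjusted_effects(students):
    """Specification 2: least squares with teacher intercepts plus x."""
    num = den = 0.0
    means = {}
    for t, rows in group_by_teacher(students).items():
        x_bar = mean([x for x, _ in rows])
        y_bar = mean([y for _, y in rows])
        means[t] = (x_bar, y_bar)
        # Within-teacher deviations give the common slope on x.
        for x, y in rows:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    slope = num / den
    # Each teacher's intercept is the mean gain with x's share removed.
    return {t: y_bar - slope * x_bar for t, (x_bar, y_bar) in means.items()}


def ranking(effects):
    """Teacher labels ordered from highest to lowest estimated effect."""
    return sorted(effects, key=effects.get, reverse=True)


if __name__ == "__main__":
    print("ignoring confounder: ", ranking(naive_effects(STUDENTS)))
    print("including confounder:", ranking(adjusted_effects(STUDENTS)))
```

On this toy data the first specification ranks the teachers A, B, C and the second ranks them C, B, A. The sketch only shows sensitivity to one omitted variable; Klees's claim is broader, namely that the correct specification is never actually available.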
“VAMS Are NEVER Accurate, Reliable, or Valid”
And in other news headlines: “Yogi Bear Defecates in Jellystone Park”
Not meaning to take away from Amrein-Beardsley’s work as she has done excellent work for many years. Nor trying to take away from Diane’s posting as what is in the post certainly needs to be said and repeated over and over and over.
But when oh when will the complete invalidities involved with VAM, SLO/SGPs that render any results “vain and illusory” become common knowledge, enough so that we will reject these malpractices and get on with intelligent teacher (and student) evaluations using valid assessment techniques???
Duane,
Not in my lifetime.
“But when oh when will the complete invalidities …”
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
Is it JUST “standardized” testing, that produces a description of
an event, and NOT a description of the student?
Is TEACHER a description of a “student” that passed
NON-standardized tests?
How do the tests and scores, required for teacher certification,
remain “outside” the description of testing events, and become the
description (teacher) of a student graduate?
Duane,
Allow me to summarize what you (channeling Wilson) have said so many times
“You can’t make an output valid with invalid input no matter how great your analytical technique is.”
Succinct summary, SDP!
(Not sure why but your response didn’t show up in my notification box on wordpress. It’s not the first time and probably won’t be the last. Sometimes responses seem to get hung up in wordpress for a while. We’ll see if it eventually shows up.)
NoBrick,
That logical falsehood of assigning/attaching the grade/score/evaluation of an event on to either student or teacher is just one of the major onto-epistemological errors that permeate current educational discourse. It is one of those ingrained cultural myths on which many harms to the innocents, the students (and even the teachers if, no not if, when they are the recipient of those labels) are based. The more I have thought about it the more I realize that there are far too many myths and fantasies that “guide” (not the correct term) or influence and steer current educational policies.
We fancy our society as built on Enlightenment thought but superstitions, myths and phantoms abound in the fool’s paradise that is modern public education.
“…which concludes that the contribution of individual teachers to student learning cannot be isolated or quantified as “value-added modeling” claims to do.”
Can we substitute “student test scores” for “student learning”? All VAM looks at is the former, which has nothing to do with the latter. We have no idea how to measure (sic) “learning”. Heck, we can’t even agree on what it is.
“We have no idea how to measure (sic) “learning”.”
That is because we can “measure” (sic) the unobservable as the proponents of these malpractices contend. Any true scientist will tell you that it is impossible to “measure the unobservable”. See our discussion yesterday in the “Jay Greene: Do Higher Test Scores Predict Better Life Outcomes”. SDP hit the nail on the head with these two comments:
“Physicists long ago had — and resolved — the debate about what is and is not measurable.
They rightly concluded that that which can not be observed can not be measured.
They actually called the measurable things “observables” to make the point perfectly clear.
That’s not to say that you can’t find things out about things that are not directly observable. You can, by observing the effects on other things. But you can’t perform measurements on non-observable things. That does not even make sense.
A “Quark” is an example of something that can never be observed directly — and hence not measured.”
and
“The most ridiculous thing about standardized testing is that it flips the usual idea of measurement on its head.
With normal science, one defines the thing to be measured — speed of a car, for example — and then uses some particular method to do the measurement. Critically, the answer one gets for the speed does not depend on the method used for measuring. Whether I use radar, sonar, or a stopwatch to time the car going a certain distance, I get the same answer.
In the case of standardized testing, the so-called “psychometricians” actually define what it is they are supposedly measuring in terms of the “measurement” instrument (test) itself and call that “learning”, “education”, “growth” or whatever. Because the “learning” is actually defined by the test, different tests can — and do — indicate different “learning”.
This sort of situation — where what is supposedly being measured not only depends on, but is actually DEFINED BY the “measurement” instrument — is pervasive in everything from IQ tests to SAT’s to VAMs.
It’s just weird.”
Hope you don’t mind, SDP, that I used your comments but they are well stated and explain how standardized testing proponents bastardize and misuse the word “measure”. And similar concerns about the bastardization and misuse of the word “standard” abound.
Ay, ay, ay, This is because we CAN’T measure not “can measure”
Dienne: to reinforce your point—
Over and over and over again the rheephormsters get away with saying “student learning” when what they really mean is “student test scores.” I agree, they are NOT the same thing. But confounding the one with the other in the public mind serves the vital rheephorm purpose of diverting attention, time and resources away from addressing real issues.
Is the confusion deliberate or accidental? As Steve Cohen points out below, the leaders/chief beneficiaries of corporate education reform ensure that THEIR OWN CHILDREN get a very different sort of education from that mandated by rheephormsters for OTHER PEOPLE’S CHILDREN.
Perhaps a better question would be: is the confusion conscious or unconscious? See this blog, 3-23-2014, “Common Core for Commoners: Not My School!” The entire posting follows:
[start]
This is an unintentionally hilarious story about Common Core in Tennessee. Dr. Candace McQueen has been dean of Lipscomb College’s school of education and also the state’s chief cheerleader for Common Core. However, she was named headmistress of private Lipscomb Academy, and guess what? She will not have the school adopt the Common Core! Go figure.
[end]
Link: https://dianeravitch.net/2014/03/23/common-core-for-commoners-not-my-school/
Thank you for your comments.
😎
Such a shame Duncan blew 4 billion dollars on this. They could have done a 4 billion dollar effort to increase attendance and gotten all but guaranteed results in actual learning, but that wasn’t sexy enough for the innovators and involved too few consultants.
But that 4 billion filled someone’s coffers!
Chiara,
How do we increase student attendance? Just interested in any ideas. Thanks!
Mamie,
The first step is to get rid of standardized tests (other than individual diagnostic tests used to determine if the students need more help or an IEP) and all the test prep. That frees up not only class time but also monies to reinstate programs that were dropped so that the almighty test score might be obtained.
And 4 billion buys a lot of school supplies and lunches. To say Duncan was a fraud is an understatement. USDOE was just a springboard to a lucrative private sector position.
Some economists think they have discovered Asimov’s psychohistory. By the time all important confounding variables are included in the model, we end up with teacher observations.
Agreed, Duane. Get rid of the tests. And the prior post on this blog about having more community services would help. I find many students avoid school because they are working, have struggled since kindergarten, are bullied for being unique, have children, or have parents unable or unwilling to support them.
I don’t believe standardized testing and VAM were ever intended to improve anything. They were designed to feed the school failure narrative so they could discredit public education and make it vulnerable to takeover. This, I think, has always been the end game. For Gates, in particular, I believe “Personalized Learning” has always been the goal. That is how Gates cashes in on his “investment.” Broad is trying a “hostile takeover” tactic through the Third Way, and that is his next big move. “Reform,” at this point in time, is about making money for a few at the expense of many. It is about turning public assets into private equity, and it is about using public funds to build private wealth.
HEAR, HEAR.
Increasing attendance and learning outcomes doesn’t cause closing public schools to be replaced by for-profit, autocratic, corporate charters designed to make a few wealthy individuals wealthier and more powerful as they take over the indoctrination of OUR children.
It is now obvious that NCLB, RTTT and the Common Core Crap (CCC) were never about increasing attendance and improving learning outcomes for all children. They were always, from their cloaked-in-secrecy and opaque start, about the privatization of public education for profit and power: a direct assault on OUR republic and its democratic public institutions.
Four billion dollars would buy a lot of air conditioners for sweltering city schools.
VAM has been “slammed” — quoting The Washington Post — by the very people who know the most about data measurement: The American Statistical Association (ASA). So every teacher who is unfavorably evaluated on the basis of students’ standardized test scores should vigorously oppose the evaluation, citing the ASA’s authoritative, detailed, seven-page VAM-slam “Statement on Using Value-Added Models for Educational Assessment” as the basis to have public employment boards and courts toss out any test-based Value Added Model (VAM) unfavorable evaluation.
Moreover, a copy of the VAM-slam ASA Statement should be posted on the union bulletin board at every school site throughout our nation and should be explained to every teacher by their union at individual site faculty meetings so that teachers are aware of what it says about how invalid it is to use standardized test results to evaluate teachers.
Even the anti-public school, anti-union Washington Post newspaper said this about the ASA Statement: “You can be certain that members of the American Statistical Association, the largest organization in the United States representing statisticians and related professionals, know a thing or two about data and measurement. The ASA just slammed the high-stakes ‘value-added method’ (VAM) of evaluating teachers that has been increasingly embraced in states as part of school-reform efforts. VAM purports to be able to take student standardized test scores and measure the ‘value’ a teacher adds to student learning through complicated formulas that can supposedly factor out all of the other influences and emerge with a valid assessment of how effective a particular teacher has been. THESE FORMULAS CAN’T ACTUALLY DO THIS (emphasis added) with sufficient reliability and validity, but school reformers have pushed this approach and now most states use VAM as part of teacher evaluations.”
The ASA Statement points out the following and many other failings of testing-based VAM:
> “VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.”
> “Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions.”
“System-level conditions” include everything from overcrowded and underfunded classrooms to district- and site-level management of the schools and to student poverty.
Fight back! Never, never, never give up!
Since using VAM and test scores has been the primary foundation to bash public education, public school teachers and teachers’ unions, that invalidates the entire autocratic corporate public education reform movement and reveals it to be the profit loving sham and fraud that it is.
It’s a case of the fallacy of misplaced concreteness on steroids. A “score” is an abstraction, and a VAM score is an abstraction of an abstraction, taken to be the same thing as what a teacher actually does. It is a primitive error, like its first cousin, the “free market.” (In fact, a deep wound in our culture is the belief that only mathematics is real, which means that only economics as a branch of mathematics is real economics, and, that, in turn, what we experience is but a chimera of Reality.)
And, once again, the evidence for my proposition is that DFERites send their kids to private schools, which would never make such a rudimentary mistake in actually running a school. Reformers reveal their Real selves in this simple fact. And, by the way, they’re happy to abuse the beauty of mathematics, if need be, to achieve their hypocritical ends.
One minor correction
Economics is actually a branch of mathemagics, which has only a superficial relationship to mathematics.
“The VAMmirage”
The VAM is a mirage
A vision and a lie
Illusory collage
Of landscape and of sky
I wonder why this “letter” from Dr. Steven Klees to Educational Researcher took so long to see the light of day. Why was there so much foot-dragging in getting any statement approximating expert judgment on the frauds perpetuated by VAMs?
I think the answer is clear enough. Important and numerous members of AERA have built careers around testing and VAMS.
The long delay in exposing the fraud of VAM has caused teachers to be fired, put students and teachers under unnecessary stress from testing, and set in motion all sorts of mind-numbing professional development.
The “AERA Statement on Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs” came too late to save a bunch of people from irreversible harm and as far as I know there has been no public, clear expression of regret from AERA or acknowledgement from other groups that VAMs are inaccurate, unreliable, and invalid. Thank you for clarity Dr. Klees.
At minimum, AERA, the Institute of Education Sciences, and USDE should red flag, publish, and condemn all federal and state policies that continue the use of this methodology to rate teachers, principals, and schools. That includes all references to within-year and year-to-year “growth scores,” SLOs, SGOs and other estimates of “impacts” or “value-added.”
Unfortunately, an “easy to use” alternative is available and taxpayers have funded it. Here you go.
“The Institute of Education Sciences (IES) has launched a new tool that can make it easier and more cost-effective for states and school districts to evaluate the impact of their programs.” (Impact is not much different from value-added).
“RCT-YES™ is free, user-friendly software that assists those with a basic understanding of statistics and research design in analyzing data and reporting results from randomized controlled trials (RCTs) and other types of evaluation designs. (RCT refers to a randomized control trial, marketed as the “gold standard” for cause-effect reasoning, but note that this new methodology can also be used with other evaluation schemes).
RCT-YES™ was developed by Mathematica Policy Research, Inc. under a contract from IES’ National Center for Education Evaluation and Regional Assistance. While the software has a simple interface and requires no knowledge of programming, it does not sacrifice rigor. RCT-YES™ uses state-of-the-art statistical methods to analyze data.
The software can be downloaded from the RCT-YES™ website, along with a quick start guide, interactive how-to videos, and a detailed user’s manual. A report properly generated using RCT-YES™ will include all the statistical information necessary for a review by the What Works Clearinghouse™ (WWC). The WWC, another IES investment, reviews high-quality research on interventions and programs to determine what works in education.” (Actually, the IES will review almost any empirical research, of even minimal quality, from almost any source if the research gets a lot of national press. Of course, the deepest pockets churning out questionable research also have plenty of money for professional publicity).
https://ies.ed.gov/ncee/pubs/20154011/
“I think the answer is clear enough. Important and numerous members of AERA have built careers around testing and VAMS.”
The AERA along with the APA and the NCME are the producers/generators of the testing bible “Standards for Educational and Psychological Testing”.
If you ever want a laugh, read that book (I know it’s almost impossible to read). It’s full of utter nonsense about standards and standardized testing but purports to be the resource that one should follow when making a standardized test.
It declares that it is a “scientifically based” way of making tests and making sure they are “objective”. I’m not sure what the authors (wait a minute, there are no listed authors of the book, so let’s say the three sponsoring organizations) think scientific means, but it certainly doesn’t mean attempting to “measure the unobservable”, you know, what they call “latent traits”.
That book is a mind boggling cluster. . .k!
Cross posted the original article at http://www.opednews.com/Quicklink/VAMs-Are-Never-Accurate–in-Best_Web_OpEds-Grading-Teachers_Learning_Mass-Teacher-Firings_Standardized-Tests-160617-983.html#comment602511
with this comment:
Despite all the evidence on the failure of Opt-out, VAM and PARCC and despite the truth about the testing fiasco that teachers, parents and academics present at the Diane Ravitch* site, AND DESPITE the TRUTH/“EVIDENCE” from scholarly groups like the American Educational Research Association and the American Statistical Association
who have warned against using test scores to rate individual teachers, this travesty continues. The reason that legislatures and school boards are ‘Impervious to evidence’ is that they DO NOT GRASP THEY ARE BEING BAMBOOZLED, because most folks do not know what it takes to teach and what real LEARNING LOOKS LIKE.
http://www.opednews.com/articles/BAMBOOZLE-THEM-where-tea-by-Susan-Lee-Schwartz-110524-511.html
I live in Massachusetts, which uses something called District Determined Measures (DDM), and am a parent. One thing that I’ve observed with the intense pressure for all subject teachers to teach “standards” is that it really is a form of censorship where educators are essentially punished by our district for going off script. This year my daughter’s US history teacher (11th grade) rebels and doesn’t really cover the bland and poorly written textbook and even more banal list of facts and DBQs and instead goes off script…last week for example my daughter told me that the class learned about the Orlando massacre in the context of the gay rights movement by learning about Stonewall etc. by reading original source documents and also learned the history of the transgender movement and what’s happening now and put it in historical context. This is the sort of class I really want my child to be in; not some class where they memorize a list of historical facts or parrot back some quote from text to cite as evidence on the author’s writing style. The students may not do that well on the district final or DDM but I think they learned a lot. I see the same type of thing happening in math class where the teachers lack the flexibility to write their own exams and set the pacing so that they make sure the students master material. If they find many students are weak in a pre-requisite skill they don’t have the flexibility to spend time on that without a fight. For example, in 8th grade my daughter’s science teacher realized none of the students knew how to measure anything using a ruler or a beaker or the metric system and had to fight to spend time to teach that because it’s necessary for all the labs and to understand the textbook. There’s this constant assessment but not a good way to actually return to material that students seem weak in or go over material that students need more time on and they end up with a lot of gaps from rushing from one standard to the next.
I also think we really need to trust our teachers as the experts and not try to constantly micromanage exactly what they teach or how they assess.
Thank you, Sarah 5565.
The painful thing is that nearly all broadly intelligent people saw that using exam scores to fire teachers and close schools was simply crazy.
This society today has a lot of narrowly smart yet broadly unintelligent people.
A lot of “smart” people have no common sense.
Many of them also believe that the most high tech solution to problems is always the best one.
But aside from that, I suspect that many of the VAM cheerleaders don’t actually even know or care whether it works, as long as it helps them to achieve their goals.
Well, once something gets rolling or looks like it can fly, you get a lot of thoughtless or malicious self-serving cheerleaders.
There’s something really alarming here. A disempowered or tightly constrained majority, or else a rampant lack of common sense, or something like common sense but a bit deeper.
I feel we’re on the cusp of a major change in jobs, an unexpected shift that nobody is correctly predicting. And everyone calling the shots in education, industry, science and the social sciences is very, very busy chasing his or her tail, as it were, the opposite-of-anthropomorphically speaking, caninomorphically speaking, perhaps.