Mitchell Robinson, who teaches music at Michigan State University, writes here about the madness of assessing teachers by “value-added” or growth measures, especially when they don’t teach the tested subjects.
State officials listen attentively to the unaccredited National Council on Teacher Quality, which was created by the conservative Thomas B. Fordham Foundation and kept alive by an emergency infusion of $5 million from then-Secretary of Education Rod Paige.
A state official explained why VAM was necessary:
Venessa Keesler, deputy superintendent of accountability services at MDE, said measuring student growth is a “challenging science,” but student growth percentiles represent a “powerful and good” way to tackle the topic. “When you don’t have a pre-and-post-test, this is a good way to understand how much a student has progressed,” she said. Under the new law, 25 percent of a teacher’s evaluation will be based on student growth through 2017-18. In 2018-19, the percentage will grow to 40 percent. State standardized tests, where possible, will be used to determine half that growth. In Michigan, state standardized tests – most of which focus on reading and math – touch a minority of teachers. One study estimated that 33 percent of teachers teach in grades and subjects covered by state standardized tests.
Robinson comments:
What Dr. Keesler doesn’t seem to understand is that the student growth percentiles she is referring to are nothing more than another name for Value Added Measures, or VAM–a statistical method for predicting students’ academic growth that has been completely and totally debunked, with statements from nearly every leading professional organization in education and statistics against their use in making high stakes decisions about teacher effectiveness (i.e., exactly what MDE is recommending they be used for in teachers’ evaluations). The science here is more than challenging–it’s deeply flawed, invalid and unreliable, and its usefulness in terms of determining teacher effectiveness is based largely on one, now suspect study conducted by a researcher who has been discredited for “masking evidence of bias” in his research agenda.
Dr. Keesler also glosses over the fact that these measures of student growth only apply to math and reading, subjects that account for less than a third of the classes being taught in the schools. If the idea of evaluating, for example, music and art teachers by using math and reading test scores doesn’t make any sense to you, there’s an (awful) explanation: “The idea is that all teachers weave elements of reading and writing into their curriculum. The approach fosters a sense of teamwork, shared goals and the feeling that ‘we’re all in this together,’ said Erich Harmsen, a member of GRPS’ human resources department who focuses on teacher evaluations.”
While I’m all for teamwork, this “explanation” is, to be polite, simply a load of hooey. If Mr. Harmsen truly believed in what I’ll call the “transitive property” of teaching and learning, then we would expect to see math and reading teachers be evaluated using the results of student learning in music and art. Because what’s good for the goose…right?
The truth is, as any teacher knows, for evaluation to be considered valid, the measures must be related to the actual content that is taught in the teacher’s class–you can’t just wave some magical “we’re all in this together” wand over the test scores that miraculously converts stuff taught in band class to wonderful, delicious math data. It just doesn’t work that way, and schools that persist in insisting that it does are now getting sued for their ignorance.
Why should teacher evaluation be standardized when there is so much messy human, social, and economic intervention in the scores that cannot be controlled or measured?
Robinson disputes the value of standardization:
Teachers work with children, and these children are not standardized.
Teachers work in schools, and these schools exist in communities that are not standardized.
And teachers work with other teachers, custodians, secretaries, administrators, school board members, and other adults–none of whom are standardized.
So why should teacher evaluation systems in schools in communities as diverse as the Upper Peninsula and downtown Detroit evaluate their teachers using the same system? And why is the finding that “local assessments can vary among ‘teachers at the same grade, in the same school, teaching the same subjects’” a bad thing?
The thing that we should be valuing in these children, schools and communities is their diversity–the characteristics, talents and interests that make them gloriously different from one another. A school in Escanaba shouldn’t look like a school in Kalamazoo, and the curriculum in each school should be tailored to the community in which it resides. The only parties that benefit from “standardizing” education are the Michigan Department of Education and the testing companies that produce these tests, because standardizing makes their jobs easier. Standardizing teaching and learning doesn’t help students, teachers or schools, so why are we spending so much time and money in a futile attempt to make Pearson and ETS’s jobs easier?
Why did the powers that be implement such an absurdly unfair system? Because they could. And mostly because it is a primarily female profession. And because it is not in the nature of elementary and middle school teachers to go against the rules, no matter how oppressive. This is the trademark of the bully. They rarely pick on those who would fight back.
Would the American Dental Association (primarily male) ever allow its rank-and-file dentists to be evaluated using the cholesterol levels of primary physicians? Why not? They are both health care professionals.
“Canned VAMs”
We’re all in this together
— In dumpster, from the VAM.
Though music’s what I mentor,
From math I got the can
“Six years after ranking fifth in the nation, Ohio’s education system has tumbled to 23rd among the 50 states and District of Columbia, according to a national report card released on Wednesday.
Ohio earned a C, scoring 74.9 of a possible 100 on the annual Quality Counts report by Education Week, an education trade newspaper. That’s down from last year, when Ohio ranked 18th with a score of 75.8.”
The response from our government leaders is to double down on ed reform, which is what they have been doing for the last 15 years.
The ed reform record in the Great Lakes states is absolutely terrible. The national stats really serve to obscure this, but our states seem to be the dumping ground for every fad and gimmick and experiment these folks come up with. Something has to change, and soon. The ed reform wrecking crew are doing what will be irreparable and long-term damage to these existing public systems. Public schools have plenty of “challenges,” but maybe we could hire some people who won’t make them worse and are actually committed to their success instead of hoping they can replace them?
http://www.dispatch.com/content/stories/local/2016/01/07/rankings-slide-in-national-report.html
We now have 50 million public school students who have experienced test-and-punish reform and nothing but test-and-punish reform. Testing and test-prep in two subjects have become the nearly sole focus of K to 12 education in America. Only harm. No good. Just ask any advocate of this irresponsible approach to schooling specifically what benefits have been reaped after 15 years.
You can respond to their blank stare and incoherent ramblings by letting them know that they own this monumental FAILURE.
“If Mr. Harmsen truly believed in what I’ll call the “transitive property” of teaching and learning, then we would expect to see math and reading teachers be evaluated using the results of student learning in music and art. Because what’s good for the goose…right?”
This is a great line. A little known fact is this: For about a decade, USDE’s arts education grants were evaluated within USDE by whether the programs produced higher test scores in math or reading. The agency never got high marks on that.
Even so, the “let’s integrate the arts into the curriculum” enthusiasts have encouraged this “transitive” thinking, as have all of the “technical assistance” gurus that USDE hired to push the use of “SLOs” and tests attached to those absurd exercises as if equivalent to VAM. USDE’s own research showed that measures for “untested subjects” –a bizarre category–have no validity. Unfortunately reason and evidence do not trump ideology and administrative convenience.
Chiara. Be aware that this section of EdWeek is funded by the Gates Foundation and includes all sorts of his preferred indicators of “quality.” The problem is that these rankings are too much like those published in U.S. News & World Report for teacher preparation programs. Some of the indicators are potentially useful (if they are accurate) but watch out for anything reduced to a letter grade. Of course, the local press is likely to recycle the grades without looking at them critically, same as in Ohio’s ratings of teachers, schools, and districts.
On the matter of the Great Lakes states being inundated with fads and gimmicks, just wait until the options and reporting requirements for ESSA kick in.
“USDE’s own research showed that measures for “untested subjects” –a bizarre category–have no validity.”
Not sure why anyone would need “research” to tell them that.
IMHO, a person would really have to be dumber than a bag of rocks to believe they do.
…which does tell us one thing: there are a lot of people pushing this stuff who are indeed “dumber than a bag of rocks.”
This is Jay P Greene again making excuses. Of course he blames the lousy teachers, feckless parents and the “girt less” kids. If his model of voucher/choice doesn’t prove out, then he says “parents don’t know how to make choices”… This is what Martin West did at Fordham Institute when his questionnaires did not prove out to be valid or reliable: “well, the kids just have self-bias,” or some internal psychological wiring that makes them lie on the questionnaires. http://jaypgreene.com/2016/01/07/what-else-could-explain-negative-result-from-louisiana/
““parents don’t know how to make choices””
What an odd thing for a choice proponent to say. Isn’t that an argument against choice?
Dienne: good catch.
I think the original is more damning:
[start full paragraph]
More importantly, if the participating private schools are so bad – and other people apparently knew they were bad, given the declining enrollments – then why did the voucher recipients choose them? Did the parents fail to research their options? Do they not value academics much at all? Blaming the results on an unusually bad set of private schools is tempting, but it creates the new problem of having to explain why parents made such dubious choices.
[end full paragraph]
This concords perfectly with the argument of folks like myself that assert: 1), that when it comes to rheephorm claims, charters/privatization/vouchers are inherently & almost always demonstrably better than practically anything public schools can offer; 2), that when the vast majority are given a “choice” by the few that are leading and enforcing the self-styled “education reform” movement, parents and students and their associated communities will with rare exceptions choose rheephorm options; and 3), when rheephorm options are rejected, it is because those doing the rejection are the student/parent/community equivalent of the lazy LIFO union thug teachers that want to maintain an inequitable and racist status quo even when rheephorming that status quo would help them help themselves.
In other words, rheephormsters know better than us aka the vast majority. We need to take their bitter medicine because they will not shirk their parent-like responsibilities re noblesse oblige. And how do we know they know better than us?
Just ask them. Or better yet, as rheephormista geniuses like Koch and Gates might put it: “Have your staffer call my staffer and let them sort it out.”
Perhaps this aspirational rheephorm-like blast from the past might put the virtues of the worthy few making choices for the unworthy many in clearer perspective.
Bertolt Brecht, poem published in 1959, on the armed suppression of the 1953 East German uprising:
[start]
The Solution
After the uprising of the 17th of June
The Secretary of the Writer’s Union
Had leaflets distributed in the Stalinallee
Stating that the people
Had forfeited the confidence of the government
And could win it back only
By redoubled efforts. Would it not be easier
In that case for the government
To dissolve the people
And elect another?
[end]
Anyone up for putting the same folks in charge of Flint, MI water in charge of Detroit schools? [Look at the thread under the following—]
Link: https://dianeravitch.net/2015/12/21/michigan-officials-lied-about-lead-poisoning-in-water/
😎
I have a much more cynical explanation. I think the charter supporters joined with the voucher supporters in Ohio because they’re a political alliance as much as an “educational” alliance.
They needed to extend public funding to private religious schools or private religious schools wouldn’t stay on board for ed reform.
I’m familiar with some of these private schools that receive up to 70% of their funding from vouchers. I’m sure parents pick them for all kinds of good reasons, but the idea that they’re “better” than the public schools is nonsense. Some are good, some are bad, and a lot are in the middle.
This is a political alliance, charters + vouchers, and that’s all it is.
Having now read the article, I find this a much more stunning admission:
“Personally, I do not find it plausible that school quality alone could have so much impact, especially in one year. The traits that students bring with them to school – natural abilities, resilience, family support networks – generally explain much more of the variance in student achievement than school quality.”
Just gobsmacked here. Hardly know what to say. Who ever would have thought such a thing??
Dienne: re your 1-27-16, 1:27 PM comment—
That was the other part of the article that particularly struck me too.
Rheephormsters constantly harp on one in-school factor, i.e., teachers, and then severely discount all other in-school factors as well as the weightier [in their totality] out-of-school factors. But when it is to their advantage to deflect, avoid and nullify criticism of their own initiatives and mandates and projects, they suddenly find all sorts of in-school and out-of-school factors to prop up their version of the Teflon Defense.
That is why I assert—and I think it is not hyperbole—that the leading lights of self-styled “education reform” routinely engage in double think, double talk and double standards.
That’s how I see it.
Thank you for your follow-up observations.
😎
When I get enraged I don’t see that well… I missed this yesterday on the Jay P Greene article: “(Guest Post by Jason Richwine).” He is the one who wrote that dissertation at Harvard, got hired by Heritage and then had to be let go because his philosophy about intelligence and minority groups is so offensive. Don’t know where he is working now, but Greene gives him a platform and that other fellow makes comments supporting the charter/voucher system….
cx. of course I meant “grit” but they also measure “girth”
Also:
As Gary Rubinstein and I have shown, using plain old scatter plots that we teach to middle school students these days, even for English/reading or mathematics teachers, all the evidence shows that for any given teacher, including veterans, their score one year is totally unrelated to the score they got the previous or following year. Even if they are teaching the same exact courses, in the same exact schools, the same exact way. Theory would predict that their scores would be pretty constant, right? But they aren’t, not at all. Look at the scatter plots Gary and I made on our blogs and get some reasonably bright 8th grader to explain it.
So if this extremely complicated theoretical model can’t even get that right, then we can’t trust it to show anything useful at all. It gives absolutely no trustworthy evidence for good or bad teaching – which supposedly was its purpose.
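For anyone who wants to run the scatter-plot check on their own numbers, here is a minimal sketch of the idea: compute the correlation between the same teachers’ scores in two consecutive years. The scores below are made up purely for illustration and come from no real VAM data set, and the function name `pearson` is just a label chosen for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for the same six teachers in two consecutive years.
scores_year1 = [12, 85, 40, 67, 23, 91]
scores_year2 = [70, 15, 88, 30, 55, 42]

r = pearson(scores_year1, scores_year2)
print(f"year-to-year correlation: {r:.2f}")
```

A measure that tracked something stable about a teacher would give a correlation near 1 from one year to the next; a value near 0 is what “totally unrelated” looks like in a scatter plot.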
Former colleagues have told me of perverse situations where a teacher felt, and had concrete evidence, that she had made the most progress with her 100+ students in 5 classes one year, only to find that the mindless and useless algorithm claimed that her ultimate outcome score would almost merit firing her!!
And there is also the bit that the authorities who devised these formulas specify that they REFUSE to explain to any individual teacher precisely how their score was computed, step by step. Like you have to with your taxes, or a rental agreement or mortgage, or anything you pay for. Utterly no transparency internally, never actually field tested, unreliable, invalid, and so on.
It’s a big, long con artist game.
Writing from cell phone so difficult to spell right or embed links. Google gfbrandenburg vam and you’ll get links.
You’re probably familiar with “Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” by Stuart Yeh, which looked at a broad spectrum of studies and found a similar result regarding the (un)reliability of VAM: namely, that it was no better than flipping a coin.
“The intertemporal reliability of value-added teacher rankings was investigated by Aaronson et al. (2007), Ballou (2005), Koedel and Betts (2007), and McCaffrey et al. (2009). In each study, VAM was used to rank teacher performance from high to low. In each study, a majority of teachers who ranked in the lowest quartile or lowest quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2). Furthermore, a majority of teachers who ranked in the highest quartile or quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2).
….
“In the case of value-added rankings, it is inappropriate to infer that a teacher should be hired or fired based on the rankings from any given year. Since this inference would be inappropriate, the results of value-added teacher rankings are not valid for the purpose of high-stakes decisions regarding hiring and firing. In short, VAM lacks validity for the purpose of high-stakes decisions regarding individual teachers.
While some researchers suggest averaging two or more years of rankings, averaging may introduce significant bias – raising the issue of validity once again (McCaffrey et al., 2009). Furthermore, it would not be uncommon for data to be missing in a way that would prevent averaging. For large numbers of teachers, it would be impractical (Newton et al., 2010).
Regardless, when two years of rankings are used for tenure decisions, intertemporal reliability remains low: In reading, data from North Carolina indicate that 68% of teachers ranked in the bottom quintile shift out of that quintile after tenure (indicated by a weighted average of all post-tenure observations), and 54% of teachers ranked in the top quintile shift out of that quintile post tenure (Goldhaber & Hansen, 2008). When three years of rankings are used, reliability is even worse: 74% of teachers ranked in the bottom quintile shift out of that quintile post-tenure, and 56% of teachers ranked in the top quintile shift out of that quintile post tenure (Goldhaber & Hansen, 2008). In math reliability is somewhat better, but over half of all teachers in the bottom and top quintiles shift out of those quintiles post tenure (Goldhaber & Hansen, 2008).
“These results were confirmed by a second value-added analysis, also using data from North Carolina, which found that more than half of all teachers who ranked in the bottom quintile shifted out of that quintile the following year, regardless of whether one, two, three, four or five years of data were used to predict future performance, regardless of the subject area (math or reading), and regardless of whether a simple or complex Bayes estimator was used to improve predictive accuracy”
//end quote
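The kind of quintile-shift figure quoted above is simple to compute once you have two years of scores. Here is a minimal sketch, assuming one score per teacher per year. The teacher labels and scores are hypothetical, and the cutoff rule (lowest fifth by rank, with ties falling wherever the sort puts them) is an assumption made for this sketch, not the method of any of the cited studies.

```python
def bottom_quintile(scores):
    """Return the set of teacher ids whose scores rank in the lowest fifth."""
    ranked = sorted(scores, key=scores.get)  # ids ordered from lowest to highest score
    cutoff = max(1, len(ranked) // 5)        # at least one teacher in the group
    return set(ranked[:cutoff])

def share_leaving_bottom_quintile(year1, year2):
    """Fraction of year-1 bottom-quintile teachers not in the bottom quintile in year 2."""
    start = bottom_quintile(year1)
    end = bottom_quintile(year2)
    return len(start - end) / len(start)

# Hypothetical scores for ten teachers in two consecutive years.
year1 = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50,
         "F": 60, "G": 70, "H": 80, "I": 90, "J": 100}
year2 = {"A": 55, "B": 15, "C": 95, "D": 5, "E": 45,
         "F": 85, "G": 25, "H": 65, "I": 35, "J": 75}

print(share_leaving_bottom_quintile(year1, year2))
```

If rankings were pure chance, about four in five bottom-quintile teachers would be expected to land somewhere else the following year, which is the benchmark to keep in mind when reading the percentages in the quote.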
gfbrandenburg and SomeDAM Poet: thank y’all for keeping it real.
Not rheeal.
😎
This page has a short video of Yeh talking about his findings
Of course they can’t be transparent. Let me infuse the psychological aspect of test development. Social constructs are operationally defined and tested. These constructs are based on assumptions. To be transparent they would have to reveal the assumptions that went into the test development and then support these assumptions with research. They can’t do that. To hide their fraud they put on the mantle of intellect and claim we wouldn’t understand, or they claim a property right to remain secret. This is why they cannot defend themselves in open court and are forced to admit to the limitations of testing.
Omg, Diane. This is INSANITY to the MAX!
Thanks for this info.
Substitute VAM for Spam:
Just substitute “witch” with “bad teacher”. Evaluating non-tested subject area teachers with math and ELA scores applies a similar form of logic.
Art imitating life, eh?
Or is it life imitating art?
Idiots imitating educators!
Before we know it, “she looks like one” will be the new threshold for identifying “bad teachers”.
“The Witch Cake”
The VAMmers make
A witch’s cake
To spot a witch
From data rich
The teacher’s blamed
In public shamed
And finally fired
As was desired
“VAM’s a Fraud”
VAM’s a fraud
A science fake
And not to laud
For students’ sake
SomeDAM Poet
USDE hired PR firms to push SLOs as an alternative to VAM, only they tried to make the SLOs and “distributed scores” credible by labeling the PR documents “technical assistance.” This PR campaign was “necessary” because the policies under RTT made no sense without a hard sell.
Thanks, Dienne… I have been banned for comments there at the J.P. Greene because I never agree with anything they write.
Congratulations! Now I have something to strive for. 😉
I can’t even begin to describe the stress that value added places on a teacher. I retire next year, and I am so relieved. I would never allow my children to enter the profession, and I am completely honest with anyone who asks me about the profession. This profession as it stands today does not deserve bright, energetic young people investing their lifetimes into a dark hole which will take everything from them . . . . their time, their money, their sense of well being…A teacher struggles to live a normal life….I could go on and on……..This profession will get everything it deserves in due time…no one willing to give up their lifetimes to be abused. It is coming.
You are absolutely right, Sad Teacher. We are at the edge here in Las Vegas, and there is no relief in sight for the board and superintendent. They are letting retired teachers double dip, they are so shorthanded… and it is about to get worse. They are finding out there are not enough TFAers to bail them out, and no one with a brain and any other option will work for them. The state of Nevada will suffer next.