Cathy O’Neil has written a new book called “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.” I haven’t read it yet, but I will.
In this article, she explains that VAM is a failure and a fraud. The VAM fanatics in the federal Department of Education and state officials could not admit they were wrong, could not admit that Bill Gates had suckered the nation’s education leaders into buying his goofy data-based evaluation mania, and could not abandon the stupidity they inflicted on the nation’s teachers and schools. So they say now that VAM will be one of many measures. But why include an invalid measure at all?
As she is out on book tour, people ask questions, and the most common pushback she hears is that VAM is only one of multiple measures.
She writes:
“Here’s an example of an argument I’ve seen consistently when it comes to the defense of the teacher value-added model (VAM) scores, and sometimes the recidivism risk scores as well. Namely, that the teacher’s VAM scores were “one of many considerations” taken to establish an overall teacher’s score. The use of something that is unfair is less unfair, in other words, if you also use other things which balance it out and are fair.
“If you don’t know what a VAM is, or what my critique about it is, take a look at this post, or read my book. The very short version is that it’s little better than a random number generator.
“The obvious irony of the “one of many” argument is, besides the mathematical one I will make below, that the VAM was supposed to actually have a real effect on teachers’ assessments, and that effect was meant to be valuable and objective. So any argument about it which basically implies that it’s okay to use it because it has very little power seems odd and self-defeating.
“Sometimes it’s true that a single inconsistent or badly conceived ingredient in an overall score is diluted by the other stronger and fairer assessment constituents. But I’d argue that this is not the case for how teachers’ VAM scores work in their overall teacher evaluations.
“Here’s what I learned by researching and talking to people who build teacher scores. That most of the other things they use – primarily scores derived from categorical evaluations by principals, teachers, and outsider observers – have very little variance. Almost all teachers are considered “acceptable” or “excellent” by those measurements, so they all turn into the same number or numbers when scored. That’s not a lot to work with, if the bottom 60% of teachers have essentially the same score, and you’re trying to locate the worst 2% of teachers.
“The VAM was brought in precisely to introduce variance to the overall mix. You introduce numeric VAM scores so that there’s more “spread” between teachers, so you can rank them and you’ll be sure to get teachers at the bottom.
“But if those VAM scores are actually meaningless, or at least extremely noisy, then what you have is “spread” without accuracy. And it doesn’t help to mix in the other scores.”
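Her “spread without accuracy” point can be sketched in a toy simulation (all numbers below are made up for illustration; nothing here comes from the book or her post): if the non-VAM components barely vary, the combined ranking is driven entirely by the noisy VAM component.

```python
import random

# Toy simulation (hypothetical numbers): when the non-VAM components
# barely vary, the combined ranking is driven by the noisy VAM score.
random.seed(0)

n = 1000
true_quality = [random.gauss(0, 1) for _ in range(n)]

# Observation scores: nearly every teacher gets the same rating,
# so this component contributes almost no spread.
observation = [3.0] * n

# VAM score: modeled here as pure noise, uncorrelated with quality.
vam = [random.gauss(0, 1) for _ in range(n)]

combined = [0.5 * o + 0.5 * v for o, v in zip(observation, vam)]

# The bottom 5% by combined score...
order = sorted(range(n), key=lambda i: combined[i])
bottom = order[: n // 20]

# ...is just a random sample of teachers: about half of them
# are actually above average.
below_avg = sum(1 for i in bottom if true_quality[i] < 0)
print(f"{below_avg} of {len(bottom)} bottom-ranked teachers are truly below average")
```

With the observation component flat, the “bottom 5%” is effectively a coin-flip sample: roughly half of the teachers it flags are actually above average.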
This is a book I want to read. Bill Gates should read it too. Send it to him and John King too. Would they read it? Not likely.
John King and all of the other Bill Gates paid-for minions will not read this book. It’s obviously not in their best interest, given the size of their bank accounts and their guaranteed future wealth, to read a book that would put a stop to the Bill Gates gravy train.
The only way to change the federal Department of Education and every state’s Department of Education is to appoint people who are not bought-and-paid-for minions of Bill Gates and his cabal of billionaires.
And if that day comes when the Bill Gates wrecking ball is stopped, then we still have the Koch brothers, their ALEC machine, and the creationist, Walmart Walton family to deal with.
a long list of “deplorables” 🙂
The problem with VAM is not that it’s little better than a random number generator; it’s that it generates falsehoods, and wrong conclusions are drawn from this wrong data. And the ramifications of these wrong conclusions are really bad things: kids being educated poorly, inequitably, or inhumanely; teachers being blamed for things they did not do or had no bearing on; fellow citizens being subjected to false accusations, vile working conditions, and improper punishment. Etc.
False conclusions, enacted, are more directed than the nonsense spouted by a random number generator. It’s active evil.
redqueeninla: even in an excellent thread under a most excellent posting—
Your comments shine.
Heartfelt thanks.
😎
Gee, KrazyTA: Thanks!
Truthfully, this latter-day revelation/realization from statisticians regarding the bizarre and insupportable bs going on in dusty educorners is a teensy bit crocodile-teary.
It’s incumbent on senior and theoretical members of a profession to keep a sharp eye on their applied-practitioner brethren.
That this stuff is poppycock is clear to any and all statisticians/mathematicians and has been for a long while. That there has been no concerted professional effort to rein in the excess is politically unconscionable. Because bad things happen from theory, abused.
Yet this is hardly a new story. The entire history of science is punctuated by socially-pernicious misunderstandings and applications of theory.
On bad days I like to reread John Ewing’s 2011 simultaneous scolding of his politically dissociated colleagues even while setting straight the record on VAM: http://www.ams.org/notices/201105/rtx110500667p.pdf . I’m told, sheepishly, that “real” statisticians have been just sort of looking over their shoulders in embarrassment at all this edu-excess for quite some time. But “All that is necessary for the triumph of evil is that good men do nothing.” (Edmund Burke)
I hope all this hack-analysis of Big Data will soon be a bad dream. But more professionals need to speak up and more laymen need to be more skeptical of the charlatans to whom they fork over treasure and authority.
Redqueeninla, great link, it’s going into my favorites. Sadly, I suspect that as long as folks believe algorithms can be derived from, e.g., stock market data banks which predict best stock performance, deluded folk mistrusting their own common sense will be in thrall to policies that replace human decisions with metrical predictions. Metrics & their robotic assistants, like iPads in the classrooms, are just tools to help teachers/ legislators/ governors etc. make decisions based on a larger picture w/ many more inputs.
Nevertheless, I appreciate O’Neil’s appeal to math nerds pointing out that 0 metric input combined w/ discredited VAM input = discredited VAM input.
John Ewing’s paper is indeed excellent. Here’s a quote from the end, when the author advises mathematicians what to do about VAM.
Why must we use value-added even with its imperfections? Aside from making the unsupported claim (in the very last sentence) that “it predicts more about what students will learn…than any other source of information”, the only apparent reason for its superiority is that value-added is based on data. Here is mathematical intimidation in its purest form—in this case, in the hands of economists, sociologists, and education policy experts.
In other words, the whole thing boils down to accepting data to describe education or not.
The answer is clear, and the irrelevance of data is clear in many other situations for which VAM and similar mixed models were originally developed. For example, the effects of fertilizers on, say, tomatoes are measured by evaluating data, while, as an end-user, I care about only one thing about tomatoes, and that one thing has nothing to do with data: their taste.
So it may make sense to start a general anti-VAM movement with the aim of not just improving teacher assessments but improving the taste of tomatoes, apples, plums, and oranges.
Wait, I’m shaking my head as in an actual fog, or rather coming out of one. Did you say: “… the irrelevance of data is clear…”
That is, you did say this. And while I’ve been shrieking about all this edu-data-use since I first became aware of it about three years ago, I’ve actually never had the temerity to come right out and say: “… the irrelevance of data …”.
Are others saying this; am I missing something?
I hear plenty of physicists and mathematicians saying to shut the computer and chuck the calculator out the window.
But I’ve actually missed any call for scrutinizing situations in which data are and are not relevant. But I don’t keep up with my reading: is this a thing?
And… how to respond to the knee-jerk 21st-century-tools-users who would counter: evaluating tomato-taste is also data-driven. On a scale of {1-5 if you deign to acknowledge bias} or {1-4 if you do not}, how tasty would you judge that tomato to be…
That is, are you claiming that some parameters, presumably qualitative/emotional ones, are inherently not quantifiable? Because that’s actually kind of profound in today’s modern climate. And is it true? I would get cold feet about asserting data “irrelevant”. And yet, I am perhaps timid…
I guess what I’m getting at is: does it follow that “data are irrelevant” just because one statistical modeling application is? If the model is irrelevant, must its constituent data be too? Maybe… but this is kind of shaking up my foundations a bit.
“On a scale of {1-5 if you deign to acknowledge bias} or {1-4 if you do not}, how tasty would you judge that tomato to be…”
“Unfortunately”, people like different tomatoes, different cheeses, different wines. Good luck with quantifying any of this, that is, draw a conclusion from data that would be acceptable replacement for people’s taste buds.
Quantify goodness in any of these cases:
good mother, good husband, good writer, good actor, good student, good teacher, good painter, good president, good hairdresser, good doctor, good economist, good singer.
So now, when a scientific system comes along and claims that it can decide goodness for us, we can be sure it’ll fail. Either it will turn out to be too simplistic, and will make a completely biased, hence meaningless, assumption like
“an economist is good if her theory supports free market capitalism”,
or it tries to take into consideration the enormous variability in people’s understanding of goodness, and will end up behaving like VAM, which, one way or another, will appear as reliable as a roulette wheel.
Hmmm… I thought the idea – which I’m not arguing for, just cataloguing – is to substitute a secondary, derivative metric if you will. And empower it with quantity. So… when I rank a tomato, say, 3 out of 5, I’m not saying I believe it to be of neutral “goodness”, but I rank it as “middling” and we’re going to score how many respondents rank it similarly. So… it’s the yelp-phenomenon; we’re not going to claim the tomato is middling, but rather that the preponderance of folks, say, agree that it is. So that’s substituting a value judgment on tomato-goodness with, um, popularity of some particular judgment.
I thought some vague fancy mathematical proof (ahem, mathematical intimidation what-ho?) justified this derivative ranking-metric….
So you’re saying either: (i) no, qualitative assessment is just too complex to be properly quantified (and, as O’Neil argues so well, these judgments have social bias baked in from the get-go), or (ii) no, these models are just empirically labile and therefore crap.
So I’m pretty sure this is the academic social-science justification for things like Likert scores, this “rank-on-a-scale-of-1-to-pretty…” exercise.
My response would be that you can’t sanitize a judgment that’s impossible to make by rebranding it. If the judgment were erroneous, making the same wrong judgment repeatedly does not transform it to correctness. It just becomes a popularity metric of not-true judgments.
But folks have argued this is acceptable basically because big numbers smooth away variance.
Your point, I think, is that with error/bias present from that initial formulation-judgment, no subsequent data-manipulation will wash away that initial error.
So… I agree with that except, what I think is the problem is that all this inaccurate measuring and interpreting is improperly empowered. So for something like yelp, folks are clear that judgment is not an absolute metric; the denominator of, say, each restaurant’s judgers is different, including one’s own, so those rankings are difficult to interpret either between or among ratings. Somehow we roll with that, though: it’s not very high-stakes, as they say.
Obviously with VAM and teacher evaluations, and also, let’s not forget, kids’ education, the stakes are very high.
So I would say the real problem is in the end-user use of this “research” or whatever you want to call it. So the problem isn’t really in the statistics. Or in the investigation of these questions. But in the misuse of these data-manipulations.
So… this is self-serving as an answer, I suppose. But the reason is, I think, that we want to know these answers; we want to study this stuff. It is righteous to want to investigate whether a kid’s been dealt a lousy teacher. The problem arises in the assertion of cause and effect in these questions. There’s not some inherent quasi-religious impossible-to-model Truth out there. But there definitely are some incorrect ways of attributing statistics to what’s going on.
Am I making a mountain out of a molehill? I don’t think there’s a functional difference in these positions. At root permitting intimidation to substitute for skepticism is always a big, big problem.
Not sure I understand. If wine VAMmer says
“I have a very good way of objectively evaluating the quality of wines, the only wines which are worth qualifying: dry wines. My extremely complicated but mathematically sound formula gave this $100 French cabernet sauvignon the highest score of 5, and this $5 one received the lowest score of 1. So my quantitative method models people’s tastes in wines perfectly.”
I say,
“Well, I can’t stand dry wine. I only drink Muscat wine, but even that only if I must since I don’t care about alcoholic stuff. So you can guess, what score my so called personal taste formula would give to the $100 wine you mentioned.”
Should I care about the details of the VAMmer’s method? Should I spend any time understanding its mathematical correctness? Should I perhaps change my taste in wine to make the VAMmer happy?
Similarly, VAMmers in education get all excited about standardized test scores, and they invent all kinds of complicated versions of their statistical methods, but all these versions still use standardized test scores.
Mathematics is not magic, it is simply a formal description of our quantitative world—as Cathy explained in one of the videos as well. The correspondence between the formulas and the real world determines what scope the mathematical result will have in the real world.
Mathematics is not alchemy: it cannot make gold out of crap.
“It is righteous to want to investigate whether a kid’s been dealt a lousy teacher.”
Yes. But that has nothing to do with looking at test scores, and it will be very difficult to convince me that mathematics will play a role in it. I think NPE already wrote down some ways of assessing teachers.
check. well said.
Teachers should continue to fight against VAM, a fake “evaluation” system, and all of its invalid assumptions. Here’s an interview from NPR with Cathy O’Neil in which she discusses how algorithms and false assumptions have been used to reject job applicants or to fire people. All the stack ranking is based on the assumption that the algorithm reflects the “truth,” and this is a false assumption made by those brainwashed by big data. http://www.npr.org/2016/09/12/493654950/weapons-of-math-destruction-outlines-dangers-of-relying-on-data-analytics
I learned of the book from the author’s appearance on C-Span’s weekend attention to books, about 10 days ago.
I bought the book the next day and have recommended it to many others. The writing is clear, direct and the work of a gifted quant.
I wish I had been required to read this as a young teacher. My generation had “How to Lie with Statistics,” with great illustrations…. but nothing like this.
If I still worked in teacher education, I would make it required reading. Cathy O’Neil’s chapters on the use and abuse of statistics are timely, and not just for education.
Her blog is Math Babe.
You can find out why the blog has that name and how many activist causes she is launching or enlightening. I hope she will stay with this relatively new role of being a superb critic of “big data.” The worship of data, as if that should be the basis for every choice we make, is out of hand. Really scary is “sentiment analysis” based on the informal and formal conversations about finances or any other topic.
VAM has been “slammed” — quoting The Washington Post — by the very people who know the most about data measurement: the American Statistical Association (ASA). The findings of the ASA provide a firm basis on which every teacher who is unfavorably evaluated on students’ standardized test scores can vigorously oppose the evaluation, citing the ASA’s authoritative, detailed, seven-page VAM-slam “Statement on Using Value-Added Models for Educational Assessment”.
Even the anti-public school, anti-union Washington Post newspaper said this about the ASA Statement: “You can be certain that members of the American Statistical Association, the largest organization in the United States representing statisticians and related professionals, know a thing or two about data and measurement. The ASA just slammed the high-stakes ‘value-added method’ (VAM) of evaluating teachers that has been increasingly embraced in states as part of school-reform efforts. VAM purports to be able to take student standardized test scores and measure the ‘value’ a teacher adds to student learning through complicated formulas that can supposedly factor out all of the other influences and emerge with a valid assessment of how effective a particular teacher has been. THESE FORMULAS CAN’T ACTUALLY DO THIS (emphasis added) with sufficient reliability and validity, but school reformers have pushed this approach and now most states use VAM as part of teacher evaluations.”
The ASA Statement points out the following and many other failings of testing-based VAM:
“System-level conditions” include everything from overcrowded and underfunded classrooms to district-and site-level management of the schools and to student poverty.
A copy of the VAM-slamming ASA Statement should be posted on the union bulletin board at every school site throughout our nation and should be explained to every teacher by their union at individual site faculty meetings so that teachers are aware of what it says about how invalid it is to use standardized test results to evaluate teachers.
Fight back! Never, never, never give up!
I do not wish to stray from this subject, but the whole idea of teacher evaluation is NOT tied to anything that actually deals with the need to discover the effectiveness of a teacher… which is not hard to determine, as millions of teachers over decades have been evaluated.
My case, and that of Lorna Stremcha in Montana (see her story link at the end of this comment), are two that lead to the real reason, which is a war on teachers in order to keep the budget low by removing pensions and benefits, and TO CAUSE CATASTROPHIC FAILURE OF THE SCHOOLS so that EDUCATION CAN BE PRIVATIZED.
Lenny Isenberg, who writes about the fabrication of charges that took out TENS OF THOUSANDS of LA teachers, says this saves between $40K and $60K for every teacher sent packing. http://www.opednews.com/Quicklink/LAUSD-OR-TARGETED-TEACHERS-in-Best_Web_OpEds-Deception_Evidence_Fired_Innocence-150720-360.html#comment555646
LORNA AND I experienced the assault on teachers in the nineties that preceded VAM, which was concocted to provide a legal excuse for the benefit of the public, to remove a teacher.
I was the NY State Educator of Excellence, and the NYC cohort for the Pew research on the Harvard thesis for the real National Standards, http://www.opednews.com/author/author40790.html and nothing stopped them from removing me from a practice which put my school on the map. Principals came and went, and each tried to fabricate charges as the union ignored my grievances.
http://www.perdaily.com/2011/01/lausd-et-al-a-national-scandal-of-enormous-proportions-by-susan-lee-schwartz-part-1.html
One year, the SUPERINTENDENT of District 2 in NYC wrote a letter to say I had been found guilty of ‘corporal punishment’, as I languished in a ‘teacher jail’, having been removed from my famous practice. For six months in that rubber room at the district office, I DID NOT KNOW why I had been sent there, or why my room was emptied of 8 years of my celebrated curriculum materials. No charges, no hearing, NOT EVEN A LETTER explained my removal, as parents went wild at my absence… I was famous in the city, and students came from all boroughs to be in my class. FOUR DECADES of excellent evaluations and awards (I was in Who’s Who Among America’s Teachers for 4 of my years in that school)… and ALL of these EVALUATIONS were removed from my employment file and replaced by endless ‘documentation’ of my incompetence… even as Harvard filmed my Socratic Seminars.
They took out all the teachers in NYC so the schools could be labeled failing and THIS is the result: https://vimeo.com/41994760
VAM is a joke in NYC, because NO measures can stop the criminal assault directed at removing teachers.
Francesco Portelos (http://www.endteacherabuse.org/Portelos.html) is presently fighting it in court.
Way back when it began they tried to jail him http://protectportelos.org/does-workplace-bullying-continues-my-33-hrs-behind-bars/.
He fought back, is now a SUBSTITUTE running from school to school, but this year he RAN FOR PRESIDENT of the UFT, as part of the movement to make TEACHER unions fight for us again. GOOGLE HIM… and while you are there, look at David Pakter, who was a famous teacher at Fine Arts when it was his turn to be sent out the door… for bringing in a plant. Ten years later, he left teaching, after winning his case in court and spending half a million dollars to vindicate himself.
http://protectportelos.org/the-david-pakter-saga-an-all-too-familiar-of-a-story/
I hear the constant rhetoric and rant about VAM and testing, but NOWHERE (not even on the Diane Ravitch site, which is the GO-TO site if you want to follow the tragedy and travesty of the war on public education) can you discover THE BEGINNING — this is the story of the 20 years PRECEDING VAM — this process to cause schools to fail by removing the PROFESSIONAL practitioners.
IT ISN’T AS IF THE STORY IS NOT OUT THERE:
Here are some links that tell the tragic story
Karen Horwitz put this one up: http://endteacherabuse.org and her story is in her book: http://www.whitechalkcrime.com
And there is this one on the RAMPANT ABUSE
Betsy Combier, on several blog sites, nailed the NYC abuse… here is just one that gives the PRE_VAM methodology.
http://nycrubberroomreporter.blogspot.com/2009/03/gotcha-squad-and-new-york-city-rubber.html
and of course there is PerDaily.
http://www.perdaily.com/2015/01/were-you-terminated-or-forced-to-retire-from-lausd-based-on-fabricated-charges.html
The abuse that Lorna Stremcha faced http://www.greatfallstribune.com/story/life/my-montana/2016/03/18/educator-recounts-harassment-school/81896206/ is the METAPHOR for all of us, as principals with unlimited power did their thing; we see what happens when failed human beings discover that there is not a shred of accountability for criminal behavior.
Here she is in the video LAWLESS:
Because of the union’s failure as her CIVIL RIGHTS ADVOCATE,
http://nycrubberroomreporter.blogspot.com/2013/10/lorna-stremcha-and-her-rubber-room.html Lorna had to go to court, an expensive, long ordeal, to prove that this principal set her up to be raped!
Get her book, when you think about VAM, because this is how it started: when critters like Donald Trump got to run a school, KNOWING THAT THERE WAS NO ACCOUNTABILITY FOR ANYTHING!
Bravery, Bullies and Blowhards: https://www.facebook.com/lessonslearnedinamontanaclassroom/photos/np.1475506601992954.1476611960/1777867745834429/?type=3&theater
I sat in on a staff meeting where one of the number crunchers for VAM addressed teachers at my school on the subject of our process for scoring our students to show our own teacher “growth”. He flat out told us that we had better not show growth with all of the students we selected, because if we did, it would only bring every one of us teachers down! Pure math genius? Hardly! So now, we are not supposed to look “too good” because of an artificial scale that has to show “X” percent of teachers at the bottom of the ranking, “X” percent in the middle, and “X” percent at the top. So in effect this “genius” voice of the administration was telling us not to allow some students to “grow”! What utter and sheer nonsense. Insult to injury is that this individual is paid a hefty salary and even was able to come with 5 of his staff members to our meeting (he was the only one who spoke). Bloated administration expounding bloated nonsense! Huh???? Buffoonery.
I have posted this before but it bears repeating.
A devastating takedown of rheephorm’s “multiple measures” spin when it comes to VAM and its noxious kin.
From Audrey Amrein-Beardsley, RETHINKING VALUE-ADDED MODELS IN EDUCATION: CRITICAL PERSPECTIVES ON TESTS AND ASSESSMENT-BASED ACCOUNTABILITY (2014, pp. 44-45):
[start excerpts]
These and other HISD teachers also noted that their supervisors were skewing their observational scores to match their value-added scores given external pressures to do so. …
One teacher stated: Here’s the problem. No principal wants to be called in by the superintendent or another superior and [asked]: “How come your teachers show negative growth but you have high evaluations on them? Are you doing your job? I don’t understand. Your teacher shows no growth but you have [marked them] as exceeding expectations all up and down the chart?” Now it’s not just this [sic] data over here that’s gonna harm us, it’s the principals [who are] adjusting our data over there to match the EVAAS®. So it looks like they’re being consistent. …
Another teacher agreed: “Well my evaluations were fine, but of course now they have to make the evaluation match the EVAAS®. We now have to go through that”…
Another teacher wrote: They’re not about to go to bat [for us, although] a few of them will. But most of them are going to go in there, and they’re going to create a teacher evaluation that reflects the [EVAAS®] data because they don’t want to have to explain, again and again, why they’re giving high classroom observation assessments when the data shows [sic] that the teacher is low performing. …
Another noted: Our principal pressures us. You bet she pressures us. If you don’t [make EVAAS®], then it goes against you in your PDAS. In a roundabout way she finds a way to put that against you. …
Another noted: My boss had to go to the district superintendent and explain why we needed to be kept, when ultimately the data showed that we weren’t good teachers. … [However] you’ve got other good teachers who are being thrown under the bus because of this system.
In Collins (2012), HISD teachers also described how principals would switch their PDAS scores to match their EVAAS® scores if dissimilar, mainly because they believed their administrators held the opinion that the EVAAS® estimates were superior and should trump the more subjective PDAS scores. …
[end excerpts]
Of course, this is what happens on planet Earth aka the real world where Campbell’s Law holds sway. On RheeWorld—where the heavyweights and enforcers of corporate education reform live—things look very very different because of the Rheeality Distortion Fields that they inflict mercilessly on one another…
Really! and Rheeally!
😎
P.S. HISD = Houston Independent School District. EVAAS® = Education Value-Added Assessment System. PDAS = Professional Development and Appraisal System.
This is an outstanding book. Has enormous implications for other parts of education today as well. This includes personalized learning (i.e., when algorithms based on lots of data assign instruction, not human teachers) – and even worse, when those mathematical models direct people to certain classes or pathways. Also, what about when those models collect a lot of social and emotional data that track with a student? One of the major points of the book is that these models more often than not make inequities worse. It is a must read.
O’Neil’s article clearly calls out to math-brained folk to do a reality check. Kudos.
I have a queasy feeling– correct me if I’m wrong– that VAM is supported not just by folk who think teachers should be held ‘accountable’– but that these & other folk buy into edumetrics because they lack trust in their own common sense: they are buffaloed by a combination of believing in numbers while mistrusting their own math ability, thus are vulnerable to being lied to by statistics.
It must be admitted that US mainstream folks are not particularly math-savvy. It is only our top 10% or so who excel in Math. I glean this not just from PISA scores, but from personal acquaintance w/many Indian & Chinese folk whose kids arrive here w/far more sophisticated grasp of number-sense than natives. (I like to attribute that not just to parental culture pushing STEM, but to hands-on abacus starting earlier than 3yo.)
Just want to point out in our favor that in our best states/districts, even those w/ little native math ability who were not encouraged to build on logic/number sense at an early age – those taught to question & think critically from an early age & throughout their education – are not so easily swayed by statistics used to bolster a political agenda.
The Mathbabe (aka Cathy O’Neil) is my second favorite blogger…
Now this is a much better way of looking at VAM. It gives one good, elementary explanation why VAM, as a mathematical method, cannot be used to evaluate anything. The explanation is as good as the technical ones by various statistician groups.
I still think, though, there is a more fundamental problem with VAM: before we evaluate VAM as a mathematical-statistical method, we need to look at what it is trying to work with—namely, it uses standardized test scores—and we can and should safely dismiss it simply based on that.
The original post links to an explanation of how it was detected that VAM scores are really random, but the link got lost from Diane’s post. Here it is
but the explanation is not very clear, imo, since the setup of the score chart is not explained. The clearer (because somewhat more detailed) explanation is in the original post of Gary Rubinstein at
http://garyrubinstein.teachforus.org/2012/02/28/analyzing-released-nyc-value-added-data-part-2/
He gives 5 other examples in 5 different posts for how analysis of VAM data can detect problems either with VAM itself as a mathematical method or with charter schools’ obtaining ‘better’ students.
I suspect that Gary Rubinstein’s post may still be too technical to go through for quite a few people, since the exact details are necessarily left out to make the explanation accessible to a wider audience. So I thought I’d give a little, simple but detailed example to have a concrete analogy to think about when somebody says, “VAM behaves like a random number generator, hence it is useless as a mathematical-statistical tool to evaluate education”.
To those who have understood all along what “VAM behaves like a random number generator” means, I apologize.
Saying that VAM scores appear as if they were generated randomly can be morally understood via the following example: take any set of teachers—for the sake of our example, just four teachers A, B, C, D. Now pick 4 numbers between 1 and 10 in any way you want, and write them next to the teachers, like
A 3
B 7
C 2
D 4
Now, if these were VAM scores, and a teacher gets fired if her score is 5 or below, then A, C and D would be fired. Now saying that VAM scoring behaves like a random number generator when scoring these 4 teachers means that VAM assigns numbers between 1 and 10 as arbitrarily to the teachers A, B, C, D as we just did. In other words, the VAM scores have absolutely nothing to do with who A, B, C, D are or how they teach.
With such arbitrary (random) scoring, you expect that half of the numbers are 5 or below, hence half of the teachers get fired—if 5 is the cut score for firing.
Using the above random scoring method to fire teachers is equivalent to picking 2 teachers arbitrarily out of the 4 and firing them.
Sounds scientific, doesn’t it?
Now detecting VAM’s random scoring is a bit more complicated, but in the cleverly picked settings in Gary’s posts it is as plain as in the example above.
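The four-teacher example can be extended into a quick simulation (purely illustrative, with made-up numbers): over many arbitrary scorings, every teacher, good or bad, gets fired in about half the trials.

```python
import random

# Purely illustrative sketch of the example above: scores are drawn at
# random, so who gets "fired" has nothing to do with how anyone teaches.
random.seed(0)

teachers = ["A", "B", "C", "D"]
CUT = 5  # fired if score is 5 or below
trials = 10_000
fired_count = {t: 0 for t in teachers}

for _ in range(trials):
    # assign each teacher an arbitrary score from 1 to 10
    scores = {t: random.randint(1, 10) for t in teachers}
    for t in teachers:
        if scores[t] <= CUT:
            fired_count[t] += 1

# Each teacher is fired in roughly half the trials, whoever they are.
for t in teachers:
    print(t, round(fired_count[t] / trials, 2))
```

No matter how A, B, C, or D actually teach, each faces roughly even odds of dismissal, which is the point of the random-number-generator analogy.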
Finally (sorry for all the posts), we can illustrate the “before VAM” and “after VAM” employee management eras as follows:
Before the VAMming era, a CEO tells the director of HR something like
“John, we need to fire half of our employees. So go ahead, pick 50 out of our 100 employees you think should be fired, and lead them out of the building. Lemme know how it went. Here is a bonus check for $100K for helping us through these difficult times. Ah, and, on your way out, please tell Stacy to make me a cup of coffee.”
In the modern era of VAMming, the CEO would say this to the director of HR
“John, we need to fire about half of our employees. So go ahead, pick any number between 40 and 60, then pick that many of our 100 employees as arbitrarily as you can, tell them they got VAMmed, and lead them out of the building. Lemme know how it went. Here is a bonus check for $100K for helping us through these difficult times. Ah, and, on your way out, please tell Stacy to bring me a decaf grande latte from Starbucks.”
So you see, the VAMming method is a bit more complicated but it’s worth using it, since it’s scientific, and research shows that the best employees are kept to ensure steady and rapid growth and hence success.
lol.
See? How can you claim data “irrelevant”…? It supports paying more for coffee?!
So I also had some problems with this linked blogpost. In fact I think she’s saying precisely what I was trying to, that VAM’s lack of accuracy isn’t the issue so much as the fallout from it. Dr Wierdl is saying, basically: GIGO; Garbage In; Garbage Out; why are we discussing validity and fallout when what even comprises these sequelae has no standing?
I would rather see Dr O’Neil’s blogpost stated more mathematically – start with a curve that holds ranking information in it (“you’re number 1, you’re number 32,” etc.); adding additional information (teachers’ “other” assessments), when these are pretty unvarying, won’t change the overall ranking in that original curve. It’s like adding a scalar to a curve; it just shifts the whole thing up/down but doesn’t change the shape.
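The “adding a scalar” observation can be checked directly (a minimal sketch with made-up scores): shifting every score by the same constant moves the whole curve but leaves the ranking untouched.

```python
# Minimal check: adding a constant ("scalar") to every score shifts
# the curve up/down but leaves the ranking unchanged.
scores = [0.2, -1.3, 0.9, 0.0, 2.1]

def ranking(xs):
    # indices sorted from lowest to highest score
    return sorted(range(len(xs)), key=lambda i: xs[i])

shifted = [x + 3.7 for x in scores]
print(ranking(scores) == ranking(shifted))  # prints True: identical order
```

Only a component that actually varies between teachers can reshuffle the ranking, which is why a noisy VAM score ends up doing all the sorting.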
But I think she explains the bigger point of mathematical intimidation better here: http://www.youtube.com/watch?v=gdCJYsKlX_Y
And here: http://www.youtube.com/watch?v=cK87rN4xpqA . As she states, this is a social justice issue, not a math one or even a process one. Personally, I agree completely though I think there’s plenty of room for attacking this mountain at root, slope and peak.
In thinking about intelligence scores, ranking, democracy, and meritocratic education, I highly recommend Lemann’s inadequately titled book “The Big Test”. It’s a historical perspective on what I am guessing is something of the same story O’Neil takes up downstream.