Last Friday, before the winter break, D.C. officials quietly released the news that the D.C. IMPACT evaluation system contained technical errors. It was the perfect time to reveal an embarrassing event in the hope that no one would notice. Spokesmen minimized the importance of the errors, saying they affected “only” 44 teachers, one of whom was wrongfully terminated.
But Professor Audrey Amrein-Beardsley explains that what happened was “a major glitch,” not a “minor glitch.” It was not a one-time issue, but an integral part of a deeply flawed method of evaluating teachers. No amount of tinkering can overcome the fundamental flaws built into value-added measurement of teacher quality.
Beardsley writes:
VAM formulas are certainly “subject to error,” and they are subject to error always, across the board, for teachers in general as well as the 470 DC public school teachers with value-added scores based on student test scores. Put more accurately, just over 10% (n=470) of all DC teachers (n=4,000) were evaluated using their students’ test scores, which is even less than the 83% mentioned above. And for about 10% of these teachers (n=44), calculation errors were found.
This is not a “minor glitch” as written into a recent Huffington Post article covering the same story, which positions the teachers’ unions as almost irrational for “slamming the school system for the mistake and raising broader questions about the system.” It is a major glitch caused both by inappropriate “weightings” of teachers’ administrators’ and master educators’ observational scores, as well as “a small technical error” that directly impacted the teachers’ value-added calculations. It is a major glitch with major implications about which others, including not just those from the unions but many (e.g., 90%) from the research community, are concerned. It is a major glitch that does warrant additional cause for concern about this AND all of the other statistical and other errors not mentioned but inherent and prevalent in all value-added scores (e.g., the errors always found in large-scale standardized tests particularly given their non-equivalent scales, the errors caused by missing data, the errors caused by small class sizes, the errors caused by summer learning loss/gains, the errors caused by other teachers’ simultaneous and carry-over effects, the errors caused by parental and peer effects [see also this recent post about these], etc.).
The “errors” cannot be corrected because the method itself is the problem. The errors and flaws are integral to the method. VAM is Junk Science, the use of numbers to intimidate the innumerate, the use of data to quantify the unmeasurable.
VAM belongs with Reagan’s comment when he was governor of California that the red ward trees caused air pollution–in the junk heap of false science.
Ketchup is a vegetable!
Redwood.
If VAM is so demonstrably flawed, but there is so much hush money covering that fact up that it cannot be used to kill the method sensibly, then we are careening towards class action lawsuits.
I have said it before (with a lot of regret): We are at the point where facts no longer matter — only Power does.
Thank you, Ron. It’s the only language that EVERYBODY can interpret!
ED
For those of you who might not understand why these scores are invalid, let me explain it as I understand it:
Let’s say you are a teacher in Washington, DC and most of your sixth grade students are at a second and third grade level in reading and mathematics. The standardized test that they take is designed to test achievement at a sixth grade level, so there are very few items on the test to measure the progress of these students. At the end of the year, even if the children make one or two years’ progress, it will not show up on a standardized test. It might look as though the students have learned nothing (unless they were drilled on actual test items, which of course invalidates the results).
Also, these tests are not designed to differentiate between school and out-of-school learning, so the privileged child who is read to each night is likely to score higher than the child who does not have this daily experience. This is the main reason why standardized tests of all kinds (including the SAT) correlate closely with the socioeconomic background of the person tested.
This does not mean that the progress of a child cannot be tested. Of course, it can, but the test must be geared to the individual achievement level of the student and should be individually administered and interpreted (e.g. “Maria was absent from school for four months, due to illness.”)
Yes, VAM is junk science and I’ll bet most of the advocates know it too. These people are not stupid.
“For those of you who might not understand why these scores are invalid, let me explain it as I understand it:”
It’s posted below. Epistemologically and ontologically the whole process is false.
Many years ago, my 3rd grade class consisted of 7 special needs students & 7 ELL students. In addition, I had 7 “on grade level” students. I went to my building principal & said, “This class is not balanced at all.” His reply was, “Letters already went home to parents. Deal with it.” If my teacher evaluation had been based on test scores, I would have been fired. What I’m wondering is if the administrators, in the buildings where teachers are receiving bad evaluations based on student test scores, are also being fired? Every one of them should be. That would be as fair as the reality of what is going on!
An excellent example of Campbell’s Law. Duncan and his buddies really don’t care. They are making $$$$$.
If I may change the thought a little, but so that it still holds true:
“No amount of tinkering can overcome the fundamental flaws built into EDUCATIONAL STANDARDS AND THE CONCOMITANT STANDARDIZED TESTING that purportedly provides a measurement of STUDENT LEARNING. . . . The ‘errors’ cannot be corrected because the method itself is the problem. The errors and flaws are integral to the method AND ANY RESULTING CONCLUSIONS ARE COMPLETELY INVALID.”
All those “errors and flaws” that cause the invalidities (yes, plural) are laid out brilliantly by Noel Wilson in “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Gonna need help from the Quixotic Quest Wagoneers to keep getting the word out over the next week as I will be down at the Eleven Point river camping, fishing and floating. This quixotic subspecies of bear has to get his salmonidae nourishment whenever he can, boy does teaching get in the way of that-ha-ha. Can I rely on you to keep spreading the quest?
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A quality cannot be quantified. Quantity is a sub-category of quality. It is illogical to judge/assess a whole category by only a part (sub-category) of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as one dimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing we are lacking much information about said interactions.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process, from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. A basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it measures “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
As they say in the tech world, “It’s not a bug. It’s a feature.”
Linda Johnson: the “errors” are built in, partly for the reasons you mention. And they are fatal and toxic.
Let me include just one other. There have been a number of comments and a few postings on this blog detailing what happens to the many educators that do not teach the subjects involved in high-stakes standardized testing. They are forced, or choose, to cast their lot in with specific subject areas and teachers. Thus the students of certain teachers generate test scores that not only determine whether those particular educators are “effective” or “ineffective” or “need improvement” but whether or not other teachers who didn’t teach those students those subject areas get those labels as well.
How can this not be described as doubling down on a punitive hazing ritual whose main rationale is summed up in the old standard—
“The beatings will continue until morale improves.”
And that doesn’t even include Duane Swacker’s more fundamental considerations. Please look at his extended comments below.
Please excuse, however, a little quibble about your last paragraph. I for one don’t think the vast majority of self-styled “education reformers” have a clue about standardized testing, including whether it does anything at all except measure how well you took a particular test on a particular day under particular circumstances. It is not just being innumerate [the mathematical equivalent of illiterate] but being studiously and firmly ignorant of the conceptual framework and math behind standardized testing.
For example, I refer you to Secretary of Education Arne Duncan’s speech before the American Educational Research Association of April 30, 2013. He is somewhat for/somewhat against/somewhat for & against standardized testing. It is a cognitive mess. He was literally lambasting his critics for the high-stakes standardized testing he was responsible for forcing on millions of others—a typical kiss up-kick down speech, where he satisfied the leading charterites/privatizers while he blamed the ‘little people’ for not doing their jobs properly and screwing up the already screwed-up testing process.
“I’m the Sec of Ed, short and stout, my mind’s so open, that my brains fell out.”
Link: http://www.ed.gov/news/speeches/choosing-right-battles-remarks-and-conversation
And since the edufrauds and the edubullies don’t apply VAManiacal-like measures to themselves, they are quite comfortable massaging data, torturing numbers, and punishing those who are on the front lines of teaching.
Remembering always that the brunt of high-stakes standardized testing is for OTHER PEOPLE’S CHILDREN. There’s Harpeth Hall and Cranbrook and Lakeside School and U of Chicago Lab Schools and Waldorf School of the Pacific and the like for THEIR OWN CHILDREN.
Could I be exaggerating? Let’s just take one teeny weeny example from Cranbrook:
“The Summer Theatre School, our oldest summer program, presents classic theater skills like character acting, lighting, dance, voice, costuming, set design and other stage crafts. The Theatre School operates from Cranbrook’s beautiful Greek Theater grove, an outstanding full sized stone replica of a classic outdoor Greek theater setting nestled in a mature pine forest. Evening outdoor theater productions attract ample crowds from neighboring communities.”
Link: http://schools.cranbrook.edu/podium/default.aspx?t=146451&rc=0
But I am sure that OTHER PEOPLE’S CHILDREN will be happier with their bubble-ins generated by Common Core pre-tests and tests and post-tests. A “full sized stone replica of a classic outdoor Greek theater setting nestled in a mature pine forest” — c’mon, we’re talking learning cage busting achievement gap crushing skills and abilities for the twenty first century!
😎
My bad: “Please look at his extended comments below.” should read
“Please look at his extended comments above.”
😎
Reblogged this on peakmemory and commented:
This says it all:
“VAM is Junk Science”
This says it all for me…”VAM is Junk Science, the use of numbers to intimidate the innumerate, the use of data to quantify the unmeasurable…”
When criminals on Wall St. get bonuses despite tanking the economy in the name of greed, I wonder if the teacher who lost his/her job from this horrific fiasco is going to be hired back with damages paid?
Gee, only one teacher was wrongfully fired! And that makes it okay for the Rheeformers? Even if the teacher gets their job back, it is still unacceptable, especially since teachers do not have trust funds to fall back on.
Alice,
Agreed. The whole system is rotten to the Core