Audrey Beardsley, on her blog “Vamboozled,” notes that Tom Kane, Harvard economist and leader of the Gates Foundation’s $45 million Measures of Effective Teaching, has returned to the hustings to argue on behalf of the causal value of value-added measurement, that is, the idea that teachers directly “cause” the test score gains of students. For VAM to work, she argues, students would have to be randomly assigned, and they almost never are. Even “random assignment” might not truly be random, because teachers would still face vastly different classes, some with many highly motivated students and others with many unmotivated students, as well as a host of other unmeasured variables. Imagine sitting at a poker table, and the dealer randomly assigns cards. One person has a royal flush, another has a hand without even a pair. The assignment was “random,” but the cards dealt were very different.
Beardsley writes:
Kane, like other VAM statisticians, tend to (and in many ways have to if they are to continue with their VAM work, despite “the issues”) (over)simplify the serious complexities that come about when random assignment of students to classrooms (and teachers to classrooms) is neither feasible, nor realistic, or outright opposed (as was also clearly evidenced in the above article by 98% of educators, see again here).
The random assignment of students to classrooms (and teachers to classrooms) very rarely happens. Rather, the use of many observable and unobservable variables are used to make such classroom placement decisions, and these variables go well beyond whether students are eligible for free-and-reduced lunches or are English-language learners.
Ah, if only the real world were as tidy as many economists would like it to be.
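The poker-table point above can be made concrete with a small simulation. This is only a sketch under stated assumptions, not anything taken from Kane's or Beardsley's work: the function name `simulate_class_means`, the "motivation" trait, its normal distribution, and the class sizes are all hypothetical. It deals students into classes at random and reports how far apart the class averages of the unmeasured trait still end up.

```python
import random
import statistics


def simulate_class_means(num_classes=20, class_size=25, seed=0):
    """Randomly assign students to classes; return each class's mean trait.

    The trait stands in for an unmeasured variable such as motivation.
    Its standard normal distribution is an assumption for illustration.
    """
    rng = random.Random(seed)
    # One hypothetical trait value per student.
    students = [rng.gauss(0.0, 1.0) for _ in range(num_classes * class_size)]
    # The "random assignment" step: shuffle, then deal into equal classes.
    rng.shuffle(students)
    classes = [
        students[i * class_size:(i + 1) * class_size]
        for i in range(num_classes)
    ]
    return [statistics.mean(c) for c in classes]


means = simulate_class_means()
spread = max(means) - min(means)
print(f"lowest class mean:  {min(means):.2f}")
print(f"highest class mean: {max(means):.2f}")
print(f"spread between luckiest and unluckiest class: {spread:.2f}")
```

Every class here was filled by pure chance, yet some teachers still draw a noticeably stronger hand than others in any single year. Changing `class_size` or `seed` shows how much of that gap is luck of the deal.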

Audrey and some colleagues have exposed the fraud of economists who think that they can just tinker around enough to make the VAM thingy sound plausible.
She and some colleagues went into schools and talked with principals who thought the whole idea of random assignment was a crock. That is my language and my take on that peer-reviewed study.
The companion fraud is the SLO (student learning objective), a convoluted writing assignment for almost all teachers who do not produce scores that are needed for VAM. The SLO fraud has been perpetuated since 1999 by William J. Slotnick, who first introduced Denver teachers to his warmed-over version of Peter Drucker’s 1954 notion of management-by-objectives (MBO).
MBO was a fad in the business world for about two decades before CEOs realized that the scheme rewarded the most competitive personnel and those who learned to game the system.
Slotnick is the darling of all fans of pay-for-performance in the same way that Thomas Kane is.
For example, the State of Maryland is the most recent buyer of Slotnick’s snake oil–SLOs for everyone–with WestEd awarded a contract for the oversight.
And, as Mercedes Schneider has helped to reveal, the Senate draft for the reauthorization of NCLB/ESEA is not letting go of pay-for-performance, a concept whose failure is documented by the US Department of Education’s own research.
So the senators and their staffs are totally out of the loop on the Department of Education’s “scientifically based” research on the failure of pay-for-performance, in addition to the abundant other research on this subject.
Our elected officials, their staffs, and the lobbyists who draft the reauthorization of NCLB/ESEA pay homage to the concept of making decisions based on data and scientifically based research–for all of us out here in education-land–but not in their own chambers. This is another case of their inexcusable, well-documented ignorance, made worse by the pretense of caring about “scientifically based” research.
Until the TESTING companies say these tests measure teacher and school effectiveness, they should not be used for that purpose. Duane has explained this over and over. The TEST must measure what they say it measures.
AL,
But those tests do not measure anything. They are not “measuring” devices as there are no agreed upon “measurement standards” that have ever been developed in education (because they are not amenable to measurement). They supposedly assess a change in the students’ learning but are a piss poor substitute. Construct validity and many other epistemological and ontological errors render any decisions COMPLETELY INVALID. This really isn’t rocket science, folks, although the VAManiacs and SLOthinkers would like you to believe so.
And using a test for any purpose other than the one for which it was designed is COMPLETELY UNETHICAL.
Agree. You say it best!
Duane,
Economists love to use things for purposes for which they were not designed. VAM for teacher evaluation is a perfect example.
And ethics are not something many economists concern themselves with.
Though I am pretty sure you have probably noticed both of these things.
Poet,
It’s worse than no ethics. George Washington University professor Amitai Etzioni describes the situation in “The Moral Ill Effects of Teaching Economics.”
I guess we now know how “serious” Gates was about his call for a two year moratorium on VAM.
Obviously, all he was after was more time for his well-paid help to play with his toy model.
If people like Kane — and his colleague, Chetty — want to know why economics is not considered a science by real scientists, all they need to do is look in the mirror.
(see Economics could be a Science if More Economists were Scientists)
Well, there you go quoting W. Black. What would he know? Google and find out how many crooks he had put in jail in the S&L debacle. And then compare that to how many banker crooks E. Holder has even charged. It’s called accounting control fraud and we know who the thieves are, but Obomber’s boy Holder hasn’t touched a one.
“Ah, if only the real world were as tidy as many economists would like it to be.”
so true.
“VAMdoo”
Believe you me, the VAMdoo’s real
I just need one more pin
And you will see, what is the deal
Aft I stick that one in
Idle hands are at Tom’s VAMpire workshops. His laboratory at the Harvard Castlevania is too busy to work on futile experiment research funded by the Gates Foundation.
VAM Algorithm – Similarly created and featured in ‘A Beautiful Mind’.
It is NOT Real!
It is NOT Real!
Delusions!
NOT REAL!
If I may correct your last statement, Diane:
“Ah, if only the real world were as tidy as many economists IMAGINE it to be.”
Folks,
Again, to understand the COMPLETE INVALIDITIES of the educational standards and standardized testing regime, and even the COMPLETE INVALIDITY of student grades, read and understand Noel Wilson’s never refuted nor rebutted treatise “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process, from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid: errorless, or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Data salad.
“Dirty VAMs, done dirt cheap” (from the AC/DC song)
If you got a teacher and you want her gone
But you ain’t got the guts
She keeps kids laughin’ evry single day
Enough to drive you nuts
If yer in a jam, pick up a VAM
It’s time you made a stand
For a fee, I’m happy to be
Your junk-stats man, hey
Dirty VAMs done dirt cheap
Dirty VAMs done dirt cheap
Dirty VAMs done dirt cheap
Dirty VAMs and they’re done dirt cheap, yeah
Dirty VAMs and they’re done dirt cheap
Dirty VAMs and they’re done dirt cheap
Dirty VAMs and they’re done dirt cheap
Concrete apples
VAM-n-hide
T.F.A.
Done dirt cheap
Oooh, Gates-ties
Contracts
High doltage
Done dirt cheap, yeah
(“Dirt cheap” = only $45 million)
TFA, High doltage
🙂
“The night they drove Statricksy down” (parody of “The night they drove Old Dixie down”)
Thomas Kane is my name and I drove on the VAMville train
‘Til Audrey Beardsley came and tore up the tracks again.
After the ASA* paper knife, we were hungry, just barely alive.**
By twenty-fourteen, Rich man had fell.
It’s a time I remember, oh so well.
(*American Statistical Association, **only had $45 million from the “Rich man”, Bill Gates)
The night they drove statricksy down
And all the bells were ringing,
The night they drove statricksy down
And all the people were singing
They went, “Na,na,na.na,
Na na na na na na na na na.”
Back with my colleague, Raj Chet-ty, when one day he called to me,
“Thomas, quick, come see, there goes the Gastesly Billee!”
Now I don’t mind I’m choppin’ stats, and I don’t care if I’m paid by the brats
You take what you need and leave the rest,
But they should never have taken the VAMmy best.
The night they drove statricksy down
And all the bells were ringing,
The night they drove statricksy down
And all the people were singing
They went, “Na,na,na.na,
Na na na na na na na na na.”
Like my “Father”*** before me, I will work the VAM
And like my colleague before me, I took a junk-stat stand.
He was just 34, proud and brave,
but the ASA put him in his grave.
I swear by the mud below my feet
You can’t raise a Kane back up when he’s in defeat
(***Eric Hanushek, Father of VAM)
The night they drove statricksy down
And all the bells were ringing,
The night they drove statricksy down
And all the people were singing
They went, “Na,na,na.na,
Na na na na na na na na na.”
The night they drove statricksy down
And all the bells were ringing,
The night they drove statricksy down
And all the people were singing
They went, “Na,na,na.na,
Na na na na na na na na na.”
“Now I don’t mind I’m choppin’ stats”
Having heard those two songs quite a bit, although maybe not as many times as “Reelin’ in the Years,” I really enjoyed both of those.
This debate surely refers all the way back to perennial arguments about social vs ‘pure’-and-applied sciences. Educators have been thinking about this at least as early as the 70s, when ethnographer Jean Lave’s tailor apprenticeship studies influenced John Seely Brown’s Institute for Research on Learning (IRL). Adopting ethnography as its main research method, the Institute forged new understandings of how individuals enter and join learning communities, achieve acceptance, then themselves grow and evolve as vessels of community knowledge (Lave & Wenger, 1991; Lave, 1996). They brought Clifford Geertz’s “thick description” (1973, Thick Description: Toward an Interpretive Theory of Culture. In The Interpretation of Cultures. Pp. 3-30. New York: Basic Books.) to the observation of learning environments.
A standardized test, even were it not further restricted in value by the needs of digital mass delivery and evaluation (e.g., multiple choice, bubble tests), can never be a thick description of anything, and at best is nothing but a snapshot of a person’s ability to score on a particular test on a specific date. What it can’t measure in any meaningful way is “learning,” which requires retaining information or skills over time and distance, and then transferring their application to the solving of new problems in other contexts. Can any bubble test ever designed accomplish that?
NO!
See Wilson’s work, referenced above, to understand why.