Jason Stanford is a journalist in Austin, Texas, who follows the testing wars with keen interest, probably because he has children in school.
Having followed the blowback in Texas, where parents and educators together convinced the legislature that their zeal for testing was unreasonable, Stanford decided that standardized testing is not a good way to hold schools accountable. Actually, he says it is a lousy way because the tests don’t measure what we think they measure.
He writes:
“We’ve been using bubble tests to hold schools and students accountable for a long time, mostly without anyone asking tough questions about whether the scores were valid measures. Controversy over student testing was slow to develop and then mostly concerned the number of tests and the harsh consequences. We never asked whether the thermometer really measured the temperature, even though our education system is based upon the validity of these tests.”
He refers to “value-added measurement” as junk science. It is easy to parody as “Orwell-Meets-Kafka,” especially when the government does absurd things like ranking teachers by the scores of children they never taught.
This was the key point for him: “What really seemed to shake things up was an April report by the American Statistical Association, which said that because VAMs were based only on standardized tests they were 10 pounds of hooey in a 5-pound bag. And if you’re inclined to want the details, here’s the phrase that pays: ‘Most VAM studies find that teachers account for about 1 to 14 percent of the variability in test scores.’”
And that led him to ask: “If teachers only account for 1 to 14 percent of the change in test scores, then what does the other 86 percent measure?”
“And if we don’t know what it means, why are we holding schools, students, and teachers accountable to it?”

We seem to let the conversation devolve to evaluating teachers and holding schools “accountable”.
Jason is right that tests don’t measure what we think they measure.
In all the great debates, we never begin with what we want schools to be accountable for, or with what we want our graduates to know, to be able to do, and to be like.
Since there are no reasonable measures of the things that matter, what can be measured gets measured. Heck of a bad way to do science. And worst of all, what matters gets lost.
Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.
Albert Einstein
In a nutshell: “You can’t count on counts”
I like this quote!
Did Jason Stanford miss the point? Using the results of standardized tests to rank and yank teachers and close public schools is here not because it is a valid way to measure the quality of a teacher, but because Bill Gates and the corporate education reform movement want to use it as a way to destroy democratic public education on the road to profit and control of how kids are taught and who is taught.
Once we define the real reason for the tests, then they become valid for that reason.
Bingo!
“Actually, he says it is a lousy way because the tests don’t measure what we think they measure.”
It’s not just lousy but CHOCK FULL OF ERRORS, COMPLETELY INVALID, UNETHICAL AND IMMORAL for all the harm it does to the most innocent of society, the children.
The errors and invalidities have been known since at least 1997, when Noel Wilson published his never refuted nor rebutted dissertation. To understand why those educational malpractices that are educational standards and standardized testing are so, read and understand his “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference”, each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic)”. Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Here is a list of the 170 scheduled tests at Dublin City Schools near Columbus, OH.
http://www.jointhefuture.org/1439-you-won-t-believe-the-schedule-of-tests-for-this-ohio-school-district
The ELA tests are the worst. More than anything, they test familiarity with certain minimally-helpful reading strategies (e.g. scanning titles and subtitles to aid comprehension) rather than comprehension itself. And to the extent they measure comprehension, they are terrible measures of an ELA teacher’s effectiveness since reading comprehension is a function of general knowledge, of which an ELA teacher can only be the source of 5% or so at most. This is the Kafka-esque part. And the fact that the ELA tests, per NCLB, count for 40% or so of a school’s API score has led schools to jettison history and science, two subjects that actually increase reading comprehension ability through the imparting of world and word knowledge! Instead of learning these fruitful and interesting subjects, kids are forced into three-hour “literacy blocks” where they practice unproven strategies to boost reading comprehension, answering dreary questions about third-rate, mentally malnourishing fiction. Thus the ELA tests fail to test much important mental value-added, perversely punish and reward teachers who can barely affect scores on these tests, and induce schools to narrow curricula in a way that starves our kids’ brains of an interesting and mentally-nourishing education.
English teachers, wake up and protest! Recognize that “literacy block” is a sterile mutant version of your venerable discipline. You should be teaching literature and grammar, not having kids practice reading strategies ad nauseam. And you need to understand that history, science, art and other teachers are building reading comprehension alongside you NOT by adopting these mutant literacy strategies, but by imparting discipline-specific content. This content is an account of how the world works. Texts are about the world. Knowing the words that name the things of the world is THE key to comprehending written texts. Cockamamie theories from careerist education school “scholars” have obfuscated this common-sensical fact.
Ponderosa
Stay on your soapbox and let it shine.
Here’s what it feels like to many students who lack background information or understanding. All the critical thinking skills in the world do you no good without a foundation of content knowledge.
A pair of centuries and some late wickets put South Africa in a strong position with Australia 4 for 112 at stumps on day two of the second Test in Port Elizabeth. South Africa was bowled out early in the final session for 423, after AB de Villiers (116) and JP Duminy (123) both ground out tough, vital centuries for the home side. Nathan Lyon finished with 5 for 130 after bowling tirelessly all day, while Australia’s fast bowlers uncharacteristically struggled on a lifeless pitch. Wayne Parnell’s (2 for 19) first three balls featured the wickets of Doolan and Marsh, as the left-armer made the most of his Test recall. Parnell coerced edges out of the Australia pair with fine line-and-length bowling, needing only a fraction of movement to earn the scalps. Warner and nightwatchman Nathan Lyon (12 not out) faced a number of close scares to reach stumps unbroken. De Villiers grassed a regulation chance behind the stumps when Warner was on 39, while Lyon was also dropped by the usually safe hands of Duminy and given not out when replays proved he nicked one behind to the keeper.
1) Which best describes the “fraction of movement” needed to earn scalps in this cricket match?
a) nicking behind the keeper
b) edging out a lifeless pitch
c) breaking fine-line wickets
d) stumping vital centuries
That’s one wicket question.
I think I’ll stick to croquet (or maybe crochet)