Martin Carnoy of Stanford is one of the nation’s leading authorities on international assessments. In this publication, he explains how complicated it is to draw meaningful conclusions from them.

Very illuminating. Unfortunately a bit too long and technical for the politicians!
What you need to know about international tests: {}
Or as Edwin Starr sang about war: “Absolutely nuthin”!
At minimum, read the executive summary and note that the Common Core State Standards were sold as if they were “internationally benchmarked.” They were not, and if they had been, the process would have absolutely no credible policy implications for the US.
Unfortunately, this otherwise excellent report on the farce of fixating on scores from international tests ends with a recommendation to compare test scores among our states.
“Recommendation: In the United States, policymakers should shift focus away from why students in other countries may do ‘better’ than U.S. students as a whole and instead focus on why student achievement gains have been greater in some U.S. states and lower in others.”
This recommendation puzzles me. NAEP scores provide an indication of “gains” over time. Perhaps the author is affirming that the Common Core State Standards, and tests based on them, should be enforced so that states can be compared on “achievement gains.” I think that sort of testing is a lost cause. It is a lost cause because there has been no stability in state policies, and there is unmapped territory between policy and practice, especially with the proliferation of service providers offering interventions galore. And there is not much evidence that the fundamental role of poverty will be addressed with any nuance.
I think that “why states differ” is also the wrong question, because it does not seem to interrogate the very concepts of “achievement” and “gains in achievement.” As long as these concepts are reduced to test scores, we miss the opportunity to think about what education is for and what it should accomplish for our nation, our communities, and each child. Of what value is it to anyone to know that a kid can score high on a test of reading comprehension and also “hate” reading for pleasure?
The professionals in measurement of educational outcomes have helped to abort thinking about the purposes of education and varieties of achievement that are worth honoring.
The goal was never to draw any MEANINGFUL conclusion from these tests. The goal, for the charter-loving crowd, has ALWAYS been to make money off of the education of the most vulnerable children in our society.
“. . . how complicated it is to draw meaningful conclusions from them.”
No, it’s not complicated at all. A very simple phrase describes the testing process: Crap in, crap out! There are no valid “meaningful conclusions” to be drawn, not only from the international test results but from all standardized tests, as the results are COMPLETELY INVALID to begin with, on conceptual, foundational, epistemological, and ontological grounds. No one has ever rebutted or refuted Noel Wilson’s seminal treatise “Educational Standards and the Problem of Error,” found at: http://epaa.asu.edu/ojs/article/view/577/700, and no one ever will.
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing, the descriptive information about said interactions is inadequate, insufficient, and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student not only to the student but also, by extension, to the teacher, school, and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student, since it is a description not of the student but of the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even with the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process, from which the “assessor” views the interactions of the teaching and learning process: the Judge (think of a college professor who “knows” the student’s capabilities and grades accordingly), the General Frame (think of standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning where you must get a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the tests, and the results are. He is an advocate for the test taker, not the test maker. In doing so, he identifies thirteen sources of “error,” any one of which renders the making, giving, and disseminating of test results invalid. And a basic logical premise is that once something is shown to be invalid, it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note, with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept that “story/description” of themselves as “natural and normal.” Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical, and free thinkers. And having independent, critical, and free thinkers is a threat to the current socio-economic structure of society.