James Harvey, director of the National Superintendents Roundtable, wrote a terrific article about international assessments in Valerie Strauss’s Answer Sheet.
He explains:
“These assessments were never intended to line up and rank nations against each other like baseball standings.
“That’s right. The statisticians and psychometricians who dreamed up these assessments 50 years ago stated explicitly that the question of whether “the children of country X [are] better educated than those of country Y” was “a false question” due to the innumerable social, cultural, and economic differences among nations. But, hey, that’s just a detail.”
Another point:
“2. The “international average” isn’t what you think it is. It’s not a weighted average of all the students in the world, but an average of the national averages.
“This means that when calculating the “international average,” the 5,600 students in Liechtenstein, the 700,000 in Ireland, the 860,000 in Finland, the 5 million in Canada, and the 14 million in Japan carry exactly the same weight as the 56 million students in the United States.”
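To make that arithmetic concrete, here is a minimal Python sketch. The student counts are the ones Harvey quotes; the scores are invented purely for illustration and are not actual results from any assessment, and the function names are hypothetical.

```python
# Student counts as quoted in the passage above.
enrollment = {
    "Liechtenstein": 5_600,
    "Ireland": 700_000,
    "Finland": 860_000,
    "Canada": 5_000_000,
    "Japan": 14_000_000,
    "United States": 56_000_000,
}

# HYPOTHETICAL national average scores, made up for this example only.
hypothetical_score = {
    "Liechtenstein": 530,
    "Ireland": 500,
    "Finland": 520,
    "Canada": 515,
    "Japan": 535,
    "United States": 480,
}


def average_of_averages(scores):
    """Every country counts once, no matter how many students it has."""
    return sum(scores.values()) / len(scores)


def student_weighted_average(scores, counts):
    """Every student counts once, so large countries dominate."""
    total_students = sum(counts.values())
    return sum(scores[c] * counts[c] for c in scores) / total_students


simple = average_of_averages(hypothetical_score)
weighted = student_weighted_average(hypothetical_score, enrollment)

# Share of the "average" contributed by the United States under each method.
us_share_simple = 1 / len(enrollment)
us_share_weighted = enrollment["United States"] / sum(enrollment.values())

print(f"average of national averages: {simple:.1f}")
print(f"student-weighted average:     {weighted:.1f}")
print(f"US weight, country-based:     {us_share_simple:.1%}")
print(f"US weight, student-based:     {us_share_weighted:.1%}")
```

With these six countries, the United States carries one sixth of the weight in the average of national averages, but roughly three quarters of the weight once each student counts equally. Which direction the two averages diverge depends entirely on the scores plugged in.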
And here is more:
“3. These assessments compare apples and oranges.
“Do you think there’s anything to be learned from comparing the average performance of 5,600 wealthy white students in Liechtenstein with 56 million diverse students in the United States? Really? How about comparing our students with students in corrupt dictatorships like Kazakhstan, religious monarchies like Qatar, or the wealthiest city in China (Shanghai) after it has driven the children of low-income migrants back to their home provinces? As a report released in January by the Horace Mann League and the National Superintendents Roundtable, “School Performance in Context: The Iceberg Effect,” makes clear, these are just a few of the peculiar comparisons that lie behind these international assessment results.”
“5. The horse-race tables ignore differences in poverty, inequity, and social stress among nations.
“Fifty years of research in the United States and abroad documents a powerful correlation between low student achievement and poverty and disadvantagement. Yet reports on these international assessments blandly turn a blind eye on the implications of this research. The data are clear: Poverty rates among American students are five times higher than they are in Finland. China aside, we have the highest rates of income inequality in the nine nations examined in The Iceberg Effect. The rate of violent deaths in American communities is eight times the average rate in the other eight nations and 13 times greater than it is in Japan. All of that is ignored in the orgy of publicity organized by the sponsoring agencies of these assessments to highlight their findings.”
All in all, a brilliant analysis of the limitations of these tests that have promoted the deeply flawed agenda of test and punish.

Well of course you can’t compare them like baseball teams: not all countries even have baseball teams, and even if they did, all the kids couldn’t afford the equipment! That’s why soccer (“football” to the rest of the world) is so popular all over the place: all you need is a simple rag ball and some moxie, and you’ve got a game!
How about measuring THAT for once, eh? The International Play Index or the Children’s Sports of the World Evaluation? Or are the liberals going to say that it’s unfair to compare Peru to Ghana, because it’s a different culture, different incomes, and different everything?
What IS a good measurement, then? Or should we just give up because it’s too hard?
I say make ’em all take the SAT, and if they don’t speak enough English to do it, then they need to learn, because they sure won’t be ready for an American university if they can’t even pass the SAT.
Diane – just where do you advertise for your trolls? Craigslist? Any chance we could get a higher caliber of troll around here? 😉
Just lucky I guess.
“What IS a good measurement, then?”
One that has a solid logical epistemological and ontological basis.
Dienne: IMHO, your request is not exactly “no pidas peras al olmo” [Spanish, “don’t ask for pears from an elm tree” meaning “don’t ask for the impossible”] but even at my most hopeful I think it might be described as a quest in which one dreams “the impossible dream…” [THE MAN OF LA MANCHA].
But the beat, and the quest, go on…
😏
Even the simplest data analysis backs up James Harvey.
From the late Gerald Bracey, READING EDUCATIONAL RESEARCH: HOW TO AVOID GETTING STATISTICALLY SNOOKERED (2006), the first seven of his “Principles of Data Interpretation”:
#1), “Do the arithmetic. Corollary: Check the arithmetic.” [p. 21]
#2), “Show me the data!” [p. 23]
#3), “Look for and beware of selectivity in the data.” [p. 28]
#4), “When comparing groups, make sure the groups are comparable.” [p. 31]
#5), “Be sure the rhetoric and the numbers match. In all too many instances, they don’t.” [p. 32]
#6), “Beware of convenient claims that, whatever the calamity, public schools are to blame.” [p. 32]
#7), “Beware of simple explanations for complex phenomena.” [p. 35]
Under the last he reminds us of H. L. Mencken’s observation:
“For every complex problem there is an answer that is clear, simple, and wrong.”
Thank you for your comment.
😎
Wonderful post. With your recent posts, you certainly have covered the detrimental aspects of testing, with proof by experts, and offered the story of how testing is the scourge that serves to remove teachers… those that are left, because before the testing mania, there was another process, the one that took out the nation’s veteran teachers while the unions sat back and let it happen.
http://www.speakingasateacher.com/SPEAKING_AS_A_TEACHER/No_Constitutional_Rights-_A_hidden_scandal_of_National_Proportion.html
Now, if only the media follows what you are doing, now that the NPE is up and running.
Hopefully the public will realize that the profession of pedagogy has been utterly devastated so that the oligarchs, whom you call the ‘billionaire’s boys club,’ can monetize education and destroy democracy and the only road that offers opportunity to the people.
One of the details about education research that was underscored in my doctoral studies was how little can be learned from horse races in education and how inappropriate they are for children, since there is always just one winner and everyone else is labeled a loser. How ironic that the main component of legislation entitled “No Child Left Behind” (and its evil spawn, “Race to the Top”) is horse racing.
“Fifty years of research in the United States and abroad documents a powerful correlation between low student achievement and poverty and disadvantagement.”
Since American teachers are scapegoated for the achievement gap between low income and higher income students virtually every single day in the US, but no one ever mentions this critical global data, it can never be emphasized enough that poor children perform worse than wealthier kids EVERYWHERE: “International Tests Show Achievement Gaps in All Countries” http://www.epi.org/blog/international-tests-achievement-gaps-gains-american-students/
What we are seeing is a cultural trend. A cooking show is a competition where someone loses. Singing contests, building competitions, all have losers, many of whom are quite wonderful.
“What IS a good measurement?”
The effectiveness of marketing or propaganda.
I like your response better than mine!!
Gerald Bracey made most of these same observations/assertions in several books and articles over the past 2 decades or so. No one listened then, and I’m sure this article will be given the same treatment. Our media and policymakers like to rank and stack and pit one party against another, create the element of competition (and crisis) even when there is none. This benefits our corporate ed deformers, ever ready to take advantage in the name of profit.
“No one listened then, and I’m sure this article will be given the same treatment.”
Kind of like Noel Wilson’s work, eh!?!?!
Came upon both Wilson and Bracey (Glass, and many others) on the Arizona State University’s Education Policy Analysis Archives. Any educator worth his/her salt should be reading the articles there.
For all those interested in a “better education for all” I strongly recommend Gerald Bracey’s READING EDUCATIONAL RESEARCH: HOW TO AVOID GETTING STATISTICALLY SNOOKERED (2006).
With minor changes for dates and such, still very pertinent to what is happening today.
And best of all, accessible to just about everybody.
Thank you both for your comments.
😎
Am I imagining things? I swear I read a post about Massachusetts and PARCC testing today. Maybe it was an older post?
kayinmassachusetts, this is the post you read: https://dianeravitch.net/2015/01/30/why-is-massachusetts-switching-to-parcc/
Whoever felt the need to compare students around the world can’t possibly think this is scientific, since the variables are so extreme.
The other reason it’s not fair to compare students from different countries is that they don’t all have the same curriculum. How can you compare one set of kids who have been taught something on the test with another group of kids who haven’t? This is especially true with science in TIMSS.
And the other thing is that different countries have different hours of instruction time for different subjects. In grade 4, Singapore had 201 hours of math instruction time per year, USA had 171 and Japan 136 (as reported in TIMSS 2006/07).
First, saying that the test is biased is fine. Differences in poverty, teenage pregnancy rates, income inequality, cultural attitudes, etc. are valid concerns, but they are not excuses to justify poor US outcomes. We need to recognize the (imperfect) red flag indicator and perhaps consider a solution to poverty to help our US outcomes in education, for example. Or how about improving access to health care to include a public service program to reduce teenage pregnancy? Second, to the person complaining about the unfairness of the test because students have differing curriculums: the fact is that a right triangle has certain properties, quadratic and polynomial equations render certain graphs, Newton’s Laws remain across political boundaries, and so on.
The problem is not the validity of the test. The problem is the overuse of testing and the lack of an intelligent response to the indications of the results. Teaching is the only professional field I know of that has policy makers with no experience in the field deciding the direction of education. Note, for example, that the state commissioner of education in NM never taught in the classroom, and many administrators are out of touch after having been removed from the classroom for long periods of time.
Policy should be determined at the local level by teachers.
Yes, but one country might teach the Pythagorean theorem in grade 4, and another country might teach it in grade 5 and do the Euclidean algorithm in grade 4.
If the testers assume that the Pythagorean theorem is always taught in grade 4 and the Euclidean algorithm is never taught in grade 4, then they will only test for the Pythagorean theorem and never the Euclidean algorithm, and the second country will appear weaker.
One problem is the overuse of testing internal to the USA, but the other big problem is the validity of the tests between countries.
“The problem is not the validity of the test.”
Weeeellllll, yeeeessss the problem is with the validity. Without validity all else is moot. For a short primer on the invalidity of the educational malpractices that are educational standards and standardized testing read Noel Wilson’s “A Little Less than Valid: An Essay Review” found at:
Click to access v10n5.pdf
“The problem is not the validity of the test.”
That is the main problem: the tests are COMPLETELY INVALID, as shown by Noel Wilson in his never refuted nor rebutted “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference,” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly), the General Frame (think standardized testing that claims to have a “scientific” basis), the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen), and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words, all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
By Duane E. Swacker