Richard Rothstein and Martin Carnoy, both highly accomplished scholars, have reanalyzed the international test score data and arrived at some startling and important findings.

Their study is titled “International Tests Show Achievement Gaps in All Countries, with Big Gains for U.S. Disadvantaged Students.” It includes not only their major analysis of international test scores, but critiques by the leaders of OECD and PISA, and their response to the critiques.

This important study should change the way the media report international test results, if journalists take the time to read Rothstein and Carnoy.

In every nation, students from the most affluent homes are at the top of the test scores, and students from the poorest homes are at the bottom. In other words, there is an “achievement gap” based on social class in every nation.

They point out that the big assessment programs—PISA and TIMSS—do not consistently disaggregate by social class, which creates “findings” that are misleading and inaccurate.

Rothstein and Carnoy note that American policymakers have been disaggregating by income and other measures since No Child Left Behind was passed, yet they gullibly accept international test score data without insisting on the same kind of disaggregation.

In other words, we know that a school where most of the students live in affluent, college-educated families will get higher test scores than a school in an impoverished neighborhood. But we don’t ask the same questions when we look at international testing data.

Rothstein and Carnoy diligently asked those questions and reached some very interesting conclusions.

*“The share of disadvantaged students in the U.S. sample was larger than their share in any of the other countries we studied. Because test scores in every country are characterized by a social class gradient—students higher in the social class scale have better average achievement than students in the next lower class—U.S. student scores are lower on average simply because of our relatively disadvantaged social class composition.” In other words, we have more poverty than other nations with which we compare ourselves, and thus lower scores on average.

*They discovered that “the achievement gap between disadvantaged and advantaged children is actually smaller in the United States than it is in similar countries. The achievement gap in the United States is larger than it is in the very highest scoring countries, but even then, many of the differences are small.”

*The achievement of “the most disadvantaged U.S. adolescents has been increasing rapidly, while the achievement of similarly disadvantaged adolescents in some countries that are typically held up as examples for the U.S.—Finland for example—has been falling just as rapidly.” (I asked Rothstein whether the gains were attributable to NCLB, and he replied that the gains for the most disadvantaged students were even larger prior to NCLB.)

*The U.S. scores on PISA 2009 that so alarmed Secretary Duncan were caused by a sampling error. “PISA over-sampled low-income U.S. students who attended schools with very high proportions of similarly disadvantaged students, artificially lowering the apparent U.S. score. While 40 percent of the PISA sample was drawn from schools where half or more of students were eligible for free and reduced-price lunch, only 23 percent of students nationwide attend such schools.”

*If the PISA scores are adjusted correctly to reflect the actual proportion of students in poverty, the average scores of U.S. students rise significantly. Instead of 14th in reading, the U.S. is 4th in reading on PISA. Instead of 25th in mathematics, the U.S. is 10th. “While there is still room for improvement, these are quite respectable showings.”

*Because of PISA’s sampling error, the conclusions expressed by politicians and pundits were “oversimplified, exaggerated, and misleading.”
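The sampling-error argument above is, at bottom, a re-weighting calculation: if high-poverty schools make up 40 percent of the sample but only 23 percent of schools nationwide, the sampled average understates the true national average. A minimal sketch of that arithmetic, using invented scores purely for illustration (only the 40/23 percent shares come from the text):

```python
# Hypothetical illustration (not the study's actual data): how re-weighting
# a sampled average to match true population shares changes the result.

def reweighted_mean(group_means, sample_shares, true_shares):
    """Return (mean under sample weights, mean re-weighted to population shares)."""
    sampled = sum(m * s for m, s in zip(group_means, sample_shares))
    adjusted = sum(m * t for m, t in zip(group_means, true_shares))
    return sampled, adjusted

# Two groups of schools: high-poverty (half or more free/reduced-price lunch)
# and all others. The shares mirror the 40% vs. 23% figures quoted in the
# text; the mean scores themselves are invented for illustration.
means = [440.0, 520.0]          # hypothetical mean scores per group
sample_shares = [0.40, 0.60]    # over-sampled high-poverty schools
true_shares = [0.23, 0.77]      # actual nationwide proportions

sampled, adjusted = reweighted_mean(means, sample_shares, true_shares)
print(round(sampled, 1), round(adjusted, 1))  # prints: 488.0 501.6
```

With these made-up numbers the correction moves the average up by about 14 points, which is the direction (though not the magnitude) of the effect Rothstein and Carnoy describe.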

Rothstein and Carnoy identify important differences and inconsistencies between PISA and TIMSS, and between these assessments and our own NAEP. Taken together, these differences should remind us of the many ways in which the assessments can mislead policymakers, the media, and the public.

As they note in their conclusion, “it is not possible to say whether the results of any particular international test are generalizable and can support policy conclusions.”

They conclude: “We are most certain of this: To make judgments only on the basis of national average scores, on only one test, at only one point in time, without comparing trends on different tests that purport to measure the same thing, and without disaggregation by social class groups, is the worst possible choice. But, unfortunately this is how most policymakers and analysts approach the field.”