The corporate reform assault on American public education rests in large part on the international test called PISA (Programme for International Student Assessment), on which US students post only middling scores, well behind the top-performing nations. Of course, the critics who brandish these mediocre scores never admit that the results are heavily influenced by the unusually high proportion of US students living in poverty, or that American students in low-poverty schools score as well as, or better than, students in the highest-performing nations. To do so would be an admission that poverty matters, and they reject that idea.

But what if the PISA tests are fundamentally flawed? So argue the testing experts cited in this article in the (UK) TES.

On examination, it turns out that the results vary widely from one administration of the test to the next, that students in different countries do not answer the same questions, and that experts are debating serious technical problems with the test's methodology.

The article asks:

“But what if there are “serious problems” with the Pisa data? What if the statistical techniques used to compile it are “utterly wrong” and based on a “profound conceptual error”? Suppose the whole idea of being able to accurately rank such diverse education systems is “meaningless”, “madness”?

“What if you learned that Pisa’s comparisons are not based on a common test, but on different students answering different questions? And what if switching these questions around leads to huge variations in the all-important Pisa rankings, with the UK finishing anywhere between 14th and 30th and Denmark between fifth and 37th? What if these rankings – that so many reputations and billions of pounds depend on, that have so much impact on students and teachers around the world – are in fact “useless”?

“This is the worrying reality of Pisa, according to several academics who are independently reaching some damning conclusions about the world’s favourite education league tables. As far as they are concerned, the emperor has no clothes.”

The article cites the concerns of many testing experts:

“Professor Svend Kreiner of the University of Copenhagen, Denmark, has looked at the reading results for 2006 in detail. He notes that half of the participating students were not asked any reading questions at all, and that another 40 per cent were tested on just 14 of the 28 reading questions used in the assessment. So only approximately 10 per cent of the students who took part in Pisa were tested on all 28 reading questions.

“This in itself is ridiculous,” Kreiner tells TES. “Most people don’t know that half of the students taking part in Pisa (2006) do not respond to any reading item at all. Despite that, Pisa assigns reading scores to these children.”

“People may also be unaware that the variation in questions isn’t merely between students within the same country. There is also between-country variation.

“For example, eight of the 28 reading questions used in Pisa 2006 were deleted from the final analysis in some countries. The OECD says that this was because they were considered to be “dodgy” and “had poor psychometric properties in a particular country”. However, in other countries the data from these questions did contribute to their Pisa scores.

“In short, the test questions used vary between students and between countries participating in exactly the same Pisa assessment.”
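
How can Pisa assign a reading score to a student who saw only half of the reading questions, or none at all? The answer lies in the scaling model: once each question is assigned a single difficulty that is assumed to hold for every student everywhere, an ability estimate can be computed from whatever subset of questions a student happened to answer. The sketch below is my own illustration of that logic, with made-up numbers; Pisa's actual procedure is more elaborate, and the further imputation step that produces scores for students who answered no reading questions is not shown.

```python
# Illustrative sketch only, not Pisa's actual code. It shows how a Rasch-type
# model can score a student who answered only a subset of the items, provided
# each item's difficulty is treated as known and identical for all students.
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_ability(responses, difficulties):
    """Maximum-likelihood ability estimate from one student's 0/1 answers."""
    def neg_log_likelihood(theta):
        # Rasch model: P(correct) = 1 / (1 + exp(-(theta - difficulty)))
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x

# Hypothetical difficulties for 28 reading items.
difficulties = np.linspace(-2.0, 2.0, 28)

# One student saw only 14 of the 28 items, yet still receives a score.
subset = np.arange(0, 28, 2)
responses = np.array([1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
print(f"Estimated ability: {estimate_ability(responses, difficulties[subset]):.2f}")
```

Everything in that calculation hinges on each question being exactly as difficult for every student in every country, and that is precisely the assumption Kreiner challenges.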

Professor Kreiner says the methodology renders the results “meaningless.”

“The Rasch model is at the heart of some of the strongest criticisms being made of Pisa. It is also the black box within Pisa’s black box: exactly how the model works is something that few people fully understand.

“But Kreiner does. He was a student of Georg Rasch, the Danish statistician who gave his name to the model, and has personally worked with it for 40 years. “I know that model well,” Kreiner tells TES. “I know exactly what goes on there.” And that is why he is worried about Pisa.

“He says that for the Rasch model to work for Pisa, all the questions used in the study would have to function in exactly the same way – be equally difficult – in all participating countries. According to Kreiner, if the questions have “different degrees of difficulty in different countries” – if, in technical terms, there is differential item functioning (DIF) – Rasch should not be used.

“That was the first thing that I looked for, and I found extremely strong evidence of DIF,” he says. “That means that (Pisa) comparisons between countries are meaningless.”
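
Kreiner's objection is easy to demonstrate. Under the Rasch model, the probability that a student of ability θ answers a question of difficulty b correctly is exp(θ - b) / (1 + exp(θ - b)), and b is assumed to be one number for everyone. The toy simulation below (my own illustration with made-up numbers, not Kreiner's analysis) gives two countries identical ability distributions but lets eight questions function one logit harder in the second country, mirroring the kind of DIF he describes:

```python
# Toy simulation of differential item functioning (DIF) under the Rasch model.
# All numbers are made up for illustration; this is not Kreiner's analysis.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 10_000, 28

# Both countries get IDENTICAL ability distributions.
theta_a = rng.normal(0.0, 1.0, n_students)
theta_b = rng.normal(0.0, 1.0, n_students)

# Common item difficulties, except eight items are one logit harder in B.
difficulties_a = rng.normal(0.0, 1.0, n_items)
difficulties_b = difficulties_a.copy()
difficulties_b[:8] += 1.0  # the DIF: same questions, different difficulty

def mean_raw_score(theta, difficulties):
    """Simulate 0/1 Rasch responses and return the mean number correct."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return (rng.random(p.shape) < p).sum(axis=1).mean()

print(f"Country A mean score: {mean_raw_score(theta_a, difficulties_a):.2f}")
print(f"Country B mean score: {mean_raw_score(theta_b, difficulties_b):.2f}")
# Country B scores visibly lower despite identical abilities. A scaling that
# assumes one difficulty per question reports this as a real ranking gap.
```

The second country comes out behind even though its students are no less able; the model has mistaken differences in how the questions behave for differences in what the students know. That is the “profound conceptual error.”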

Please, someone, anyone, send this article to Secretary Arne Duncan; to President Obama; to Bill Gates; and to all the other “reformers” who want to destroy public education based on flawed and meaningless international test scores.