Simon Greenhalgh is assistant principal at the Seoul Foreign School. Most of his professional career has been spent in education in Asia.
In this article, he warns about the hidden costs to students in East Asia of an educational regime that values high test scores above all else that a student might do or accomplish.
The recent TIMSS (Trends in International Mathematics and Science Study) and PISA (Programme for International Student Assessment) results have been met with the usual fanfare or damp squib depending on where you reside. The idea of comparing countries’ educational systems via standardized tests always seems to be an ill-considered one, and yet every year such rankings keep coming. There are people eagerly awaiting the results to see whether a country did well enough to provide cause for celebration or poorly enough to allow for an onslaught of criticism.
TIMSS and PISA have many critics, not least leading academics like Yong Zhao at the University of Kansas; but despite these criticisms the tests are regularly held up by governments to champion or condemn educational practices in their own country – or other countries. Among other things this has led to an unhealthy obsession among educators from all over the world with Finland, whose results, system, and general educational indefatigability have become the hopeful example of a Western nation that can achieve good scores in the face of competition from Asia, without the need for mirroring long hours and encouraging high pressure competition.
This year, in a pattern that is becoming monotonous to follow, the Asia-Pacific has all the real winners, with the TIMSS results showing that Singapore, Hong Kong, South Korea, Chinese Taipei (Taiwan), and Japan significantly outperformed all other countries. These locations have used their success in these global standardized tests to boast of their educational systems, schools, teachers, and students. On the back of this success, we have witnessed a gentle wooing of the West, where, for example, the U.K. now widely utilizes a Singapore mathematics scheme. Alongside these academic initiatives, we have also been subjected to the unusual spectacle of a number of bizarre and disingenuous exchanges, mostly in the name of entertainment. We have seen, on the BBC no less, British children sent to Chinese schools (it was too hard, the Brits were too rude, and they gave up) and Chinese teachers sent to U.K. schools (the children were too rude and wouldn’t listen, so the teachers gave up).
In a recent follow-up, the BBC covered three Welsh children in South Korean schools and found that the truth is far more complicated than anything a standardized test can show. South Korean children are studying 14 to 16 hours a day to compete, ultimately, in a make-or-break university entrance exam. Welsh children don’t need to do this and they have more free time, which, depending on who you ask, may be a good or a bad thing.
Welsh PISA scores are lower than South Korean ones, but comparing the two is like comparing leeks and kimchi. The Asia-Pacific countries near the top of PISA rankings all feature tough final university entrance exams in highly competitive social environments, whereas Western countries left this behind a long time ago. A key moment in the story of the Welsh children in South Korea came when one of the South Korean children talked about suicides in her peer group due to the pressure coming from family and teachers over the college entrance exam results. How many suicides are worth it so that a country can attain the highest PISA scores?
The BBC, keen to make us laugh at hapless Welsh teenagers trying to mop the floors in a Seoul high school, also makes the more morbid point that South Korea has the highest suicide rate in the industrialized world, plausibly linked to the competitive nature of the society. However, those familiar with the countries that topped the TIMSS and PISA rankings will not be surprised by the suicides, the competitiveness, nor the 14-16 hour days of the students. This is the norm in these countries. This is not the norm in Wales and other Western countries for a reason: a different culture exists….
The real story is that whilst the wealthier Asian countries dominate these tables, the students are paying a price to do so. The long-term implications at an individual level aren’t really known. The stories of children studying until midnight every night and putting in a double shift all weekend are forgotten in the league tables. What does this mean? That the educational systems are not as good or simply that they have a different focus? At the end of the day, comparing apples and oranges will never produce a meaningful or fair comparison. Those that want oranges will never agree that apples are better and the apple lovers shouldn’t mimic orange growers in the hope of getting better apples.

Unfortunately, Diane, the problems you highlight are not the fundamental problems. OECD Pisa, TIMSS and PIRLS treat results of tests as if they are leaves fallen in the garden in Fall. Once again I refer you to the work of Dr Hugh Morrison in Belfast, Northern Ireland. His background includes 23 years of teaching in schools and universities in addition to a quantum physics career. This is his latest critique of Item Response Theory:
PISA cannot be rescued by switching IRT model because all IRT modelling is flawed.
Dr Hugh Morrison (The Queen’s University of Belfast [retired])
drhmorrison@gmail.com
On page 33 of the Times Educational Supplement of Friday 25th November 2016, Andreas Schleicher, who oversees PISA, appears to accept my analysis of the shortcomings of the Rasch model which plays a central role in PISA’s league table. The Rasch model is a “one parameter” Item Response Theory (IRT) model, and Schleicher argues that PISA’s conceptual difficulties can be resolved by abandoning the Rasch model for a two or three parameter model. However, my criticisms apply to all IRT models, irrespective of the number of parameters. In this essay I will set out the reasoning behind this claim.
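For readers unfamiliar with the terminology, the following is a minimal sketch of the standard textbook logistic forms of the one, two and three parameter models. It is not taken from the essay or from PISA's own scaling procedures, and the function and parameter names (`irt_probability`, `theta`, `a`, `b`, `c`) are illustrative assumptions. Note that in every variant the examinee's ability enters as a single real number, `theta`.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response in the textbook logistic form.

    theta: examinee ability (a single real number)
    b:     item difficulty
    a:     item discrimination (fixed at 1 in the one parameter Rasch model)
    c:     guessing floor (fixed at 0 in the one and two parameter models)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# An examinee whose ability equals the item difficulty.
rasch = irt_probability(theta=0.5, b=0.5)                    # one parameter
two_pl = irt_probability(theta=0.5, b=0.5, a=2.0)            # two parameters
three_pl = irt_probability(theta=0.5, b=0.5, a=2.0, c=0.25)  # three parameters

print(rasch, two_pl, three_pl)
```

Adding parameters changes the shape of the curve for each item, but not the assumption that ability is one fixed number belonging to the examinee, which is the assumption the essay goes on to question.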
One can find the source of IRT’s difficulty in Niels Bohr’s 1949 paper entitled Discussion with Einstein on Epistemological Problems in Atomic Physics. Few scientists have made a greater contribution to the study of measurement than the Nobel Laureate and founding father of quantum theory, Niels Bohr. Given Bohr’s preoccupation with what the scientist can say about aspects of reality that are not visible (electrons, photons, and so on), one can understand his constant references to measurement in psychology. “Ability” cannot be seen directly; rather, like the microentities that manifest as tracks in particle accelerators, ability manifests in the examinee’s responses to test items. IRT is concerned with “measuring” something which the measurer cannot experience directly, namely, the ability of the examinee.
IRT relies on a simple inner/outer picture for its models to function. In IRT the inner (a realm of timeless, unobserved latent variables, or abilities) is treated as independent of the outer (here examinees write or speak responses at moments in time). This is often referred to as a “reservoir” model in which timeless abilities are treated as the source of the responses given at specific moments in time.
As early as 1929 Bohr rejected this simplistic thinking in strikingly general terms: “Strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application. The necessity of taking recourse to a complementary … mode of description is perhaps most familiar to us from psychological problems.” Now what did Bohr mean by these words? Consider, for example, the concept “quadratic.” It is tempting to adopt a reservoir approach and trace a pupil’s ability to apply that concept in accord with established mathematical practice to his or her having the formula in mind. The guidance offered by the formula in mind (Bohr’s reference to “conscious analysis”) accounts for the successful “application,” for example, to the solution of specific items on an algebra test.
However, this temptingly simplistic model, in which the formula is in the unobserved mental realm and written or spoken applications of the concept “quadratic” take place in the observed realm, contains a fundamental flaw: the two realms cannot be meaningfully connected. The “inner” formula (in one realm) gets its guidance properties from human practices (in the other realm). A formula as a thing-in-itself cannot guide; one has to be trained in the established practice of using the formula before it has guidance properties. In school mathematics examinations around the world, pupils are routinely issued with a page of formulae relevant to the examination. Alas, it is the experience of mathematics teachers everywhere that simply having access to the formula as a thing-in-itself offers little or no guidance to the inadequately trained pupil. The formula located in one realm cannot connect with the applications in the other.
Wittgenstein teaches that no formula, rule, principle, etc. in itself can ever determine a course of action. The timeless mathematical formula in isolation cannot generate all the complexities of a practice (something which evolves in time); rather, as Michael Oakeshott puts it, a formula is a mere “abridgement” of the practice – the practice is primary, with the formula, rule, precept etc. deriving its “life” from the practice.
Returning to Bohr’s writing, it is instructive to explain his use of the word “complementarity” in respect of psychology and to explain the meaning of the words: “stands in a relation of exclusion.” Complementarity was the most important concept Bohr bequeathed to physics. It involves a combination of two mutually exclusive facets. In order to see its relevance to the validity of IRT modelling, let’s return to the two distinct realms.
We think of the answers to a quadratic equation as being right or wrong (a typical school-level quadratic equation has two distinct answers). In the realm of application this is indeed the case. When the examinee is measured, his or her response is pronounced right or wrong dependent upon its relation to established mathematical practice. However, in the unobserved realm, populated by rules, formulae and precepts (as things-in-themselves), any answer to a quadratic equation is simultaneously right and wrong!
A formula as a thing-in-itself cannot separate what accords with it from what conflicts with it, because there will always exist an interpretation of the formula for which a particular answer is correct, and another interpretation for which the same answer can be shown to conflict with the formula. Divorced from human practices, the distinction between right and wrong collapses. (This is a direct consequence of Wittgenstein’s celebrated “private language” argument.) This explains Bohr’s reference to a “relation of exclusion.” In simplistic terms, in the unobserved realm, where answers are compared with the formula for solving quadratics, responses are right-and-wrong, while in the observed realm, where answers are compared with the established practice, responses are right-or-wrong.
On this reading, ability has two mutually exclusive facets which cannot meaningfully be separated. The distinguished Wittgenstein scholar, Peter Hacker, captures this situation as follows: “grasping an explanation of meaning and knowing how to use the word explained are not two independent abilities but two facets of one and the same ability.” Ability, construed according to Bohr’s complementarity, is indefinite when unobserved and definite when observed. Moreover, this definite measure is not an intrinsic property of the examinee, but a property of the examinee’s interaction with the measuring tool.
Measurement of ability is not a matter of passively checking up on what already exists – a central tenet of IRT. Bohr teaches that the measurer effects a radical change from indefinite to definite. Pace IRT, measurers, in effect, participate in what is measured. No item response model can accommodate the “jump” from indefinite to definite occasioned by the measurement process. All IRT models mistakenly treat unmeasured ability as identical to measured ability. What scientific evidence could possibly be adduced in support of that claim? No IRT model can represent ability’s two facets because all IRT models report ability as a single real number, construed as an intrinsic property of the measured individual.
Paceni,
The problem that you outline is based upon another onto-epistemological falsehood: that we can “measure the unobservable, i.e., latent traits/abilities”.
Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement:
“Physical tests, such as those conducted by engineers, can be standardized, of course [why of course of course], but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.” [my addition]
Notice how he is trying to assert by proximity that educational standardized testing and the testing done by engineers are basically the same, in other words a “truly scientific endeavor”.
Now since there is no agreement on a standard unit of learning and there is no measuring device calibrated against said non-existent standard unit, how is it possible to “measure the nonobservable”?
Hint: It’s not!
So much harm to so many students is caused by the educational malpractices that are standards and testing or, as Phelps puts it, “measuring the nonobservable”. Those malpractices are truly evil in the many subtle harms they have caused. Practices based on falsehood and error can only be described as malpractices.
Isn’t Singapore a more or less authoritarian country? It allows for corporal punishment of male students. From Wikipedia: “In a milder form, caning is used to punish male students in primary and secondary schools for serious misbehavior.”
https://en.wikipedia.org/wiki/Caning_in_Singapore
“How many suicides are worth it so that a country can attain the highest PISA scores?”
They are not doing it for the PISA score. The question is, as you have pointed out: in the end, will it yield any benefit to the overwhelming majority of students in these countries, and will it make these countries more successful economically?
Is the Asian economic miracle built on an educated work force or on cheap labor, both menial and educated? Will global capital leave them in the dustbin of history as wages rise and jobs move elsewhere or to machines? I may argue that robots were not responsible for the collapse of manufacturing here and the hollowing out of our economy since the late 90s, but it is hard to argue against the idea that a new round of retooling that eliminates labor in blue and white collar professions is now about to occur.
So expect more suicides in Korea as more and more people compete for fewer and fewer high quality jobs against a globalized economy.
Again: as Krugman said in “Sympathy for the Luddites”
As pissed as I’ve been at Krugman since the hatchet job on Bernie, his article today is a must read. For if education is not the solution for inequality, then for so many reasons we have taken a great leap back, one that must be fought and never normalized.
Joel,
Exactly: “How many suicides are worth it so that a country can attain the highest PISA scores?”
As someone who has studied lots of cultural anthropology and taught foreign students for a long time, I know that attitudes toward education stem from cultural norms. Within the cultural norm there is the family attitude towards education, and after this, there is the individual with his own set of abilities, interests and level of motivation. All of these factors play a part in determining the role education plays. There are also societal factors such as poverty, which has a huge role across cultures. Poor people have less of the opportunity and exposure that help them access education, although there are a few who may be able to achieve at high levels despite poverty. Public education has attempted to be the great equalizer by providing students with the tools to pursue their dreams. In fact, public education and public libraries are the two single most democratizing institutions in our country. It would be foolish to diminish their capacities due to the belief in a market-based solution, which frankly offers no grand solutions. We must continue to defend the rights of all students to attend an authentic free, democratic public school just like many of us, our parents, and perhaps grandparents did.
Having graduated with a degree in anthropology, I would say that an ethos like valuing education is determined by the economic circumstances in any culture. One of the greatest arguments in the social sciences has been over whether ethos and world views determine economic outcomes or economic outcomes determine those ethos. I think the vast majority of social scientists go with the second, which is a Marxian analysis.
Ignorant politicians and Right Wing Think Tanks come down on the side of poor values causing economic outcomes. It legitimates their thievery from the lower classes. It goes along with a myth of meritocracy that personalizes failure as well as justifying inequality. Is it the lazy welfare queen or the drug use that causes poverty and poor educational outcomes?
No, it’s the poverty and lack of employment that cause destructive value systems, as we now see with life expectancy falling among working class whites subjected to economic stress.
Those societies in East Asia have economic reasons for their reverence for education. In China it was a civil service system, established in the dynastic era, that required education but was a road to prosperity. For the Jews it was a survival mechanism when land ownership (among other things) was denied or displacement occurred …
Agreed. The most generalized statement I can make about students from various cultures and academic achievement is the following: The students that seem to achieve more come from stable homes, and they have a clear picture of themselves achieving. They seek challenge, and are willing to apply themselves to meet their goals.
By the way, most marginalized groups in societies around the world have large numbers of people who engage in self-destructive behavior. In our country alone, our native people and our poor are at risk for substance abuse. We see the same patterns with indigenous people in other countries like Australia and New Zealand.
I have never seen nor expect to see a written test that can evaluate any person’s worth to society, nor to him/her self for that matter
but
As we now know.
Truth is passe. It is not relevant. Post-truth triumphed.
Can’t agree with “Truth is passe. It is not relevant. Post-truth triumphed.”
Whether something is true is the most relevant discussion of all, as any conclusions drawn from falsehood and error can only lead to more falsehood and error, causing harms from policies/practices implemented to satisfy those errors and falsehoods.
Those who point out the errors and falsehoods and concomitant harms are labelled pessimists, cynics, and/or hopeless Quixotic fools and told to “shut up” by friend and foe, ally and enemy alike. Sometimes it seems most prefer to live in their own error-plagued existence, not bothering to take the time to truly examine their own fantasy world as it is delivered to them by current popular media.
Thank you Señor Swacker for your post on:
“Duane Swacker December 12, 2016 at 6:28 pm”
IT IS MY SATISFACTION TO READ that:
[start paragraph]
Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement:
“Physical tests, such as those conducted by engineers, can be standardized, of course [why of course of course], but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.” [my addition]
[end paragraph]
However, according to my mother’s teaching and my own experience, there is a frame or a principle in all aspects in life from geology to ecology to biology and then to psychology. There is THE SAME truth in every principle of every specific aspect, or studying field. For instance, time and perseverance are the PHYSICAL major factor to build success and strength (diamond or art in body build), whereas mentality, focus emotion, talent, humanitarian and compassion are the SPIRITUAL major factor to build happiness and contentment (Buddha and all Zen Masters). Finally, the ugly TRUTH in all bottomless GREED is that being foxy, malicious, manipulative, bullying, fraudulent and savage will definitely lead to success and strength in animal kingdom and in moral-less society of human beings (China, Russia, North Korea, and USA of 2016 and soon 2017).
Here is the TRUTH from Saint Mahatma Gandhi’s quote
[start quote]
Strength does not come from winning. Your struggles develop your strengths when you go through hardships and decide NOT to surrender, that is strength.
[end quote]
In short, Dr. Ravitch and all conscientious veteran educators in this website, in NPE organization, and in all other TRUE civic organizations of all 50 states in America will unite to sustain American Public Education in a whole child education concept WITHOUT SURRENDER, which is STRENGTH.
Thank you Señor Swacker for sharing your knowledge.
Respectfully yours,
May
I’m trying to get this straight … but the logic scares the hell out of me.
Here’s the headline … “Average U.S. Math Literacy PISA Scores Drop 18 Points Since 2009”
Stop right there!
So, these very expensive and very disrupting Common Core reforms … which were designed to cure poor PISA performances … have resulted in extra poor PISA performances?
Oh, man! This is not good.
“The U.S. average score in mathematics literacy in 2015 was 12 score points lower than …in 2012 and 18 score points lower than the average in 2009, but was not measurably different than the average mathematics literacy scores in 2003 and 2006.”
You mean the cure was worse than the complaint?!?!? That’s a pretty crappy reform, right?
Yes. Yes, it is.
Didn’t these Common Core zealots use these very same PISA scores to indict the public schools as outright failures?
Yes. Yes, they did.
This is very rich stuff. It’s hard not to smile … I mean … smirk.
This reform has the stinky whiff of malpractice. You know, that sinking feeling when supposed experts prove that they don’t know what the hell they’re doing. So they botch stuff. Badly.
Here’s something we might consider.
Maybe it’s not really a reform at all.
Maybe it’s a colossal scheme unloaded on us by a few self-anointed know-it-alls who don’t seem to know much of anything. Especially about education.
Maybe … maybe we’ve been had.
I think some very bright people have mentioned that. For several years. In several ways. To several important people. To no avail.
Denis Ian
http://truthinamericaneducation.com/common-core-state-standards/average-u-s-math-literacy-pisa-scores-drop-18-points-since-2009/#comment-2534
Denis Ian – well done, you are on the right track. Unfortunately those in an influential position to highlight OECD Pisa’s generation of meaningless numbers (as if they measure something) constantly avoid killing off the project. The http://www.paceni.wordpress.com blog provides all the academic background to take down OECD Pisa and all IRT models. ETS have been engaged contractually to rescue Pisa – truly a Mission Impossible.
Do what you can to spread the news. The OECD have done enough to damage US/UK education reputations via an unaccountable bureaucrat.
Denis
On a positive note, though I may have missed it because I pay for my Newsday delivery only so the wife can rip out the ads, I have not seen the usual headlines across the front page blasting the poor performance, nor the editorials lamenting teachers or the opt out movement here on Long Island. Perhaps as they start to own it they are not so anxious to publicize failure.
Or perhaps I am not reading “fake news” anymore.