Statisticians Mark Palko and Andrew Gelman explain why a relentless obsession with test scores ruins the value of the scores. As their prime example, they refer to Eva Moskowitz’s Success Academies, where children and teachers live for higher scores. Not only are the children’s names and ranking posted, so are the teachers’.
You remember Campbell’s Law? That’s the axiom that says when you attach consequences to a measure, the measure loses its validity.
They write:
“When a school uses selection and attrition policies that effectively filter out many of the extremely poor, students speaking English as a second language, and the learning disabled, that clearly calls into question test score advantages that such a school might have over an ordinary public school.
“But the problems run even deeper than most critics realize: A look at the data combined with some basic principles of social science suggests that the practices of no-excuses charters are undermining the very foundation of data-based education reform.
“As statisticians with experience teaching at the high school and college level, we recognize a familiar problem: A test that overshadows the ultimate outcomes it is intended to measure turns into an invalid test.
“Back in the old Soviet Union, factories would produce masses of unusable products as a result of competition to meet unrealistic production quotas. Analogously, many charter schools, under pressure to deliver unrealistic gains in test scores, are contorting themselves to get the numbers they’ve promised. They’re being rewarded for doing so. But that monomaniacal focus on test scores undermines the correlation between test scores and academic accomplishment that originally existed.”
They note that Success Academy has astonishingly high test scores, yet for two years in a row, not a single one of their eighth grade students won admission to one of the city’s elite high schools. In the third year, some did (11% of those who took the test from SA).
In a comment on this post, Gary Rubinstein (a blogger who teaches at Stuyvesant High School, an exam school) writes:
“One thing to note, the 11% specialized HS acceptance rate–6 out of 54–is inflated since there were 200 kids who feasibly could have sat for that test but only 54 did.” Of 200 students at Success Academy who were eligible to take the test, 54 did, and 6 gained admission.
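The gap between those two ways of counting is easy to check. Here is a minimal sketch of the arithmetic, using only the figures quoted above (6 admitted, 54 test takers, and Rubinstein's figure of roughly 200 eligible students); the function and variable names are just for illustration.

```python
def admission_rate(admitted, base):
    """Return the admission rate as a percentage of the chosen base group."""
    return 100.0 * admitted / base

admitted = 6        # students admitted, per the comment quoted above
test_takers = 54    # students who actually sat for the exam
eligible = 200      # Rubinstein's figure for students who could have sat for it

rate_among_takers = admission_rate(admitted, test_takers)
rate_among_eligible = admission_rate(admitted, eligible)

print(f"Among test takers: {rate_among_takers:.1f}%")      # 11.1%
print(f"Among eligible students: {rate_among_eligible:.1f}%")  # 3.0%
```

Same six students, but the rate drops from about 11% to 3% once the denominator includes everyone who could have taken the test.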
It is better to have high scores than low scores, but they should never be the measure of teacher quality or school quality. Making them too important ruins their value.

“May ruin”? Sorry, that ship sailed, the day the first standardized test was invented. There never has been any validity to standardized tests, most certainly not as a “measurement” [sic] of anything.
The problem is that these tests are still used to “prove” something. And when charter schools brag about 99% passing rates on tests and people here say “tests aren’t a valid predictor”, it doesn’t convince many people outside of this blog.
What DOES convince them is seeing how high attrition rates and eliminating low-scoring students make a school “seem” to be having success that isn’t real. Especially when so many of the students leaving are the at-risk students who have few other options and not the middle class students (of any race) whose parents have college degrees. And if the SUNY Charter Institute weren’t so determined not to look closely at what happens to every child who wins a lottery spot (an already highly self-selected bunch), it would have looked closely at why ONLY one of the charter chains out of the hundreds of charter schools it oversees gets those outsize results. The fact that it pretends only Success Academy knows how to teach at-risk kids and other charter schools just take their money and do a crappy job because their administrators don’t care about those same kids is absurd. No reporter has ever called them on what SUNY is, in fact, implying when they say “no need to look closely at Success Academy because they welcome every at-risk child and work miracles”.
Success Academy’s high test scores are similar to Cancer Treatment Center of America’s high cure rate. If you eliminate the sickest patients and the lowest scoring kids, and get lots of money, you can do quite a good job with the patients who are healthiest and the at-risk kids who test well without needing much help.
But no politician is saying we should direct tons of public dollars to CTC of America (and take it directly from MD Anderson) because their doctors know how to cure cancer better than the ones at MD Anderson. Because people understand the difference and CTC of America doesn’t have the chutzpah that Eva Moskowitz does to make claims so outrageous that it’s easy to point out how untrue they are. Except if you are a reporter or politician co-opted by big money and the prestige you get from not asking inconvenient questions.
I hope this article gets more coverage and the writers go on to take a very close look at the number of students who won each original Kindergarten lottery who disappear from the cohort. I suspect the size would astonish and sicken many people.
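To make the attrition point concrete, here is a toy sketch. The scores and the passing cutoff are invented for illustration and are not Success Academy data; the only point is that dropping the lowest scorers raises the reported numbers even though no remaining student's score changes.

```python
# Hypothetical scores for a ten-student cohort (invented, not real data).
cohort = [45, 52, 58, 63, 67, 71, 76, 80, 85, 93]
PASSING = 65  # assumed passing cutoff, also invented

def pass_rate(scores, cutoff=PASSING):
    """Percentage of students scoring at or above the cutoff."""
    return 100.0 * sum(s >= cutoff for s in scores) / len(scores)

def mean(scores):
    """Arithmetic mean of the scores."""
    return sum(scores) / len(scores)

# Attrition: the four lowest scorers leave before results are reported.
remaining = sorted(cohort)[4:]

print(f"Full cohort: pass rate {pass_rate(cohort):.0f}%, mean {mean(cohort):.1f}")
print(f"After attrition: pass rate {pass_rate(remaining):.0f}%, mean {mean(remaining):.1f}")
```

In this made-up cohort the pass rate goes from 60% to 100% purely by changing who is counted, which is why following every student from the original lottery matters.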
Watch this: (an excellent summary)
Tail wags dog.
Dog hits wall over and over until it is not happy but quite dead.
Eva Moskowitz on child rearing, test prep and all that.
It’s in almost every serious narrative ever written. Obsession over a bottom line or desired outcome is itself a fundamental flaw and it breeds corruption and downfall.
The “for profit” faction keeps trying to make education a commodity similar to the original assembly line developed by Henry Ford.
Children are not commodities and the assembly line approach will not work because they are human beings, not machines.
Yes, children are not widgets!
Nor are the teachers, admin and other folk who supervise and teach the children or supervise other human beings.
Eva Moskowitz is not a solution, she is a new form of urban blight.
She is a parasite
She is a microcosm of most everything that’s wrong with the United States.
Correct link to the article:
http://www.vox.com/2016/8/16/12482748/success-academy-schools-standardized-tests-metrics-charter
Folks, I’m no fan of Eva Moskowitz, but still, readers of this blog shouldn’t be insulting her to prove a point.
Diane, your post is excellent. Statisticians Mark Palko and Andrew Gelman should be commended for making the argument so clear: nothing important is actually being measured. Since the tests aren’t even “formative”—they aren’t helping teachers or students do better—they’re essentially worthless.
Unfortunately, though the research is fascinating, I think we can agree it’s not compelling for the general public, much less for politicians, who don’t want to spend the time or effort on research-based findings.
In my opinion, the most compelling arguments against charter schools and their test prep regimen are: (1) unfairness to students, and (2) cheating. The students at Success Academies are not getting what society thinks they’re getting. They definitely are not getting an education that will serve them in college or the workplace. A recent post about lifetime earnings put the students’ disadvantage in stark terms.
The cheating argument may resonate even more with the public. Americans don’t like people who game the system. As it’s become clear that charters do this, especially SA, the public has soured on them. The combination of the NY Times articles about counseling out students, the ACLU’s lawsuits about not serving ESL and special needs students, and now this story about measurement itself being worthless, paints a very clear picture of why charter schools are “bad actors”. They are cheating us out of everything: taxpayer dollars, decent education of our youth, ownership of our local public schools.
If charter schools are to be stopped, I don’t believe the arguments should be over poverty, teachers, or standardized tests. All of those arguments just play into public perceptions of a system in need of reform. After all, that’s where the phrase “no excuses” comes from. It makes for a quick retort by charter school supporters, to the effect: stop making excuses for low-performing students, for unqualified teachers, for bad test scores.
The educational process is messy and complicated. So “no excuses” should be met with this simple retort: “no gaming of the system”.
Oh, it’s beyond gaming the system.
It’s even beyond systemic and even institutionalized gaming. The gaming IS the system, and it is wildly and brutally destructive.
You are right, D.L.
The public likes competition and rankings. They like scores and grades. Good grief, in the Olympics, some athletes “lost” by one-one-hundredth of a second!
My view is that the test scores are not what matters most. What matters most is privatizing an institution that belongs to the public.
That hurts our democracy and enriches those who take over.
Spot on, D.L. Gaming the system is exactly what is happening. The fact that something can be strategically attained removes the legitimacy.
“But that monomaniacal focus on test scores undermines the correlation between test scores and academic accomplishment that originally existed.”
and
“It is better to have high scores than low scores, but they should never be the measure of teacher quality or school quality.”
That original correlation was never valid to begin with, as proven by Wilson. Since Wilson proved the invalidities involved in the standards and testing process, having “high scores than low scores” is rendered COMPLETELY MEANINGLESS due to the data being TOTALLY CORRUPTED.
To understand why that is so I urge all to read and comprehend Noel Wilson’s never refuted nor rebutted 1997 dissertation “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).
3. Wilson identifies four “frames of reference”, each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students’ capabilities and grades them accordingly); the General Frame (think standardized testing that claims to have a “scientific” basis); the Specific Frame (think of learning by objective, like computer-based learning, getting a correct answer before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error in the process is caused when the assessor confuses and conflates the categories.
4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words all the logical errors involved in the process render any conclusions invalid.
5. The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid, that is, errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
6. Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or, to put it in more mundane terms, crap in, crap out.
7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students as the social rewards for some are not available to others who “don’t make the grade (sic).” Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
Thank you, Duane.
Testing needs to be put in its place.
Assessment originally meant to sit alongside.
Now testing is a racket. Education itself is being gamed by elitist ideologues.
as·sess
Origin
late Middle English: from Old French assesser, based on Latin assidere ‘sit by’ (in medieval Latin ‘levy tax’), from ad- ‘to, at’ + sedere ‘sit.’ Compare with assize.
Although I would call them “elitist idiologues*”
Idiologue (n.) One who believes in idiologies**
**Idiology (n.) a. belief in error, falsehoods and invalidities. b. the beliefs of idiots
Here’s the corporate reformers’ response to these arguments against the validity of standardized tests (from The74):
https://www.the74million.org/article/brown-mayor-de-blasios-shameful-claim-that-nyc-students-only-perform-well-thanks-to-test-prep?utm_content=buffer2193c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Um wait..the ®eformers’ response is that “they coulda done it too…”.
But that ignores the article’s point that these gains are seen as a function of fiddling test conditions and not improving student achievement.
If traditional public schools were similarly successful at fiddling test conditions, by virtue of Campbell’s “law” (adage anyone?), they could well be expected to also be no better at having improved student achievement.
The point is: what these tests measure was always unclear. But as long as the test conditions remain constant, the outcomes have some validity as comparative measures (rankings).
But once you fiddle initial conditions, and do so variously, the outcomes are unhitched from one another.
When you fiddle the speedometer this doesn’t tell you anything about your actual speed, but the variation seen in speed does track the fiddling that was done.