Fred Smith is a testing expert who worked as an analyst for the New York City Board of Education for many years. In this study, he flips the question: Not, how did the students perform, but how did the tests perform?
He grades the tests and finds a remarkable increase in the number and percent of students who scored a zero, perhaps because they didn’t understand the question or provided a confused or incoherent response.
The increase in zeroes was particularly high for students with disabilities and English language learners. It was higher still for black and Hispanic students.
Smith writes:
“The data show that there has been an increase in the percentage of zero scores since the administration of exams aligned with the Common Core. We anticipate that officials will claim this outcome to be the consequence of tougher standards reflected by more rigorous exams.
“We argue that those assertions are insufficient explanations for what we found. Recall that a zero score indicates an unintelligible or incoherent answer. Certainly, some zeroes are to be expected. But the percentage of zero scores, particularly for students in grades 3 and 4, is unreasonable in our view. With so many answers deemed ‘incomprehensible, incoherent, or irrelevant,’ we must ask whether such a program yielded any valuable information at all about our youngest students, as the testing was purported to do. The failure here is much more likely in the questions themselves and in the belief that it was acceptable to ask eight- and nine-year-olds to sit and take long exams over several days. That the data also indicate a widening achievement gap cannot be ignored…
“Further evidence of flawed testing can be noted in the decline of zeros in 2016 — when the SED removed time limits — from the surge in 2013, for most grades. After three years of CC-aligned testing, the SED acknowledged that the time constraints imposed by the tests were an issue. This, in itself, is an after-the-fact admission that the tests were poorly developed, as test administration procedures, including timing, should be resolved as part of the test-development process before tests become operational.
“In taking stock of the testing program we must return to the fears and doubts that were expressed by a small number of people early on. Were New York State’s CC-aligned tests appropriate measures? Would they have a negative impact on students, especially the most vulnerable?
“The analyses and findings in this report vindicate these early concerns and give empirical grounding to the opt-out movement that grew to an astounding 20 percent of the test population between 2013 and 2015. Specifically, our findings raise questions about the efficacy of this kind of testing, particularly for our youngest students. They also open a needed discussion about the quality of Pearson’s work, the worth of its product, and SED’s judgment in managing the program.”
The unasked question is why we insist on testing every student in grades 3-8 every year. No other nation does it.
My guess: Congress is still inhaling the toxic fumes of NCLB, which was based on the nonexistent “Texas miracle.”

This is precisely what I argued when we rolled out the Common Core and began giving the creative-response questions for practice. Students with a degree of proficiency I had documented otherwise were nonetheless scoring zero on particular items in the tests we gave and scored ourselves using a rubric (a hated word, in my opinion). It was evidence, I suppose, that the students could not string together connections we saw and had spent countless hours making, both in graduate school and in teaching the material. We did not need the test to surmise that all but a few students would make these connections only within the context of direct practice on a particular type of question. We just knew it. Thus we were spending enormous amounts of money to discover what teachers could have reported for free. A few gifted students could put it all together; most could not. Who knew?
Teachers can generally tell why a student is not performing on tasks in class if they have time for some one-on-one. Especially in math, a bit of experience with a student will tell you whether a problem gave trouble because of the algebra or the underlying arithmetic. If a test cannot do that better than a teacher, then it is a waste of time and money. It is much more difficult in reading and language applications. As a history teacher, I find it far more challenging to decide whether a student does not understand factual information because of a reading, listening, or information deficiency.
“If a test cannot do that better than a teacher, then it is a waste of time and money.” — Does this mean that a teacher doing worse than a test is money well spent?
“Not, how did the students perform, but how did the tests perform?”
This is the same logical strategy that Noel Wilson uses in his seminal 1997 work “Educational Standards and the Problem of Error,” which has never been refuted or rebutted. Instead of focusing on the standardized test’s supposed validity, he focuses on how the test is invalid. Wilson identifies thirteen sources of invalidity (and they aren’t the only ones), any one of which renders the process invalid.
I offer this to anyone who is not familiar with this summary of the 250-page dissertation:
“Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700
Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine. (updated 6/24/13 per Wilson email)
1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.
A major epistemological mistake is that we attach, with great importance, the “score” not only to the student but also, by extension, to the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only logically correct thing we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student, as it is a description not of the student but of the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even with the “grading” of students by a teacher).
Wilson identifies four “frames of reference,” each with distinct assumptions (an epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge frame (think of a college professor who “knows” the students’ capabilities and grades them accordingly), the General frame (think of standardized testing that claims to have a “scientific” basis), the Specific frame (think of learning by objective, as in computer-based learning, where one gets a correct answer before moving on to the next screen), and the Responsive frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the “teacher” with constant feedback). Each category has its own sources of error, and more error is introduced when the assessor confuses and conflates the categories.
Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”
In other words all the logical errors involved in the process render any conclusions invalid.
The test makers/psychometricians, through all sorts of mathematical machinations, attempt to “prove” that these tests (based on standards) are valid: errorless, or at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations, the tests, and the results are. He is an advocate for the test taker, not the test maker. In doing so he identifies thirteen sources of “error,” any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.
Having shown the invalidity, and therefore the unreliability, of the whole process, Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory.” In other words, start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”), or to put it in more mundane terms: crap in, crap out.
And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
In other words, it attempts to measure “‘something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards available to some are denied to others who “don’t make the grade” (sic). Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!
One final note with Wilson channeling Foucault and his concept of subjectivization:
“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
In other words, students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, “I’m an ‘A’ student” is almost as harmful as “I’m an ‘F’ student” in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
“incomprehensible, incoherent, or irrelevant”
Almost perfectly describes the tests themselves.
Make the “or” an “and” and the description would be perfect.
And I see no evidence that “The Zeroes” (TM) have increased.
They have just moved around. For example, Arne Duncan moved from DOE to the Emerson Collective. And Campbell Brown moved from The Seventy Six VAMbones to Fakebook. Michelle Rhee moved from DC to California. And Betsy moved from Michigan to Washington.
Conservation of zeroes
“Zero Conservation”
A zero is a zero
No matter where it moves
No matter far or near, Oh
Nothing’s what it proves
Excellent verse, Poet!
Testing based on the CCSS has failed our most vulnerable students. The number of zeros indicates students were baffled, confused and unable to provide legitimate feedback. The testing monster has lost its mind, and education policy is dictated by politicians, not teachers. All the research on ELLs has shown that it takes at least five to seven years to achieve coordinate bilingualism. Many ELLs arrive with little to no education in their homelands. Instead of heeding research and evidence, we slap these young people with soul-sucking testing that has no validity for them whatsoever. This is an abusive practice. One of the research-based tenets of teaching English to speakers of other languages is that the content must be comprehensible in order for students to learn. All the zeros demonstrate that the administration of this unsound testing was educational malpractice. Its content was incomprehensible. It makes sense that the most zeros were produced by third graders, as many districts tend to receive the largest numbers of non-English-speaking students in kindergarten. These ridiculous, absurd, harmful testing policies were one of the reasons I retired from teaching ESL in NYS.
I would just make one slight change to your most astute observations.
It’s not the testing that has failed our students.
It is the testers — all the people involved in writing these things and forcing them on children.
I think we need to stop talking about failed policies and start talking about failed policy makers. Name the names. Zero in on where the blame actually lies.
So absolutely TRUE, SomeDAM Poet.
YES, yes and yes. We must aggressively change the language we use around testing. Failed Policy in place of Failed School. Bad Intervention in place of Bad Teacher. Broken Education Department in place of Broken Public School System.
There is no such thing as a “failing school.” Period. There is failing leadership. Failed leadership. Failure to implement wise policies. Failure to invest adequate resources. Accountability begins at the top, not the bottom. Generals get the blame for losing battles, not privates, sergeants, and lieutenants.
“The testing monster has lost its mind,”
Nah! That monster never had a mind.
Now the people who so ardently support the complete invalidities of the standards and testing regime, well, let’s just say their minds are either woefully inadequate or they are as Upton Sinclair described: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” In other words, they are willingly bought and paid for by those who stand to make a lot of money on the educational malpractice that is the standards and testing regime.
For you, Pearson testing exec who doesn’t understand zeroes, I have a question and a challenge. Have you ever read ANTIGUA: The Land Of Fairies, Wizards, and Heroes? No? Perfect. (Neither have I. No one has. It’s one of the worst selling books online.) “Voltar the Dragon wants to eat everyone, and King Artor wants to prevent that. Also, fairies and wizards and heroes and stuff.” So, here’s a close reading, Common Core challenge for you: Read a short passage about a fairy from the middle of the book. Then, write an annotated explanation of why dragons are immune to lightning bolts, including a meaningful description of the author’s choice to end every sentence with an exclamation mark. Your writing will not be published or read by anyone you know or care about. Your score will not be seen for months, and will not impact your life. Now, focus. Write. See? Garbage in, garbage out.
“It’s one of the worst selling books online.”
Can’t be worse selling than mine.
Duane, very sorry, but, if you add a trending word to your title, your book will show up on hundreds of thousands of searches a day. Try ‘Hitler’. Try ‘Beyoncé’. How about ‘Infidelity to Beyoncé and Truth: Hitler and Education Malpractice in American Public Education’. It’s all in the advertising, not the content. Gates and Coleman know it. Now you do too. (Just joking a bit, authors Diane and Duane.)
“if you add a trending word to your title, your book will show up on hundreds of thousands of searches a day. Try ‘Hitler’.” — What a novel idea. Also, translate some benign English words like “struggle” or “work” into German and you will see a definite increase in attention.
Exactly the reasons I wrote two books: THE WRONG DIRECTION FOR TODAY’S SCHOOLS and COMMON SENSE EDUCATION. https://rowman.com/Action/Search/_/zarra/?term=zarra
drerniezarra.com
Congress is still inhaling the toxic fumes of NCLB, which was based on the nonexistent “Texas miracle.”
Good line.
The website Deliberately Dumbing Down explains what is behind the Common Core.
Alignment is a concept about as rhetorically useful as the NO Child Left Behind Act and the EVERY Student Succeeds Act. If you look at the gurus of alignment studies (e.g., Andrew Porter, the late Gerald Bracey), you will find that seeking alignment between tests and standards is not possible without introducing some concepts about curriculum, and even if you do that you are caught with the impossibility of grade-to-grade alignment, because the curriculum content is not the same at every grade and there is no real “interval scaling” (ruler-like) possible with content, even if that is assumed to exist for the purpose of test construction.
If you have thousands of items and responses from field testing, you can do some item analyses, but test construction is really a mental game based on the premise that there should be winners and losers, if not in a zero-sum structure then from the unforgiving bell curve and decisions about cut scores.
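For readers wondering what an “item analysis” of field-test responses actually involves, here is a minimal sketch in Python of two classical item statistics: difficulty (proportion answering correctly) and a crude discrimination index (how much better high scorers do on an item than low scorers). Everything here is illustrative; the function name, the made-up response matrix, and the median high/low split are my own assumptions, not any state’s or vendor’s actual procedure.

```python
# Classical item analysis on a (students x items) matrix of 0/1 responses.
# Purely illustrative: the data and the high/low split are invented.

def item_analysis(responses):
    """responses: list of per-student lists of 0/1 item scores."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]      # each student's raw score
    median = sorted(totals)[n_students // 2]      # split point for high/low groups
    stats = []
    for j in range(n_items):
        correct = [row[j] for row in responses]
        difficulty = sum(correct) / n_students    # "p-value": proportion correct
        # crude discrimination: p(correct | high scorers) - p(correct | low scorers)
        high = [c for c, t in zip(correct, totals) if t >= median]
        low = [c for c, t in zip(correct, totals) if t < median]
        disc = (sum(high) / len(high) if high else 0.0) \
             - (sum(low) / len(low) if low else 0.0)
        stats.append({"item": j, "difficulty": difficulty, "discrimination": disc})
    return stats
```

Note what this does and does not tell you: an item with near-zero difficulty (almost everyone scores zero) is flagged as statistically “bad,” but the statistic alone cannot say whether the fault lies with the students, the question, or the testing conditions, which is exactly the commenter’s point.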
I thought every child was supposed to be proficient by now, thanks to the genius behind the Common Core curriculum.
Without having the questions themselves, this is all a pointless exercise in statistics. And the questions will not be provided, because they are copyrighted. Therefore the tests are simply to be avoided altogether. Problem solved.