Bob Shepherd is an editor, author, and recently retired teacher in Florida. He worked for many years as a developer of curriculum and assessments. He posted this comment here.

Combating Standardized Testing Derangement Syndrome (STDs)

The dirty secret of the standardized testing industry is the breathtakingly low quality of the tests themselves. I worked in the educational publishing industry at very high levels for more than twenty years. I have produced materials for all the major standardized test publishers, and I know from experience that quality control processes in that industry have dropped to such low levels that the tests, these days, are typically extraordinarily sloppy and neither reliable nor valid. They typically have not been subjected to anything like the standardization procedures used, in the past, with intelligence tests, the Iowa Tests of Basic Skills, and so on. The mathematics tests are marginally better than the tests in ELA, US History, and Science, but they are not great. The tests in English Language Arts are truly appalling. A few comments:

The state and national standardized tests in ELA are invalid.

First, much of attainment in ELA consists of world knowledge–knowledge of what–the stuff of declarative memories of subject matter. What are fables and parables, how are they similar, and how do they differ? What are the similarities and differences between science fiction and fantasy? What are the parts of a metaphor? How does a metaphor work? What is American Gothic? What are its standard motifs? How is it related to European Romanticism and Gothic literature? How does it differ? Who are its practitioners? Who were Henry David Thoreau and Mary Shelley and what major work did each write and why is that work significant? What time is it at the opening of 1984? What has Billy Pilgrim become “unstuck” in? What did Milton want to justify? What is a couplet? terza rima? a sonnet? What is dactylic hexameter? What is deconstruction? What is reader response? the New Criticism? What does it mean to begin in medias res? What is a dialectical organizational scheme? a reductio ad absurdum? an archetype? a Bildungsroman? a correlative conjunction? a kenning? What’s the difference between Naturalism and Realism? Who the heck was Samuel Johnson, and why did he suggest kicking that rock? Why shouldn’t maidens go to Carterhaugh? And so on. The so-called “standards” being tested cover ALMOST NO declarative knowledge and so miss much of what constitutes attainment in this subject. Imagine a test of biology that left out almost all world knowledge (How do vertebrates differ from invertebrates? What is a pistil? A stamen? What are the functions of the Integumentary System? What are mycelia? What is a trophic level?) and covered only biology “skills” like–I don’t know–slide-staining ability–and you’ll get what I mean here. This has been a MAJOR problem with all of these summative standardized tests in ELA since their inception. They don’t assess what students know. Instead, they test, supposedly, a lot of abstract “skills”–the stuff on the Gates/Coleman Common [sic] Core [sic] bullet list, but they don’t even do that.

Second, much of attainment in ELA involves mastery of procedural knowledge–knowledge of what to do. E.g.: How do you format a Works Cited page? How do you plan the plot of a standard short story? What step-by-step procedure could you follow to do that? How do you create melody in your speaking voice? How do you revise to create sentence variety or to emphasize a particular point? What specific procedures can you carry out to accomplish these things? But the authors of these “standards” didn’t think that concretely, in terms of particular procedural knowledge. Instead, in imitation of the lowest-common-denominator-group-think state “standards” that preceded theirs, they chose to deal in vague, poorly conceived abstractions. The “standards” being tested define skills so vaguely and so generally that they cannot, as written, be operationalized sufficiently to be VALIDLY tested. They literally CANNOT be, as in, this is an impossibility on the level of drawing a square circle. Given, for example, the extraordinarily wide variety of types of narratives (jokes, news stories, oral histories, tall tales, etc.) and the enormous number of skills that producing narratives of various kinds requires (writing believable dialogue, developing a conflict, characterization via action, characterization via foils, showing not telling, establishing a point of view, etc.), there can be no single prompt that tests for narrative writing ability IN GENERAL. And the problem is broader than narrative writing. In general, the tests ask one or two multiple-choice questions per “standard.” But what one or two multiple-choice questions could you ask to find out if a student is able, IN GENERAL, to “make inferences from text” (the first of the many literature “standards” at each grade level in the Gates/Coleman bullet list)? Obviously, you can’t. There are three very different kinds of inference–induction, deduction, and abduction–and whole sciences devoted to problems in each, and texts vary so considerably, and types of inferences from texts do as well, that no such testing of GENERAL “inferring from texts” ability is even remotely possible. A moment’s clear, careful thought should make this OBVIOUS. So, the tests do not even validly test for what they purport to test for, and all this invalidity in testing for each “standard” doesn’t–cannot–add up to validity overall.

Third, nothing that students do on these exams even remotely resembles what real readers and writers do with real texts in the real world. Ipso facto, the tests cannot be valid tests of actual reading and writing. People read for one of two reasons—to find out what an author thinks about a subject or to have an interesting, engaging, vicarious experience. The tests, and the curricula based on them, don’t help students to do either. Imagine, for example, that you wish to respond to this post, but instead of agreeing or disagreeing with what I’ve said and explaining why, you are limited to explaining how my use of figurative language (the tests are a miasma) affected the tone and mood of my post. See what I mean? But that’s precisely the kind of thing that the writing prompts on the Common [sic] Core [sic] ELA tests do and the kind of thing that one finds, now, in ELA courseware. This whole testing enterprise has trivialized education in the English language arts and has replaced normal interaction with texts with such freakish, contorted, scholastic nonsense.

Fourth, a lot of attainment in ELA is not about explicit learning, at all, but, rather, about acquisition via automatic processes. So, for example, your knowledge (or lack thereof) of explicit models of the grammar of your native tongue has almost nothing to do with your internalized grammar of the language. But the ELA standardized tests and the “standards” on which they are based were conceived in blissful ignorance of this (and of much else that is now known about language acquisition).

Fifth, standard standardized test development procedures require that the testing instrument be validated. Such validation requires that the test maker show that results for the test and for particular test items and test item types correlate strongly with other accepted measures of what is being tested. No such validation has been done for any of the new generation of state and national standardized ELA tests. None. And, given the vagueness of the “standards,” none could be. Where is the independent measure of proficiency on Common Core State Standard ELA.11-12.4b against which the items on the state and national measures have been validated? Answer: There is no such measure. None. So, the tests fail to meet a minimal standard for a high-stakes standardized assessment–that they have been independently validated.
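To make concrete what that kind of criterion validation would involve, here is a rough sketch in Python, with invented placeholder numbers (not data from any actual test): the test maker would have to show that scores on the new instrument correlate strongly with an independent, already-accepted measure of the same proficiency.

# Illustrative sketch only: invented numbers, not real test data.
# Criterion validation, at its simplest, checks whether scores on a new
# test track an independent, accepted measure of the same proficiency.
from statistics import correlation  # available in Python 3.10+

# Hypothetical scores for the same ten students on the new test and on an
# established, independent measure of the same skill.
new_test_scores = [12, 15, 9, 18, 14, 11, 16, 10, 13, 17]
independent_measure = [54, 61, 43, 70, 58, 49, 63, 46, 55, 66]

r = correlation(new_test_scores, independent_measure)
print(f"Pearson r = {r:.2f}")  # a strong positive r is the kind of evidence a test maker must show

For the new ELA tests, no independent measure of proficiency on the individual “standards” exists, so there is nothing against which to run even this elementary check.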

The test formats are inappropriate.

The new state and national tests consist largely of objective-format items (multiple-choice and so-called evidence-based selected-response items, or EBSRs). On these tests, such item formats are pressed into a kind of service for which they are, generally, not appropriate. They are used to test what in EdSpeak is called “higher-order thinking.” The test questions therefore tend to be tricky and convoluted. The test makers, these days, all insist that the multiple-choice distracters, the incorrect answer choices, be “plausible.” The student is to choose the “best” answer from among a list of plausible answers. Well, what does plausible mean? It means “reasonable.” In other words, on these tests, many reasonable answers are, BY DESIGN, wrong answers! So, the test questions end up being extraordinarily complex and confusing and tricky–impossible for kids to answer, because the “experts” who designed these tests didn’t understand the most basic stuff about creating assessments, for example, that objective question formats are generally not great for testing so-called “higher-order thinking” and are best reserved for testing straight recall. The use of these inappropriate formats, coupled with the sloppiness of the test-creation procedures, results in question after question in which, arguably, either no answer choice given is correct or more than one is. Often, the question is written so badly that it is arguably not answerable at all, given the actual question stem and text provided. I did an analysis of the sample released questions from a recent FSA ELA practice exam and demonstrated that such was the case for almost all the questions on the exam, so sloppily had it been prepared. But I can’t release that analysis for fear of being sued by the scam artists who peddle these tests to people who aren’t even allowed to see them. Hey, I’ve got some great land in Flor-uh-duh. Take my word for it. Available cheap (but not available for inspection).

The tests are diagnostically and instructionally useless.

Many kinds of assessment—diagnostic assessment, formative assessment, performance assessment, some classroom summative assessment—have instructional value. They can be used to inform instruction and/or are themselves instructive. The results of the high-stakes standardized tests are not broken down in any way that is of diagnostic or instructional use. Teachers and students cannot even see the tests to find out what students got wrong on them and why. The results always come too late to be of any use, anyway. So the tests are of no diagnostic or instructional value. None. None whatsoever.

The tests have enormous opportunity costs.

I estimate that, nationwide, schools are now spending a third of the school year on state standardized testing and everything attached to it. That time includes the actual time spent taking the tests, the time spent taking pretests and benchmark tests and other practice tests, the time spent working through test prep materials, the time spent doing exercises and activities in textbooks and online materials that have been modeled on the test questions in order to prepare kids to answer questions of those kinds, and the time spent on reporting, data analysis, data chats, proctoring, and other test housekeeping. That’s all lost instructional time.

The tests have enormous direct, incurred costs.

In a typical year, the US spends $1.7 billion under direct contracts for state standardized testing. The PARCC contract by itself was worth over a billion dollars to Pear$on in the first three years, and you have to add the cost of SBAC and the other state tests to that. No one, to my knowledge, has accurately estimated the cost of the computer upgrades that were (and continue to be) necessary for online testing of every child, but those costs vastly exceed the amount spent on the tests themselves. Then add the costs of test prep materials and of staff time spent on proctoring and data chats and so on. Then add the costs of new curricula that have been dumbed down to be test preppy. Billions and billions and billions. This is money that could be spent on stuff that matters—on making sure that poor kids have eye exams and warm clothes and food in their bellies, on making sure that libraries are open and that schools have nurses on duty to keep kids from dying. How many dead kids is all this testing worth, given that it is, again, invalid as assessment and of no diagnostic or instructional value?

The tests dramatically distort curricula and pedagogy.

The tests drive how and what people teach and much of what is created by curriculum developers. These distortions are grave. In U.S. curriculum development today, the tail is wagging the dog. To an enormous extent, we’ve basically replaced traditional English curricula with test prep. Where before, a student might open a literature textbook and study a coherent unit on The Elements of the Short Story or on The Transcendentalists, he or she now does random exercises, modeled on the standardized test questions, in which he or she “practices” random “skills” from the Gates/Coleman bullet list on random snippets of text. There’s enormous pressure on schools to do all test prep all the time because school and student and teacher and administrator evaluations depend upon the test results. Every courseware producer in the U.S. now begins every ELA or math project by making a spreadsheet with a list of the “standards” in the first column and the place where the “standard” will be “covered” in the other columns. And since the standards are a random list of vague skills, the courseware becomes random as well. The era of coherent, well-planned curricula is gone. I won’t go into detail about this, here, but this is an ENORMOUS problem. Many of the best courseware writers and editors I know have quit in disgust at this. The testing mania has brought about devolution and trivialization of our methods and materials.

The tests are abusive and demotivating.

Our prime directive as educators should be to nurture intrinsic motivation in order to create independent, life-long learners. The tests create climates of anxiety and fear. Both science and common sense teach that extrinsic punishment and reward systems like this testing system are highly DEMOTIVATING for cognitive tasks. See this:

https://search.yahoo.com/search?fr=mcafee&type=C111US662D20151202&p=daniel+pink+drive+rsa

The summative standardized testing system is a backward extrinsic punishment and reward approach to motivation. It reminds me of the line from the alphabet in the Puritan New England Primer, the first textbook published on these shores:

F
The idle Fool
Is whip’t in school

The tests have shown no positive results; they have not improved outcomes, and they have not reduced achievement gaps.

We have been doing this standards-and-standardized-testing stuff for more than two decades now. Richard Rothstein, the education researcher, has shown that turning our nation’s schools into test prep outfits has resulted in very minor increases in overall mathematics outcomes (increases of less than 2 percent on independent tests of mathematical ability) and NO IMPROVEMENT WHATSOEVER in ELA. Simply from the Hawthorne Effect, we should have seen some improvement. Rothstein also showed that even if you accept the results from international comparative tests as valid, once you correct for the socioeconomic level of the students taking those tests, US students are NOT behind those in other advanced, industrialized nations. So, the rationale for the testing madness was false from the start. The issue is not “failing schools” and “failing teachers” but POVERTY. We have a lot of poor kids in the US, and poor kids make up a far larger share of our test takers than they do elsewhere. Arguably, all the testing we’ve been doing has actually decreased outcomes, which is consistent with what we know about the demotivational effects, for cognitive tasks, of extrinsic punishment and reward systems. Years ago, I watched a seagull repeatedly striking at his own reflection in a plate glass window, until I finally drove him away to keep him from killing himself. Whatever that seagull did, the one in the reflection kept coming back for more. It’s the height of stupidity to look at a clearly failed approach and to say, “Gee, we should do a lot more of that.” But that’s just what the Gates-funded disrupters of U.S. education–those paid cheerleaders for the Common [sic] Core [sic] and testing and depersonalized education software based on the Core [sic] and the tests–are asking us to do. Enough.

In state after state in which the new generation of standardized tests has been given, we have seen enormous failure rates. In the first year, fewer than half the students at New Trier, Adlai Stevenson, and Hinsdale Central–among the best public high schools in Illinois–passed the new PARCC math tests. In New York, in the first year of its new Common Core-aligned exams, 70% of the students failed the ELA exams and 69% failed the math exams. In New Jersey, 55% of students in grades 3-8 failed the new state reading test, and 56% failed the new math test. The year after, Florida delayed and delayed releasing the scores for its new ELA and math exams. Then the state announced that it would release only T-scores and percentiles because it was still working on setting cut scores for proficiency. LOL. Criterion-referenced testing, as opposed to norm-referenced testing, is supposed to set absolute standards that students must meet in order to demonstrate proficiency. I suspect that what happened that year in Florida–the reason for the resounding silence from the state–is that the scores were so low that they couldn’t set cut scores at any reasonable level without having everyone fail.

Decades of federally mandated high-stakes testing haven’t improved outcomes and haven’t reduced achievement gaps. NAEP results improved a tiny, tiny bit in the first years of the testing because when you teach kids the formats of test questions, their scores will improve slightly. Then, after that, NAEP results went FLAT. No improvement whatsoever for a decade and a half. But the testing has had results: it has trivialized ELA curricula and pedagogy and wasted enormous resources that could have been used productively elsewhere.

The test makers are not held accountable.

All students taking these tests and all teachers administering them have to sign forms stating that they will not reveal anything about the test items, and the items are no longer released, later, for public scrutiny, and so there is no check whatsoever on the test makers. They can publish any sloppy crap with complete impunity. I would love to see the tests outlawed and a national truth and reconciliation effort put into place to hold the test makers accountable, financially, for the scam they have been perpetrating.

Anyone who supports or participates in this testing is committing child abuse. Have you proctored these tests and seen the kids squirming and crying and throwing up? Have you seen them FURIOUS afterward because of the trickiness of the tests? I have.

Standardized testing is a vampire. It sucks the lifeblood from our schools. Put a stake in it.

NB: I would love to be able to post, here, analyses of the sample released questions from the major companies’ ELA tests, but I can’t because I would be sued. However, it’s easy enough to show that most of the questions are so badly written that, AS WRITTEN, they don’t have a single correct answer, have more than one arguably correct answer, or are unanswerable.

It’s time to make the testing companies answerable for their rapacious duplicity and for stealing from an entire generation of kids the opportunity for a humane education in the English language arts.