While cleaning up my files, I discovered this excellent article by Alan Ehrenhalt, contributing editor to Governing magazine (and formerly executive editor for 19 years). It was written in 2013, but remains pertinent today.
Ehrenhalt sees through the fraud in the high-stakes testing obsession of our day, in which scores on standardized tests are used to label children, rate teachers, and close schools.
He begins by writing about the Tony Bennett grade-rigging scandal in Indiana, then moves on to Florida, where Jeb Bush launched measurement mania.
He writes:
The Tampa Bay Times newspaper lamented that “after grading schools for 15 years, Florida’s education leaders still cannot get it right.”
One might easily go further and argue that changing the results to make the picture look brighter, whether it involves outright cheating or not, is cause for embarrassment all by itself. If new test questions can have that much effect on a school’s overall performance grade, then why should anybody believe in the integrity of the system?
What’s especially humiliating is that Florida is the birthplace of the school testing movement, the state where former Gov. Jeb Bush decided in 1999 to begin awarding overall letter grades to individual schools to provide information for parents and help assess statewide educational performance. More than a dozen states have begun using a similar system since then, several of them just in the current year. Now they are being told that the Florida model they dutifully copied is too full of flaws to be trusted.
That matters a great deal because a lot more is riding on FCAT test scores than just local bragging rights. If a school receives repeated grades of D or F, it can be required by the state to take a variety of drastic measures, such as making the entire faculty reapply for their jobs, converting the school to a charter or closing it down altogether. So public confidence in the grading process is essential if the state is to have any credibility as a dispenser of draconian educational remedies.
States applying or adapting the Florida model have learned that changing the questions on the test, or switching to a new type of test altogether, can result in wildly fluctuating school grades. School officials in New Mexico this year were delighted to find out that the number of schools receiving A grades had more than doubled in comparison with the results from the year before. Was this the product of innovative new pedagogical techniques? Well, no. It was because the state had abandoned the federally designed No Child Left Behind test and switched to a new one designed by state education experts. Mississippi had a similar experience. Its school test scores went up dramatically because state officials took the expedient step of removing high school graduation rates from the list of test criteria for some schools.
The dramatically higher scores that resulted were a cause for initial state elation. But on further review, they raised another serious question. If the testing process is based on solid educational research, then the results from different tests ought to be reasonably congruent. If the results are dramatically disparate, there is a disturbing suggestion that the people writing the tests aren’t sure what it is they are supposed to be measuring.
Then he shifts his focus to Maine:
Maine is another state that has endured a season of controversy based on the introduction of its new school grading procedures. Gov. Paul LePage, a tireless advocate of school measurement, pushed through a new system this year based largely on the Florida model. Schools were evaluated on student test scores in reading and math; the percentage of students who had shown improvement in their scores during the past year, especially among the bottom 25 percent; graduation rates among upper-level students; and percentage of students who take the national SAT exam.
When the statewide results were tallied, Maine’s schools averaged a C grade—a reasonable enough sounding score. But when researchers in the state began looking at the results in greater detail, they found something that disturbed them. What the tests were really tracking was demographics. Schools in poorer communities around the state nearly all finished lower than their counterparts in affluent suburbs, regardless of academic methods. High schools that were graded A had an average of 9 percent of their students on free or reduced price lunch. Schools that got an F had 61 percent of their students receiving subsidized lunches. To a great extent, the test was simply a measure of poverty, not school quality.
He recognizes that testing has become a problem in itself:
It is hard not to conclude in the end that the school testing movement represents a popular fad in educational policy that is desperately lacking in either substantive methodology or common sense. Its fundamental assumption, underneath all the jargon, is that schools fail because they just aren’t trying hard enough, not because they are being asked to educate pupils who are culturally and socially unprepared to learn. Cooking the books on the tests won’t do anything to solve this problem. All it will do, when the extent of the mischief is revealed, is undermine public confidence in the entire enterprise of school testing.
We have gotten into the business of measuring school performance with precise testing numbers because it’s something we know how to measure. In doing so, we leave aside the subtler and more personal things that teachers and principals do all the time to make their schools function in an orderly way and disseminate as much learning as they possibly can. In the words of Roger Jones, a professor at Lynchburg College in Virginia, one of the states that enacted an A-F grading system this year: “We have gotten so caught up in testing that we have lost sight of a true education.”

And in the end, the kids, and later on we, pay for this madness.
Thank you, Diane, for this excellent article. So true!
The letter grading system based on Florida’s errant ways is an example of the blind following the blind. All the system does is confirm what teachers and researchers have been saying all along. Poverty matters!!!
Or, one can go to Utah’s model–fit all grades into a bell curve. In Utah, it doesn’t matter how the schools are doing, just how they’re doing against each other. In this system, no matter how high the scores, there will always be “failing” schools. Or, if the scores are all really low, like this last year, when the stupid new CC test started and only 30% of kids “passed,” there will still be “A” schools.
The state could just save the money and rank us by socioeconomic status, because the schools with a lot of poverty, no matter how hard they work, will ALWAYS be low, because they can’t break out of the idiotic bell curve.
Leave it to Utah–take a stupid idea and adjust it to be even stupider.
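For anyone who wants to see the bell-curve arithmetic spelled out, here is a minimal Python sketch. It is not Utah's actual formula: the function name `grade_on_curve`, the grade shares (10% A, 20% B, 40% C, 20% D, 10% F), and the school names and scores are all made up for illustration. It only shows what happens when grades depend on rank alone.

```python
from collections import Counter

# Hypothetical grade shares in percent; not taken from any state's real rules.
ASSUMED_SHARES = (("A", 10), ("B", 20), ("C", 40), ("D", 20), ("F", 10))


def grade_on_curve(scores, shares=ASSUMED_SHARES):
    """Assign letter grades by rank only, ignoring how high the scores are."""
    # Rank schools from highest score to lowest.
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    cumulative_percent = 0
    start = 0
    for letter, percent in shares:
        cumulative_percent += percent
        # Index where this grade band ends in the ranked list.
        end = cumulative_percent * n // 100
        for school in ranked[start:end]:
            grades[school] = letter
        start = end
    return grades


# Two invented sets of ten schools: one scoring 90-99, one scoring 20-29.
high_scores = {f"School {i}": 90 + i for i in range(10)}
low_scores = {f"School {i}": 20 + i for i in range(10)}

print(Counter(grade_on_curve(high_scores).values()))
print(Counter(grade_on_curve(low_scores).values()))
```

Under these assumed shares, both sets of ten schools end up with the same spread of one A, two B's, four C's, two D's and one F, even though every school in the first set outscores every school in the second.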
To borrow from test-maker lingo: the whole business of school grades and VAM and SGPs and the like is one gigantic—
DISTRACTOR/DECOY/MISLEAD [i.e., what the psychometricians call the ‘not best answer[s]’ on their tests of mandated homogeneity.]
So rather than talk & debate about and consider “inputs” [using the bidness language of the self-styled “education reform” movement] like ponying up the resources to provide clean healthy classrooms and all the adult staff necessary to meet the needs of diverse student populations and all the materials and objects needed for an enriched curriculum that includes (but is not limited to) the performing & visual arts and sports and literature and such—
We are told that the “output” of test scores is all that matters, and if we stray down the corridors of rheephorm we are assaulted with every weapon in the arsenal of mathematical intimidation and obfuscation. Audrey Amrein-Beardsley calls it measure-and-punish but [with all due respect] I think she is being a little too polite—I think of it as measure-to-punish.
To sum up the whole enterprise: it’s just a way to get people to turn off their minds, shut their mouths, and submit to being labeled, sorted and ranked—with few winners and many many losers.
That’s the way I see it…
😎
I agree, KrazyTA. I have been emphasizing, when I have someone who is listening to me, the “experience” of school, rather than just the output. I’m working on some projects to try and turn attention to that very thing. As a mother with a kindergartener, that matters tremendously to me.
Absolutely spot on. Thank you.
There are tests that tell us what students know about some subject and there are tests that tell us what students know as compared to other students.
High stakes tests, generally, are of the second variety. Here’s why they’re dangerous to use: first of all, if the subject matter being tested isn’t very important, then the results amount to saying that some students know more about something trivial than other students; second, if the results are distributed in a standard bell-curve, then regardless of what students know, only a certain percentage of students can ever earn any particular score.
So, high stakes tests may tell us only that some students know more about something stupid than other students know, and that no matter what students, teachers or parents do, only a certain percentage of students will ever earn any particular score. Of what value would a test like this be? Only to rank students, that’s the only reason.
This danger in high stakes testing is especially disturbing in a society that has declared itself impotent with respect to creating meaningful work; since there is going to be only a small number of jobs worth doing, we need a ranking of kids to determine which ones will get these jobs. It doesn’t really matter whether any of them knows anything of value. The only thing that matters is that we know who the top 10% are.
Most people believe tests tell us what kids know. Wrong; the tests we’re talking about tell us only what kids know compared to what other kids know, even if what they all know isn’t worth the paper the test is printed on. In a stupid population, you’ll still have someone who knows more than someone else.
As a country becomes stupider and stupider, one will still have kids who seem to be highly educated because some of them will know more worthless knowledge than other kids.
Ironically, in a society like this one, doing poorly on such a test could well mean that certain kids haven’t wasted their time on worthless endeavors, and thus might actually learn more of what really matters than the kids who do well on such tests!
“We have gotten into the business of measuring school performance with precise testing numbers because it’s something we know how to measure.” This reminds me of the utterly stupid rationale for the US bombing program in S. Vietnam in the mid-1960s. To cite an article using the work (I think) of John McDermott:
“This bombing program was created because it was the most rational form in this scenario. It uses a complex intelligence system that gathers all kinds of data of all kinds of reliability. This information is fed into a computer which then models the best and most probable targets to hit. To accomplish this procedure, incredible amounts of labor and time were exerted.
“This system seems as though it will not work that well. Much poor information is evaluated and therefore mishandled, causing many poor targets to be devastatingly bombed. Many small children and women were killed. Is this what the American people want.” [http://nutball.com/webworld/classes/gilligan/Alistair/Vietnam.htm]
In other words, technology seemed to offer a “fix” (mainly for avoiding US casualties), even if the results of the bombing were utterly counterproductive, militarily and socially. So with high stakes testing: plug into someone’s computer model garbage arrived at with “incredible amounts of labor and time” and, voila, you get a nonsensical measure of school and teacher performance that, in reality, mainly devastates the lives of kids–but also keeps those running the computers happily in business.
I have said this before. Scores on tests are the weapons of choice in dismantling public education. That is one reason why the charter, choice, and on-line industries are so eager to continue high stakes testing for public schools with impossible targets like 100% proficiency or else.
“When I look up, I see people cashing in. I don’t see heaven or saints or angels. I see people cashing in on every decent impulse and every human tragedy.”
Joseph Heller, Catch-22
This is a quote from the August 8, 2015 issue of The Times of NW Indiana. The headline is, “Scoring glitch delays ISTEP results”.
“The delay is due to a glitch associated with the new test required to be used this year, after the state school board ditched Common Core and adopted new Indiana-only academic standards, which state lawmakers required be “the highest standards in the United States.” … The test score delay likely also will prevent the state school board from assigning A-F school grades by the end of 2015, as required by Indiana law. The board typically issues school grades two to three months after ISTEP results are released.
Those grades are used as a component in teacher evaluations; determine in part teacher pay and bonuses; and are a key factor in whether low-performing schools get taken over by the state or shut down.”
……………..
I’d like to know how this garbage helps teachers or students.
Oops. I meant to type in Aug. 6, 2015 as the date. carolmalaysia
This would be very useful if the American people were persuadable with facts and logic. Sadly, we don’t live in such a country. We need another plan or different tactics.
This is a very good article. Anti-ed-reform articles I’ve read often take a scattershot approach to accountability methods, delving briefly into several of the many reasoned arguments against spending much time/$ on them. Readers, I think, tend to get lost punching holes in one point or another based on pro-accountability talking points.
This one hones in on one very common-sense point: “the results from different tests ought to be reasonably congruent. If the results are dramatically disparate, there is a disturbing suggestion that the people writing the tests aren’t sure what it is they are supposed to be measuring.”
The author further highlights the phenomenon of widely disparate ‘school-grades’ from one year to the next, consequent to a single change (new test questions; a different model of test; adding or deleting or changing the weight of one of the school-grade-formula factors such as growth percentiles or graduation rate). The point, which seems to escape the readers of test-promoting talking points, is this: did school curriculum, staff, or pedagogical methods demonstrably change from last year to this year? No. So why don’t growth percentiles, graduation rates, and standardized test results reflect each other? And why would they change significantly between last year and this year, if they reflect the way the school operates?
The obvious answer is that, like VAM, these are just myriad data-points which do not reflect a school’s quality of ed-delivery, but can be massaged to produce the desired result. As James Powell says above, “Americans” are not convinced by facts and logic. However, there are many thinking folks among Americans who may listen if we tie evidence-based results to taxpayer expenditures.
The same is happening in Delaware where our Secretary of Education, Mark Murphy, is one of Jeb Bush’s groupies. And our governor is definitely in somebody’s pocket.