Alfie Kohn on our obsession with metrics, in the current Education Week:
Schooling Beyond Measure
The reason that standardized-test results tend to be so uninformative and misleading is closely related to the reason that these tests are so popular in the first place. That, in turn, is connected to our attraction to—and the trouble with—grades, rubrics, and various practices commended to us as “data based.”
The common denominator? Our culture’s worshipful regard for numbers. Roger Jones, a physicist, called it “the heart of our modern idolatry … the belief that the quantitative description of things is paramount and even complete in itself.”
Quantification can be entertaining, of course. Readers love Top 10 lists, and our favorite parts of the news are those with numerical components: sports, business, and weather. There’s something comforting about the simplicity of specificity. As the educator Selma Wassermann observed, “Numbers help to relieve the frustrations of the unknown.” If those numbers are getting larger over time, we figure we must be making progress. Anything that resists being reduced to numerical terms, by contrast, seems vaguely suspicious, or at least suspiciously vague.
In calling this sensibility into question, I’m not denying that there’s a place for quantification. Rather, I’m pointing out that it doesn’t always seem to know its place. If the question is “How tall is he?,” “6 foot 2” is a more useful answer than “pretty damn tall.” But what if the question were “Is that a good city to live in?” or “How does she feel about her sister?” or “Would you rather have your child in this teacher’s classroom or that one’s?”
The habit of looking for numerical answers to just about any question can probably be traced back to overlapping academic traditions like behaviorism and scientism (the belief that all true knowledge is scientific), as well as the arrogance of economists or statisticians who think their methods can be applied to everything in life. The resulting overreliance on numbers is, ironically, based more on faith than on reason. And the results can be disturbing.
In education, the question “How do we assess kids/teachers/schools?” has morphed over the years into “How do we measure … ?” We’ve forgotten that assessment doesn’t require measurement, and, moreover, that the most valuable forms of assessment are often qualitative (say, a narrative account of a child’s progress by an observant teacher who knows the child well), rather than quantitative (a standardized-test score). Yet the former may well be brushed aside in favor of the latter by people who don’t even bother to ask what was on the test. It’s a number, so we sit up and pay attention. Over time, the more data we accumulate, the less we really know.
You’ve heard it said that tests and other measures are, like technology, merely neutral tools, and all that matters is what we do with the information. Baloney. The measure affects that which is measured. Indeed, the fact that we chose to measure in the first place carries causal weight. His speechwriters had President George W. Bush proclaim, “Measurement is the cornerstone of learning.” What they should have written was “Measurement is the cornerstone of the kind of learning that lends itself to being measured.”
One example: It’s easier to score a student writer’s proficiency with sentence structure than her proficiency at evoking excitement in a reader. Thus, the introduction of a scoring device like a rubric will likely lead to more emphasis on teaching mechanics. Either that, or the notion of “evocative” writing will be flattened into something that can be expressed as a numerical rating. Objectivity has a way of objectifying. Pretty soon the question of what our whole education system ought to be doing gives way to the question of which educational goals are easiest to measure.
I’ll say it again: Quantification does have a role to play. We need to be able to count how many kids are in each class if we want to know the effects of class size. But the effects of class size on what? Will we look only at test scores, ignoring outcomes such as students’ enthusiasm about learning or their experience of the classroom as a caring community?
Too much is lost to us—or warped—as a result of our love affair with numbers. And there are other casualties as well:
• We miss the forest while counting the trees. Rigorous ratings of how well something is being done tend to distract us from asking whether that activity is sensible or ethical. Dubious cultural values and belief systems are often camouflaged by numerical precision, sometimes out to several decimal places. Stephen Jay Gould, in his book The Mismeasure of Man, provided ample evidence that meretricious findings are often produced by impressively meticulous quantifiers.
• We become obsessed with winning. An infatuation with numbers not only emerges from but also exacerbates our cultural addiction to competition. It’s easier to know how many others we’ve beaten, and by how much, if achievements have been quantified. But once they’re quantified, it’s tempting for us to spend our time comparing and ranking, trying to triumph over one another rather than cooperating.
• We deny our subjectivity. Sometimes the exclusion of what’s hard to quantify is rationalized on the grounds that it’s “merely subjective.” But subjectivity isn’t purged by relying on numbers; it’s just driven underground, yielding the appearance of objectivity. An “86” at the top of a paper is steeped in the teacher’s subjective criteria just as much as his comments about that paper. Even a score on a math quiz isn’t “objective”: It reflects the teacher’s choices about how many and what type of questions to include, how difficult they should be, how much each answer will count, and so on. Ditto for standardized tests, except the people making those choices are distant and invisible.
Subjectivity isn’t a bad thing; it’s about judgment, which is a marvelous human capacity that, in the plural, supplies the lifeblood of a democratic society. What’s bad is the use of numbers to pretend that we’ve eliminated it.
Skepticism about—and denial of—judgment in general is compounded these days by an institutionalized distrust of teachers’ judgments. Hence the tidal wave of standardized testing in the name of “accountability.” Part of the point is to bypass the teachers and indeed to evaluate them, too. The exalted status of numerical data also helps explain why teachers are increasingly being trained rather than educated.
To be overly enamored of numbers is to be vulnerable to their misuse, a timely example being the pseudoscience of “value-added modeling” of test data, debunked by experts but continuing to sucker the credulous. The trouble, however, isn’t limited to lying with statistics. None of these problems with quantification disappears when no dishonesty or incompetence is involved. Likewise, better measurements or more thoughtful criteria for rating aren’t sufficient.
At the surface, yes, we’re obliged to do something about bad tests and poorly designed rubrics and meaningless data. But what lies underneath is an irrational attachment to tests, rubrics, and data, per se, or, more precisely, our penchant for reducing to numbers what is distorted by that very act.
Alfie Kohn is the author of 12 books, including The Case Against Standardized Testing (Heinemann, 2000) and The Homework Myth (Da Capo, 2006). He lives (actually) in the Boston area and (virtually) at http://www.alfiekohn.org.

For those who believe in simplistic absolutes that sell product lines, Alfie Kohn is infuriating beyond measure. For those of us who want our assumptions challenged and who look at things from different (and sometimes unexpected) angles so that we can come up with better solutions to difficult problems, he is priceless.
Consider the difference between specious data-driven educational fantasy and sound, time-tested, data-informed educational practice. I have mentioned in an earlier post the redoubtable Ms. Barbara Pene, an elementary school teacher I worked with/for as a bilingual aide. The students at the school spoke over three dozen home languages, and many were recent arrivals in the USA, coming from cultural backgrounds in which such “simple” things as how children made eye contact with adults, or addressed them, or how boys treated adult women, were different. Getting school staff, students, and parents on the same wavelength was a 365-days-a-year effort. I soon discovered that Ms. Pene would get “difficult” students at all times during the school year. Why?
Turns out that even among the very experienced and tough staff (mostly women), she was the “toughest of the tough.” Reflecting on it now, some thirty years later, I realize how firmly, yet respectfully and never demeaningly, she managed her classroom. She volunteered — VOLUNTEERED — to take on the students who were causing havoc in other classrooms [it was not always easy to deal with the kids in her classroom, but she set a great example]. And she not only taught those kids subject matter but also how to show some respect for themselves and others. Yet in today’s testing climate, I am sure her test scores would have looked like an edudeformer’s best anecdote on news programs and talk shows, bouncing around in maddeningly unpredictable ways.
She would never have passed the data-[mis]driven climate favored by the proponents of privatization, standardized testing, and teacher bashing. Yet she contributed, every day, to the health [mental and emotional] of that school and to its educational progress. How do I know this? I worked with many of the other teachers in the school who explained to me the contributions she made, every day, to what they were trying to do.
How do you quantify the quality of a person like that? How do you truly explain to the bean counters that children, school staff, and parents aren’t beans that can be counted?
Maybe, just maybe, it takes a bean to be a good bean counter. And what could a bean know about being a human being?
Inquiring minds want to know!
🙂
When I took a position as a middle school instrumental music teacher, I was encouraged by administrative staff and colleagues alike to continue the practice of taking the students to competition every year.
Having been involved in judged auditions both as a participating student and a music educator throughout my lifetime, I was quite aware of the processes that go into the judging. In a typical public school performance competition/audition, the panel is given a set of criteria and indicators that are translated into numbers. Each judge makes a score contribution that is averaged into the entire panel score. Placement is then made based upon the scores. The judges create their own “scaling” based on the performances of the day. This might mean that even the most proficient performances might be judged with a lower score depending on where in the schedule they take place since judges always leave room for the “top” performance they may hear later on in the competition.
When a judge marks a numerical score, there is a reason behind it, and in some competitions, judges create a running narrative that is recorded at the time of the live performance so that the ensemble can hear the comments at a later date. This turns the competition into a diagnostic tool, not an absolute judgment.
Even taking the scores of the same ensemble from year to year, one cannot wholly judge an entire program, since far too many variables rob the scores of meaning as a simple program-quality assessment tool. The performance-order issue is just one of many problems with assessments based on competition scores. This type of judging is an imperfect system in that each judging situation is unique unto itself, given the variability of environment, scheduling, participant quality, and judges’ opinions.
Scores, even those with comments and narratives, are always to be taken with a grain of salt. The variability is far too vast to deem them “absolutes” in measurement. What is the quality level of a musical performance? Well, that depends. The same can be said of using student test scores, over time, to judge teachers whose student populations change annually. The best use of assessments is to indicate growth, not absolutes.