This article on “The Costs of Accountability” appeared in The American Interest. It was written by Jerry Z. Muller, a professor of history at Catholic University of America in Washington, D.C. It is a long and thoughtful article, and I can offer just a few snippets. I urge you to read it. It is a five-star article that explains how much money and energy is wasted in pursuit of the Golden Fleece of “accountability.” Accountability has become an industry unto itself.
The Google Ngram Viewer, which instantly searches through thousands of scanned books and other publications, provides a rough but telling portrait of changes in our culture. Set the parameters by years, type in a term or phrase, and up pops a graph showing the incidence of the words selected from 1800 to the present. Look up “gender”, for example, and you will see a line that curves upward around 1972; the slope becomes steeper around 1980, reaches its peak in 2000, and afterwards declines gently. Type in “accountability” and behold a line that begins to curve upward around 1965, with an increasingly steep upward slope after 1985. So too with “metrics”, whose steep increase starts around 1985. “Benchmarks” follows the same pattern, as does “performance indicators.” But unlike “gender”, the lines for “accountability”, “metrics”, “benchmarks”, and “performance indicators” are all still on the upswing.
Today, “accountability” and its kissing cousins “metrics” and “performance indicators” seem to be, if not on every lip, then on every piece of legislation, and certainly on every policy memo in the Western world. In business, government, non-profit organizations, and education, “accountability” has become a ubiquitous meme—a pattern that repeats itself endlessly, albeit with thousands of localized variations.
The characteristic feature of the culture of accountability is the aspiration to replace judgment with standardized measurement. Judgment is understood as personal, subjective, and self-interested; metrics are supposed to provide information that is hard and objective. The strategy behind the numbers is to improve institutional efficiency by offering rewards to those whose metrics are highest or whose benchmarks have been reached, and by punishing those who fall behind relative to them. Policies based on these assumptions have been on the march for decades, hugely enabled in recent years by dramatic technological advances, and as the ever-rising slope of the Ngram graphs indicates, their assumed truth goes marching on.
The attractions of accountability metrics are apparent. Yet like every culture, the culture of accountability has carved out its own unquestioned sacred space and, as with all arguments from presumed authority, possesses its characteristic blind spots. In this case, the virtues of accountability metrics have been oversold and their costs are underappreciated. It is high time to call accountability and metrics to account.
That might seem a quixotic, if not also a perverse, aspiration. What, after all, could be objectionable about accountability? Should not individuals, departments, and divisions be held to account? And how to do that without counting what they are doing in some standardized, numerical form? How can they be held to firm standards and expectations without providing specific achievement goals, that is, “benchmarks”? And how are people and institutions to be motivated unless rewards are tied to measurable performance? To those in thrall to the culture of accountability, to call its virtues into question is tantamount to championing secrecy, irresponsibility, and, worst of all, imprecision. It is to mark oneself as an enemy of democratic transparency.
To be sure, decision-making based on standardized measurement is often superior to judgment based on personal experience and expertise. Decisions based on big data are useful when the experience of any single practitioner is likely to be too limited to develop an intuitive feel for or reliable measure of efficacy. When a physician confronts the symptoms of a rare disorder, for example, she is better advised to rely on standardized criteria based on the aggregation of many cases. Data-based checklists, standardized procedures for how to proceed under routine or sometimes emergency conditions, have proven valuable in fields as varied as airline operation, rescue squad work, urban policing, and nuclear power plant safety, among a great many others.
Clearly, the attempt to measure performance, however difficult it can be, is intrinsically desirable if what is actually measured is a reasonable proxy for what is intended to be measured. But that is not always the case, and between the two is where the blind spots form.
Measurement schemes are deceptively attractive because they often “prove” themselves through low-hanging fruit. They may indeed identify and help to remedy specific problems: It’s good to know which hospitals have the highest rates of infections, which airlines have the best on-time arrival records, and so on, because it can energize and improve performance. But, in many cases, the extension of standardized measurement may suffer diminished utility and even become counterproductive if sensible pragmatism gives way to metric madness. Measurement can readily become counterproductive when it tries to measure the unmeasurable and quantify the unquantifiable, whether to determine rewards or for other purposes. This tends to be the case as the scale of what is being measured grows while the activity itself becomes functionally differentiated, and when those tasked with doing the measuring are detached organizationally from the activity being measured.
He writes specifically about education:
No Child, Doctor, or Cop Left Behind
In the public sector, the show horse of accountability became “No Child Left Behind” (NCLB), an educational act signed into law with bipartisan support by George W. Bush in 2001 whose formal title was, “An act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind.”
The NCLB legislation grew out of more than a decade of heavy lobbying by business groups concerned about the quality of the workforce, civil rights groups worried about differential group achievement, and educational reformers who demanded national standards, tests, and assessment. The benefit of such measures was oversold, in terms little short of utopian.
Thus William Kolberg of the National Alliance of Business asserted that, “the establishment of a system of national standards, coupled with assessment, would ensure that every student leaves compulsory school with a demonstrated ability to read, write, compute and perform at world-class levels in general school subjects.” The first fruit of this effort, on the Federal level, was the “Improving America’s Schools Act” adopted under President Clinton in 1994. Meanwhile, in Texas, Governor George W. Bush became a champion of mandated testing and educational accountability, a stance that presaged his support for NCLB.
Under NCLB, states were to test every student in grades 3–8 each year in math, reading, and science. The act was meant to bring all students to “academic proficiency” by 2014, and to ensure that each group of students (including blacks and Hispanics) within each school made “adequate yearly progress” toward proficiency each year. It imposed an escalating series of penalties and sanctions for schools in which the designated groups of students did not make adequate progress. Despite opposition from conservative Republicans antipathetic to the spread of Federal power over education, and from some liberal Democrats, the act was co-sponsored by Senator Edward Kennedy and passed both houses of Congress with majority Republican and Democratic support. Advocates of the reforms maintained that the act would create incentives for improved outcomes by aligning the behavior of teachers, students, and schools with “the performance goals of the system.”
Yet more than a decade after its implementation, the benefits of the accountability provisions of NCLB remain elusive. Its advocates grasp at any evidence of improvement on any test at any grade in any demographic group as proof of NCLB’s efficacy. But test scores for primary school students have gone up only slightly, and no more quickly than before the legislation was enacted. Its impact on the test scores of high school students has been more limited still.
The unintended consequences of NCLB’s testing-and-accountability regime are more tangible, however, and exemplify many of the characteristic pitfalls of the culture of accountability. Under NCLB, scores on standardized tests are the numerical metric by which success and failure are judged. And the stakes are high for teachers and principals, whose salaries and very jobs depend on this performance indicator. It is no wonder, then, that teachers (encouraged by their principals) divert class time toward the subjects tested—mathematics and English—and away from history, social studies, art, and music. Instruction in math and English is narrowly focused on the skills required by the test rather than broader cognitive processes: Students learn test-taking strategies rather than substantive knowledge. Much class time is devoted to practicing for tests, hardly a source of stimulation for pupils.
Even worse than the perverse incentives involved in “teaching to the test” is the technique of improving average achievement levels by reclassifying weaker students as disabled, thus removing them from the assessment pool. Then there is out-and-out cheating, as teachers alter student answers or toss out tests by students likely to be low scorers, phenomena well documented in Atlanta, Chicago, Cleveland, Houston, Dallas, and other cities. Mayors and governors have diminished the difficulty of tests, or lowered the grades required to pass the test, in order to raise the pass rate and thus demonstrate the success of their educational reforms—and get more Federal money by so doing.
Another effect of NCLB is the demoralization of teachers. Many teachers perceive the regimen created by the culture of accountability as robbing them of their autonomy, and of the ability to use their discretion and creativity in designing and implementing the curriculum. The result has been a wave of early retirements by experienced teachers, and the movement of the more creative ones away from public and toward private schools, which are not bound by NCLB.
Despite the pitfalls of NCLB, the Obama Administration doubled down on accountability and metrics in K-12 education. In 2009, it introduced “Race to the Top”, which used funds from the American Recovery and Reinvestment Act to induce states “to adopt college- and career-ready standards and assessments; build data systems that measure student growth and success; and link student achievement to teachers and administrators.” This shows what happens these days when accountability metrics do not yield the result desired: Measure more, but differently, until you get the result you want.
Metric madness is not limited to education. Some of the problems evident in NCLB pop up in fields from medicine to policing.