The Washington Post has a story today reporting that the “median” charter school in D.C. outperforms the “median” public school.

This claim is based on a study that shows a difference in proficiency rates between the two sectors.

There are at least two ways to read this story:

One is that the leadership of the D.C. public school system is failing, and that the “reforms” of the Rhee-Henderson era are a bust: the only way the district can help students is to get rid of them, pushing them off into the charter sector, because its leaders have no clue how to “reform” public schools.

The other is that changes in proficiency rates, as Matthew Di Carlo has demonstrated time and again, are meaningless (see here too). See also this post, where he showed how Mayor Bloomberg described proficiency rates as both “meaningful” and “arbitrary.” They are indeed arbitrary. Read Andrew Ho’s important paper on just how arbitrary proficiency rates are.

Di Carlo writes:

… district officials and other national leaders use rate changes to “prove” that their preferred reforms are working (or are needed), while their critics argue the opposite. Similarly, entire charter school sectors are judged, up or down, by whether their raw, unadjusted rates increase or decrease.

So, what’s the problem? In short, it’s that year-to-year changes in proficiency rates are not valid evidence of school or policy effects. These measures cannot do the job we’re having them do, even on a limited basis. This really has to stop.

The literature is replete with warnings and detailed expositions of these measures’ limitations. Let’s just quickly recap the major points, with links to some relevant evidence and previous posts.

  • Proficiency rates may be a useful way to present information accessibly to parents and the public, but they can be highly misleading measures of student performance, as they only tell you how many test-takers are above a given (often somewhat arbitrary) cutpoint. The problems are especially salient when the rates are viewed over time – rates can increase while average scores decrease (and vice versa), and rate changes are heavily dependent on the choice of cutpoint and distribution of cohorts’ scores around it. They are really not appropriate for evaluating schools or policies, even using the best analytical approaches (for just two among dozens of examples of additional research on this topic, see this published 2008 paper and this one from 2003);
  • The data are (almost always) cross-sectional, and they mask changes in the sample of students taking the test, especially at the school- and district-level, where samples are smaller (note that this issue can apply to both rates and actual scores; for more, see this Mathematica report and this 2002 published article);
  • Most of the change in raw proficiency rates between years is transitory – i.e., it is not due to the quality of a school or the efficacy of a policy, but rather to random error, sampling variation (see the second bullet) or factors, such as students’ circumstances and characteristics, that are outside of schools’ control (see this paper analyzing Colorado data, this one on North Carolina and our quick analysis of California data).
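Di Carlo’s first point is easy to see with a toy example. The short Python sketch below uses made-up scores (purely illustrative, not drawn from any D.C. or DCPS data) to show that the same pair of cohorts can post a rising proficiency rate at one cutpoint, a falling rate at another, and a falling average score throughout.

```python
# Toy illustration with made-up numbers (not actual test data).
# It demonstrates two of the problems described above: the proficiency rate
# can rise while the average score falls, and whether the rate rises or falls
# at all depends on where the cutpoint happens to be set.

year1 = [35, 48, 52, 58, 61, 67, 72, 80, 88, 95]   # hypothetical scale scores
year2 = [20, 25, 45, 61, 62, 63, 64, 65, 66, 68]   # lower overall, but clustered just above 60

def proficiency_rate(scores, cutpoint):
    """Share of test-takers scoring at or above the cutpoint."""
    return sum(score >= cutpoint for score in scores) / len(scores)

for cut in (50, 60, 70):
    r1 = proficiency_rate(year1, cut)
    r2 = proficiency_rate(year2, cut)
    print(f"cutpoint {cut}: proficiency rate {r1:.0%} -> {r2:.0%}")

print(f"average score: {sum(year1) / len(year1):.1f} -> {sum(year2) / len(year2):.1f}")
```

With the cutpoint at 60, the “proficiency rate” climbs from 60% to 70% even though the average score drops by more than ten points; set the cutpoint at 50 or 70 and the very same scores show a decline. That is the sense in which a raw change in a proficiency rate, by itself, tells you almost nothing about a school or a policy.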