As we saw in the previous post, Tom Sgouros explained in detail why it was wrong for Rhode Island to use the NECAP as a graduation requirement. It was not designed for that purpose, and many students will fail who should have passed.

State Commissioner of Education Deborah Gist said Sgouros was wrong because he is not a psychometrician. She did not explain why he was wrong, nor did she understand that psychometricians would likely agree with Sgouros. The cardinal rule of testing is that tests should be used only for the purpose for which they were designed.

Here is Sgouros’ account (if I hear from Gist, I will print hers):

Gist Offers Logical Fallacies On NECAP Value

By Tom Sgouros on March 20, 2013

I was on the radio ever so briefly this afternoon, on Buddy Cianci’s show with Deborah Gist. Unfortunately, the show’s producer hadn’t actually invited me so I had no idea until it had been underway for an hour. I gather they had a lively conversation that involved belittling the concerns about the NECAP test that I expressed here.

While I was on hold, I had to get on a bus in order not to leave my daughter waiting for me in the snow. Then Buddy said the bus was too loud but he’d invite me back on. So I was only on for about five minutes, long enough to hear Gist say I may be good at math, but I’m no psychometrician.

Guilty as charged, but somewhat beside the point.

I’ve heard the commissioner speak in public in a few different ways since I published my letter last week. She tweeted about it a couple of times last week and over the weekend. She was quoted in the paper this morning about how it was an “outrageous act of irresponsibility” for adults to take the NECAP 11th grade math test at the Providence Student Union event on Saturday. And today she spent a while on the WPRO airwaves insulting me.

But I have yet to hear any of the points I’ve made taken on directly.

Only what is called the argument from authority: I’m education commissioner and you’re not. Or in this case: I’m education commissioner, and you’re not a psychometrician.

As a style of public argument, this is highly effective, especially if salted with a pinch of condescension. It typically has the effect of shutting down debate right there because after all, who are you to question authority so?

The problem is that if you believe, as I do, that policy actually matters, this is a dangerous course to take.

After all, the real point of any policy discussion is not scoring debate points, but finding solutions to the problems that beset us. This is a highly imperfect world we live in, filled with awful problems, some of which we can only address collectively. If you don’t get the policy right, here’s what happens: the problems don’t get solved. Frequently, bad policy makes the problems worse, no matter how many debate points you scored, or how effectively you shut up your opponent.

So, do I care that Deborah Gist thinks I’m an inadequate excuse for a psychometrician? It turns out that, upon deep and lingering introspection, I can say with confidence that I do not. But I do care about the state of math education in Rhode Island, and I believe she has us on a course that will only damage the goal she claims to share with me.

Now I may be wrong about my NECAP concerns, but nothing I’ve learned in the past week has made me less confident in my assessment. On the one hand, I’ve seen vigorous denunciations of the PSU efforts, and mine, none of which have actually addressed the points I’ve raised. These are specific points, easily addressed. On the flip side, I’ve quietly heard from current and former RIDE employees that my concerns are theirs, but the policy is or was not in their hands.

Those points again: there are a few different ways to design a test. You can make a test to determine whether a student has mastered a body of knowledge; you can make a test to rank students against each other; you can make a test to rank students against each other referenced to a particular body of knowledge. I imagine there are lots of other ways to think about testing, but those are the ones in wide use. The first is a subject-matter test, like the French Baccalaureate or the New York State Regents exams. The second is a norm-referenced test like the SAT or GRE, where there are no absolute scores and all students are simply graded against each other on a fairly abstract standard. NECAP is in a third category, where it ranks students, but against a more concrete standard. The Massachusetts MCAS is pretty much the same deal, though it seems to range more widely over subject matter.

The problem comes when you imagine that these are pretty much interchangeable. After all, they all have questions, they all make students sweat, and they all require a number two pencil. How different could they be?

Answer: pretty different. If your goal is ranking students, you choose questions that separate one student from another. You design the test so that the resulting distribution of test scores is wide, which is another way to say that lots of students will flunk such a test. If your goal is assessing whether students have mastered a body of knowledge, the test designer won’t care nearly so much about the resulting distribution of scores, only that the knowledge tested be representative of the field. (The teacher will care about the distribution, of course, since it’s a measure of how well the subject has been taught.) The rest was explained in my post last week.

The real question is, if you don’t know what the NECAP is measuring, why exactly might you think that it’s a good thing to rely on it so heavily as a graduation requirement?

Deborah Gist is hardly the first person to call me wrong about something. That happens all the time, as it does for anybody who writes for the public about policy. But like so many others who claim I am wrong, she refuses to say — or cannot say — why.