Reader and arts consultant Laura Chapman cites an article in today’s Wall Street Journal that reminds us that test scores are not objective.

Panels of experts and non-experts make a judgment about what is “proficient” and what the “cut scores” are for the other labels. It is a judgment. The person in charge can adjust the cut score to make the tests harder or easier. If he wants to show that kids are really dumb, he will choose a very high cut score. If he wants to show that kids are improving under his amazing leadership, he will drop the cut score, and more kids will pass. The public is easily hoodwinked. The scores are bunkum.
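The arithmetic behind this is simple. Here is a minimal sketch, with invented scores and cut points, of how the very same test results can be labeled a success or a failure depending on where the cut score is placed:

```python
# Hypothetical raw scores for ten students (out of 50) -- invented for illustration.
scores = [12, 18, 22, 25, 27, 30, 33, 35, 38, 41]

def proficiency_rate(scores, cut):
    """Percentage of students scoring at or above the cut score."""
    passing = sum(1 for s in scores if s >= cut)
    return 100 * passing / len(scores)

# The same students, two different stories:
print(proficiency_rate(scores, 20))  # lenient cut -> 80.0% "proficient"
print(proficiency_rate(scores, 35))  # strict cut  -> 30.0% "proficient"
```

Nothing about the students changed between the two lines; only the label-setting judgment did.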

Chapman comments:

Today, the Wall Street Journal reports on the results of NY state tests (page 2, weekend).
http://online.wsj.com/articles/test-scores-are-no-sure-guide-to-what-students-know-1405122823?tesla=y

The headline is amazing: “Test Scores Are No Sure Guide to What Students Know: Results Say More About the Way Test Makers Decide to Measure Children’s Knowledge”

The graphics show performance trends in math and English, grades 3-8, before and after the new CCSS tests in math and English. Big drop in “proficiency.”

Then the author of the article, Jo Craven McGinty, tries to explain how the new cut scores for “proficiency” are determined.

A panel of 95 teachers, divided into math and English groups (47.5 in each group?), was given “the test the students took in order of difficulty from easiest to most difficult.” Each teacher was given the task of dropping a bookmark on the test to indicate a level of performance with enough correct answers to qualify for a Level 1 “proficiency,” Level 2, and so on. This process was repeated four times to arrive at final cut scores, meaning something like a consensus on “the threshold for each performance level.” Of course, with four iterations of the process, teachers may develop a bit of fatigue, and like a hung jury may produce an evenly split vote on the cut…No details here, but “cut” is a good name for the score.

The article is intended to convey the gist of the process, not the technicalities. For example, the teachers are not asked to determine the “order of difficulty” of the test items; that has been pre-determined, likely through statistical methods of item analysis. Teachers are setting cut scores: judgments about “good enough” or “not good enough” for a given label. The labels and cut scores function much like the old-fashioned A-F rating system. As one expert said, the idea is to “send a message to kids about what is good enough.”
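To make the bookmark procedure concrete: each panelist marks the last item (with items ordered easiest to hardest) that a borderline-“proficient” student should answer correctly, and the panel's cut is typically derived from a summary of those placements, such as the median. The panel size, bookmark positions, and single-round summary below are all invented for illustration; real panels iterate over several rounds, as the article describes.

```python
from statistics import median

# Hypothetical bookmark placements (item positions) from seven panelists.
bookmarks = [28, 30, 31, 31, 33, 34, 36]

# One common way to summarize a round: take the median placement as the cut.
cut_score = median(bookmarks)
print(cut_score)  # 31
```

The point of the sketch is that the resulting number is a summary of human judgments, not a property of the students being tested.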

I think this is not the primary purpose of the new testing regime. The real purpose is to reinvent the tests and the scoring scale (cut scores) so that fewer students appear to be doing well in school, and to condemn prior tests as too easy.

According to more than one expert in psychometrics, the term “proficient” is not much more than a human judgment about labeling a performance on a test.

That process is not objective, and it becomes even more complex when test items go beyond requests for fill-in-the-bubble answers. One expert is quoted in the article: “People believe they know what these labels mean. It has nothing to do with how well kids are doing. It is a way of making a judgment about how performance is going to be labeled.”

And this is the protocol that makes test scores “objective measures.” Give me a break.