Laura H. Chapman: Test Scores Are Not Objective

Reader and arts consultant Laura Chapman cites an article in today’s Wall Street Journal that reminds us that test scores are not objective.

Panels of experts and non-experts make a judgment about what is “proficient,” what is the “cut score” for other labels. It is a judgment. The person in charge can adjust the cut score to make the tests harder or easier. If he wants to show that kids are really dumb, he will choose a very high cut score. If he wants to show that kids are improving under his amazing leadership, he will drop the cut score, and more kids will pass. The public is easily hoodwinked. The scores are Bunkum.
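To make the arithmetic concrete, here is a minimal sketch of how the same set of scores yields very different "proficiency" rates depending on where the cut score is placed. The scores, the cut values, and the function name `percent_proficient` are all invented for illustration; they are not taken from the New York tests or from the Wall Street Journal article.

```python
def percent_proficient(scores, cut_score):
    """Return the percentage of students scoring at or above the cut score."""
    passing = sum(1 for s in scores if s >= cut_score)
    return 100.0 * passing / len(scores)


# Hypothetical raw scores for ten students (made up for this example).
scores = [12, 18, 22, 25, 27, 30, 33, 36, 41, 45]

# The students and their answers never change; only the cut score moves.
for cut in (20, 30, 40):
    rate = percent_proficient(scores, cut)
    print(f"cut score {cut}: {rate:.0f}% proficient")
```

Nothing about the students' work differs between the three lines of output. The only thing that moves is the threshold chosen by whoever sets it.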

Chapman comments:

Today, the Wall Street Journal reports on the results of NY state tests (page 2, weekend).
http://online.wsj.com/articles/test-scores-are-no-sure-guide-to-what-students-know-1405122823?tesla=y

The headline is amazing: “Test Scores Are No Sure Guide to What Students Know: Results Say More About the Way Test Makers Decide to Measure Children’s Knowledge”

The graphics show performance trends in math and English, grades 3-8, before and after the new CCSS tests in math and English. Big drop in “proficiency.”

Then the author of the article, Jo Craven McGinty, tries to explain how the new cut scores for “proficiency” are determined.

A panel of 95 teachers divided into math and English groups (47.5 in each group?) were given “the test the students took in order of difficulty from easiest to most difficult.” Each teacher was given the task of dropping a bookmark on the test to indicate a level of performance with enough correct answers to qualify for a Level 1 “proficiency” or Level 2 (and so on). This process was repeated four times to arrive at final cut scores, meaning something like a consensus on “the threshold for each performance level.” Of course, with four iterations of the process, teachers may develop a bit of fatigue, and like a hung jury may produce a 2/2 vote on the cut… No details here, but “cut” is a good name for the score.

The article is intended to convey the gist of the process, not the technicalities. For example, the teachers are not asked to determine the “order of difficulty” of the tests. That has been pre-determined, likely through statistical methods for item analysis. Teachers are setting cut scores for judgments about “good enough” or “not good enough” for a given label. The labels and cut scores function much like the old-fashioned A-F rating system. As one expert said, the idea is to “send a message to kids about what is good enough.”

I think this is not the primary purpose of the new testing regime. The real purpose is to reinvent the tests and scoring scale (cut scores) so fewer students appear to be doing well in school and to condemn prior tests as too easy.

According to more than one expert in psychometrics, the term “proficient” is not much more than a human judgment about labeling a performance on a test.

That process is not objective, and it becomes even more complex when test items go beyond requests for fill-in-the-bubble answers. One expert is quoted in the article: “People believe they know what these labels mean. It has nothing to do with how well kids are doing. It is a way of making a judgment about how performance is going to be labeled.”

And this is the protocol that makes test scores “objective measures.” Give me a break.

john a says:

July 12, 2014 at 5:54 pm

Grading on the curve? If so, this practice is time-tested. Only, in this case, it is a bit more devious and harmful.

First Grade Teacher says:

July 12, 2014 at 6:39 pm

Try this link:
http://online.wsj.com/articles/test-scores-are-no-sure-guide-to-what-students-know-1405122823

annieeducator says:

July 12, 2014 at 6:46 pm

In Florida there are a lot of questions about the scoring of the writing test. Scores are very inconsistent. Schools with very high ELL populations and 40% passing rates had 90% passing in writing, while schools with 95% passing in reading dropped 30 points in writing. Since the percentage of students “proficient” in writing counts as 1/8 of the school’s grade even though only 4th graders take it, these scoring issues have impacts on school grades. The DOE says there is no problem with the scores, of course. So we all have to live with the school grade we get and the repercussions even though the scores are inaccurate. It is an election year, you know.

Chris says:

July 12, 2014 at 8:32 pm

Dear President Obama and Mrs. Obama,

Please stop the junk science of high stakes testing. Think about your legacy. You campaigned on returning valid science to the White House. Test scores measure many things, not just teacher proficiency.

Duane Swacker says:

July 13, 2014 at 8:48 am

If I may correct your statement: “Test scores measure NOTHING, not EVEN teacher proficiency.”

These tests are not “measuring” devices in any truly logical usage of the term “measuring.” They are some sort of assessment device that assesses (in a piss-poor fashion that is error-ridden and hence INVALID, and the usage of the results in any fashion UNETHICAL) something, and that something is never well defined and is quite nebulous, on some supposedly agreed-upon scale in which the margin of error of measurement in and of itself, not to mention many other epistemological and ontological blunders, renders the whole process INVALID.

larry says:

July 12, 2014 at 8:59 pm

“It has nothing to do with how well kids are doing. It is a way of making a judgment about how performance is going to be labeled”

…said the psychometrician to the judge hearing a teacher’s wrongful dismissal suit.

I’m not a lawyer, but I’d bet that if (when?) they start firing teachers based on such an arbitrary system, school districts are going to be losing a lot of very expensive lawsuits.

Any good lawyer would tear this stuff to shreds.

Of course, we the public are the ones who will foot the bill.

Certainly not the ones responsible.

a teacher says:

July 12, 2014 at 10:02 pm

I was part of a group that set the benchmarks for the essay part of the state test. We determined what was a 3, 4, or 5, and the company who scored the essays would use our benchmarks. This was a few years ago, but it is just as relevant. Finding essays for 1’s and 2’s was easy. But the hardest part was determining what constituted a passing essay. We realized that we were determining the cutoff for the whole state. If we set the bar too high, what would happen to the scores for the schools? I know it was an important task, and I am grateful for the opportunity to have done this. But from that day on, I realized that my students’ passing scores were simply agreed upon by… well… some people who made a decision somehow.

Duane Swacker says:

July 13, 2014 at 8:51 am

“I know it was an important task. . .”

In what alternate universe can setting cut scores on a completely INVALID and UNETHICAL standardized test process be seen as being “important”? (Other than being paid to do it.)

SPEDUCATOR says:

July 12, 2014 at 10:37 pm

If this doesn’t seal the deal of ending RTTT & NCLB I don’t know what would.

rick2468 says:

July 12, 2014 at 10:38 pm

Reblogged this on McAuliffe School and commented:
I’m re-posting Diane’s summary of Laura Chapman’s blog… Is that re-re-posting? The Wall Street Journal Article is an interesting one well worth reading.

Betsy Marshall says:

July 13, 2014 at 12:05 am

The only thing that will seal the deal for ending RTTT & NCLB is a serious threat to politicians that they will lose their jobs if they continue to go along with the BIG lie.

Chris says:

July 13, 2014 at 10:32 am

Pity we don’t have a law that states that politicians that send their children to private schools such as Sidwell Friends (Obama) and Chicago Lab (Rahm Emanuel) need to stop harming public school students with the junk science of NCLB and RTTT.

Bob Shepherd says:

July 13, 2014 at 5:46 am

Much of what passes for “data-driven decision making” is, in fact, purest numerology. It’s the equivalent, in education, of astrology or phrenology.

Duane Swacker says:

July 13, 2014 at 8:51 am

or psychometrics.

Harold says:

July 13, 2014 at 7:35 am

Pseudo-science.

chemtchr says:

July 13, 2014 at 7:56 am

Watch the cut scores go up and down. This is really just an excuse to post cute kittens, and keep Diane’s cheer level up.

FLERP! says:

July 13, 2014 at 9:01 am

“According to more than one expert in psychometrics, the term “proficient” is not much more than a human judgment about labeling a performance on a test.”

What were we hoping for, a divine judgment about labeling performance? A robot’s judgment?

Ang says:

July 14, 2014 at 1:42 am

“not much more than a human judgment about labeling a performance on a test.”

Perhaps the key point is ON A TEST (as in… on one test, on one day, in one moment).

Seems to carry a lot more weight than it did when I was a student.

Bill Bradley says:

July 13, 2014 at 10:07 am

Another issue is that the teachers are not asked about the questions’ validity, or the appropriateness of the answers! Some of the questions are “harder” because they have no correct answer, or more than one reasonable answer. Some are also not level or course appropriate. (One of the state tests that I was “administering” for Algebra 1 contained questions that were Algebra 2 and Geometry standards, meaning that they should not have been taught in Algebra 1. When I reported this, I was told that we were “Not supposed to look at the tests!”)

July 13, 2014 at 10:25 am

We know where the bad teachers are. They’re in the area around NY and Detroit and east, west, south, and north somewhat.

..and make no mistake, we will fire them based on what we know and don’t know to be known unknowns and unknown unknowns.

Next question?

KrazyTA says:

July 13, 2014 at 10:48 pm

Regarding standardized test scores and their uses and abuses by those in mad dog pursuit of $tudent $ucce$$ [aka charterites/privatizers], you fiddle with pretested and predetermined outcomes until you achieve “alarming failure”:

[start quote]

(Notice: alarming failure, not universal failure. As education policy makers across the country have learned, there are political costs to having too many students flunk the tests, particularly if an unseemly number of them are white and relatively affluent. At that point, politically potent parents—and, eventually, even education reporters—may begin to ask inconvenient questions about the test itself. Fortunately, by tinkering with the construction of items on the exam and adjusting the cut score, it is possible to ensure virtually any outcome long before the tests are scored, or even administered. For the officials in charge, the enterprise of standardized testing is reminiscent of shooting an arrow into a wall and then drawing the target around it.)

[end quote]

(Alfie Kohn, “NCLB and the Effort to Privatize Public Education,” p. 82, in MANY CHILDREN LEFT BEHIND: HOW THE NO CHILD LEFT BEHIND ACT IS DAMAGING OUR CHILDREN AND OUR SCHOOLS, 2004. Other contributors include Deborah Meier, Theodore R. Sizer and Monty Neill.)

Señor Swacker, does “shooting an arrow into a wall and then drawing the target around it” qualify for a TAGO?

I leave it to your discretion.

😎

P.S. Remember, this was written ten years ago!
