Les Perelman, who was in charge of MIT’s Writing Across the Curriculum program, wrote this opinion piece for the Boston Globe.
Perelman said that student essays written for the PARCC test, created by Pearson, would be scored by computers. Unfortunately, the computer scorers cannot detect the meaning of language; instead, they rely on length, grammar, and other measurable surface elements.
So, he says, the computer would give a high score to this gibberish:
“According to professor of theory of knowledge Leon Trotsky, privacy is the most fundamental report of humankind. Radiation on advocates to an orator transmits gamma rays of parsimony to implode.”
A human scorer would recognize this as incoherent babble, but the computer would be impressed.
He concludes:
“Education, like medicine, is too important a public resource to allow corporate secrecy. If PARCC does not insist that Pearson allow researchers access to its robo-grader and release all raw numerical data on the scoring, then Massachusetts should withdraw from the consortium. No pharmaceutical company is allowed to conduct medical tests in secret or deny legitimate investigators access. The FDA and independent investigators are always involved. Indeed, even toasters have more oversight than high stakes educational tests.
“Our children deserve better than having their writing evaluated by machines whose workings are both flawed and hidden from public scrutiny. Whatever benefit current computer technology can provide emerging writers is already embodied in imperfect but useful word processors. Conversations with colleagues at MIT who know much more than I do about artificial intelligence have led me to Perelman’s Conjecture: People’s belief in the current adequacy of Automated Essay Scoring is proportional to the square of their intellectual distance from people who actually know what they are talking about.”
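To see how shallow the kind of scoring Perelman describes is, here is a minimal sketch of a surface-feature grader. This is emphatically not Pearson’s actual algorithm, which is secret; the features and weights below are invented purely for illustration:

```python
import re

def surface_feature_score(essay: str) -> float:
    """Score an essay on measurable surface features only, as a naive
    robo-grader might: length, word size, and sentence length.
    Nothing here ever checks whether the essay means anything."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)   # proxy for "vocabulary"
    avg_sent_len = len(words) / len(sentences)               # proxy for "syntactic maturity"
    # Invented weights: reward length and big words, capped at 100.
    return min(100.0, 0.5 * len(words) + 8 * avg_word_len + 1.5 * avg_sent_len)

gibberish = ("According to professor of theory of knowledge Leon Trotsky, "
             "privacy is the most fundamental report of humankind. Radiation on "
             "advocates to an orator transmits gamma rays of parsimony to implode.")
print(surface_feature_score(gibberish))  # about 80: a strong score for nonsense
```

Because the function never consults meaning, Perelman’s gibberish earns a comfortable score.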
Dumb People and Dumber Machines
Should not be in charge of education
Or we are Doomed and Doomeder …
The Andoomeda Strain??
Yes, and Ediocrazy, too …
The Machiavellian Common Core regime grades student essays using a computer program, not a trained human, and then the results are used to fire teachers and close public schools. And that is already happening.
You might want to ask yourself why.
Because Common Core was designed for failure no matter how you look at it, and that fake failure leads to a bonanza in profits from the almost $1 trillion in annual tax dollars that fund public education in the United States.
In addition, using a computer program to grade essays is cheaper than paying even minimum wage to trained human readers. Imagine the labor costs of paying people who breathe to read essays written by almost 50 million children, even at minimum wage with no benefits.
What’s happening in the fake education war against public school teachers and teacher unions is leading to the largest heist in history, and once we count all the victims it will add up to 99% of America’s population. The richest 1% don’t count because their money is already protected by loopholes, tax shelters, and offshore bank accounts.
Amen. So how do we get people to pay attention?
The quoted gibberish sounds like a lot of academic drivel.
It sounds like the “essays” in my WordPress spam trap that its BayesBot found suspicious but wasn’t quite sure enough about to keep from forcing a human, yours truly, to grade.
Not quite enough upstairs, Jim, for you to understand that “academic drivel”, EH??
Now I get it. The CC testing machine, scanning human-made essays with robo-metrics, is creating language called Gobblish.
“. . . is creating language called Gobblish.”
Correction “is creating a language called Bullshit.”
LOL.
It’s not just the PARCC that’s doing this. In Utah, the state writing exams for grades 5 and 8 have been graded by computer for years. No one has been sure whether the new AIR tests would be graded by computer, but yesterday’s newspaper stated that, while this year’s test results will not be back until November, future results will be returned quickly. Considering that all students in grades 6-12 are writing TWO essays each on these tests, they’ve got to be graded by computer.
http://www.sltrib.com/sltrib/news/58060433-78/students-tests-sage-state.html.csp
So here is how you get high test scores. Give the students a list of words the computer likes and have them memorize it. Don’t worry about teaching meaning. Teach the students to write grammatically and to use punctuation appropriately; they memorize this along with the words. Practice writing essays using the word list in grammatically correct sentences, or what appear to be sentences. Result: high scores! Even the mentally handicapped students could do this with some practice, and they could have a list of the words and basic grammar rules on hand as a 504 accommodation!
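That recipe is easy to automate. Here is a toy sketch; the word list and sentence templates are invented, but anything grammatical and polysyllabic would serve:

```python
import random

# Hypothetical list of "words the computer likes," memorized as suggested above.
FAVORED_WORDS = ["paradigm", "empirical", "synthesis", "juxtaposition",
                 "epistemology", "quintessential", "dichotomy", "profundity"]

# Grammatically correct sentence frames with slots for the memorized words.
TEMPLATES = [
    "The {a} of {b} demonstrates a {c} that scholars invariably celebrate.",
    "Without {a}, no {b} can achieve the {c} that history demands.",
    "A {a} rooted in {b} transcends every {c} of the modern era.",
]

def gamed_essay(n_sentences: int = 6) -> str:
    """Assemble well-punctuated sentences from the word list: syntactically
    clean, semantically empty, and (if the commenters are right) high-scoring."""
    sentences = []
    for _ in range(n_sentences):
        a, b, c = random.sample(FAVORED_WORDS, 3)
        sentences.append(random.choice(TEMPLATES).format(a=a, b=b, c=c))
    return " ".join(sentences)

print(gamed_essay())
```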
That’s what I tell my students–use big words and long sentences. As long as they do that, they’ll get good scores. It’s SO frustrating.
Can someone remind me: Is there a plan already in place to use “robo-grading” for essays on the PARCC assessments as soon as they replace the current assessments? Or is this up in the air?
I’ve never understood why essay questions should be included in standardized tests, since essays inherently resist standardization.
Exactly, FLERP. And a lot of so-called “higher-order thinking” isn’t easily measured with bubble items, which are most appropriate for measuring simple factual recall. (I’m not saying that this cannot be done; I am saying that the new tests contain a lot of items that attempt to press the “objective format” into a kind of service for which it is not well suited, which makes a lot of the test items really tricky.) This is one of the reasons for the enormous failure rates on the New York Common Core exams.
Reblogged this on David R. Taylor-Thoughts on Texas Education.
Shouldn’t be too hard to game the test. Just use the phrase “status quo” like five times and score a zillion points.
Is The Consortium subject to demands for transparency? They’re a public-private partnership, right?
In other words, a publicly-funded private entity?
How would one even go about petitioning “PARCC”? Not that it matters, since Pearson is a contractor and unlikely to reveal anything, ever.
As a professional writer and author of many books, I can’t understand why anyone would leave the grading or evaluating of essays to machines, or even to $11-an-hour B.A. holders who answer Craigslist ads. Even the best grammar software stinks compared to a decent copyeditor who knows grammar. If you want someone who can actually judge an essay on its merits, its display of critical-thinking skills, and its grammar, spelling, and mechanics, you can’t cheap out or program a computer to do it. At least, a computer can’t do it yet. That’s the beauty of English. It defies simple algorithms.
FYI: PARCC alert! Read blog below. Also, for a great read: http://www.amazon.com/Chronicle-Echoes-Implosion-American-Education/dp/1623966736/ref=sr_1_2?s=books&ie=UTF8&qid=1403058110&sr=1-2
There’s a lot of really, really fascinating work being done based on statistical analysis of text. This work started back in the early part of the last century with the compilation, by Thorndike and others, of word-frequency tables based on various corpora. I’ve long felt that my field of studies in the English language arts could benefit from more empirical study. A little of it goes a long way.
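For anyone unfamiliar with that tradition: a word-frequency table is simply a tally of how often each word occurs in a corpus. A minimal sketch, with an invented toy corpus:

```python
from collections import Counter
import re

def word_frequencies(corpus: str) -> Counter:
    """Build a Thorndike-style word-frequency table from raw text."""
    return Counter(re.findall(r"[a-z']+", corpus.lower()))

toy_corpus = "The cat sat on the mat. The dog sat on the cat."
for word, count in word_frequencies(toy_corpus).most_common(5):
    print(f"{word:>5}  {count}")   # e.g., "the  4", "cat  2", ...
```

Thorndike compiled such tables by hand from millions of running words; today the same counting takes a few lines.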
Consider, for example, that many claims one finds in, say, a standard writing textbook are empirically false (e.g., that most paragraphs have topic sentences, that writing is done in distinct modes of narration, exposition, and persuasion or argumentation, that literary texts are distinct from informational ones). Or consider the notions that people learn a significant portion of the grammar or vocabulary of a language via explicit instruction in grammatical forms; both are demonstrably false.

Or consider that ELA textbooks sometimes use the term “theme” as though it meant “subject matter” (Theme: growing up) and sometimes as though it meant “message” (Theme: seize the day), or that the Common Core State Standards confusingly use the term “point of view” both for the narrator’s stance vis-a-vis the story being told (first-person, third-person, limited, omniscient) and for beliefs or opinions.

Or consider the difficulties involved in defining, clearly, even the simplest ELA term. What is a “word,” for example? Is walked a different word from walk? Is the verb walk a different word from the noun walk? Is the noun walk meaning “a stroll” a different word from the noun walk meaning “a path”? Such sloppiness in definition wouldn’t do in, say, physics or chemistry.

And what of the claim that the CCSS or the typical writing rubric lists measurable objectives, ones to be measured by assessments? Isn’t a measurable objective one that has been operationalized to the point that one can decide, definitively, whether it has been met? And if so, does a statement like “The essay is clearly organized” pass muster?
E. D. Hirsch, Jr., had a valuable insight, I think, when he recognized that what stands in the way of a student’s comprehension of a given text is often lack of the background knowledge that the writer took for granted. His assertion that there exists a body of background knowledge held in common in speech and writing communities seems to me unassailable, and, importantly, it is an empirical claim that can be studied empirically.
At the beginning of the last century, formalists like Roman Jakobson made important contributions to our understanding of language by recognizing the role played by distinctive features. Kenneth Burke made some intriguing suggestions for extending such analysis to produce a clearer understanding of prosodic features: just what makes this particular line from Keats so memorable? Well, the traditional language we’ve inherited for understanding that is way too crude, as Burke pointed out. One needs to go beyond sounds to analysis of distinctive features.
Of late, there has been some really interesting work being done by a new generation of scientifically minded literature scholars who are applying statistical analysis to matters of literary style and genre. See, for example, The Blackwell Companion to Digital Literary Studies:
http://www.digitalhumanities.org/companionDLS/
So, there is much, much to be gained from scientific approaches to the study of literature and from statistical analyses of texts. I think, for example, that such studies will clearly show that a single person, unlike any other writing at the time, wrote almost all of the corpus traditionally attributed to Shakespeare, and I welcome putting to rest a lot of nonsense there. And think of the light that close scientific analysis by people like Albert Lord and those who came after him shed on the production of oral epic (via metrical improvisation employing formulaic bits).
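The attribution studies alluded to here typically rest on function words, which authors use habitually and largely unconsciously. A bare-bones sketch of the idea (real work uses hundreds of features and measures such as Burrows’s Delta):

```python
from collections import Counter
import math
import re

# A tiny set of function words, the classic features in authorship attribution.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "with", "but", "not"]

def style_profile(text: str) -> list:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def style_distance(text_a: str, text_b: str) -> float:
    """Euclidean distance between two stylistic profiles; a smaller
    distance suggests (but never proves) a common author."""
    return math.sqrt(sum((x - y) ** 2
                         for x, y in zip(style_profile(text_a), style_profile(text_b))))
```

Run over large samples, profiles like these are what allow scholars to cluster disputed texts with one candidate author rather than another.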
All good stuff.
But this computerized grading of essays is purest pseudo-scientific hokum.
If numbers were knowledge, then IQ-mongers would not be idiots.
–Franco Moretti, Atlas of the European Novel
Strong language, but I understand where he is coming from.
Reblogged this on TN BATs BlOG.
Did Perelman cite where Pearson stated this? If computers are grading these essays, why will the data not be released in a timely manner?
One of the huge issues with computer-based essay scoring is that it depends on a model in which students produce essays on general topics suitable for a broad audience, rather than essays that demonstrate significant content-area learning over time.
Those who believe, as I do, that essays are a GREAT way to evaluate learning but that they should be opportunities for students to demonstrate deep, significant learning of content and concepts acquired over time, abhor the use of vague, generalized prompts of the kind typically used on these machine-scored assessments.
A much wiser approach, I think, is the sort of approach taken in the French baccalaureate exams. It would be entirely inappropriate, of course, to have these graded by machine.
For several years, I graded student essays during the summers. I think this started in the 1990s. The first few years, we were paid about $15 an hour, and all the readers were teachers then. First, we spent an entire day being trained to use a standard rubric, and then for several days we worked in teams reading and ranking essays with that rubric. To be fair, each essay was read by several readers organized into those teams, and the final score was the average of the team’s scores. But if the team was split, with a wide gap between rubric scores, a mediator/judge was called in to read and rate those essays using the same rubric, which rated grammar and mechanics separately from content.
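The team-scoring protocol described here boils down to a small rule: average the readers’ rubric scores, and send wide splits to a mediator. In this sketch the threshold for a “wide gap” is my assumption, not a figure from the actual process:

```python
def team_score(scores, max_gap=1):
    """Average a team's rubric scores, but flag a wide split for the
    mediator/judge. The max_gap threshold here is an invented guess."""
    if max(scores) - min(scores) > max_gap:
        return "send to mediator/judge"
    return sum(scores) / len(scores)

print(team_score([4, 4, 5]))   # -> 4.33..., scores agree closely enough
print(team_score([2, 5, 4]))   # -> 'send to mediator/judge'
```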
Eventually, when California had one of its regular budget crises because of falling tax revenues (some economic bubble had burst), teachers were dropped from this process and the state turned to university grad students with no teaching experience, who were paid minimum wage, because teachers cost too much. After that, I have no idea whether the process was streamlined further to cut out the teams and the mediators and rely on a single reader, someone who had never taught English in the public schools, for each essay.
Now they have taken it one more step by turning the process over to a computer program, with no human judgment or interaction.