NPR ran an interesting story about robo-grading student essays. It didn’t use the headline that appears here, but there is no better way to describe the insanity or stupidity of asking a machine to judge a student essay. First, it shows disrespect for the student. Second, it diminishes the importance and value of language. Third, the machine can be easily fooled, as Les Perelman of MIT has demonstrated conclusively in his studies. Basically, the machine can’t evaluate facts. Perelman showed that a student could write an essay declaring that the War of 1812 took place in 1945, and the machine would not recognize it as an error.
To demonstrate, he calls up a practice question for the GRE exam that’s graded with the same algorithms that actual tests are. He then enters three words related to the essay prompt into his Babel Generator, which instantly spits back a 500-word wonder, replete with a plethora of obscure multisyllabic synonyms:
“History by mimic has not, and presumably never will be precipitously but blithely ensconced. Society will always encompass imaginativeness; many of scrutinizations but a few for an amanuensis. The perjured imaginativeness lies in the area of theory of knowledge but also the field of literature. Instead of enthralling the analysis, grounds constitutes both a disparaging quip and a diligent explanation.”
“It makes absolutely no sense,” he says, shaking his head. “There is no meaning. It’s not real writing.”
But Perelman promises that won’t matter to the robo-grader. And sure enough, when he submits it to the GRE automated scoring system, it gets a perfect score: 6 out of 6, which according to the GRE, means it “presents a cogent, well-articulated analysis of the issue and conveys meaning skillfully.”
“It’s so scary that it works,” Perelman sighs. “Machines are very brilliant for certain things and very stupid on other things. This is a case where the machines are very, very stupid.”
Because computers can only count, and cannot actually understand meaning, he says, facts are irrelevant to the algorithm. “So you can write that the War of 1812 began in 1945, and that wouldn’t count against you at all,” he says. “In fact it would count for you because [the computer would consider it to be] good detail.”
Perelman says his Babel Generator also proves how easy it is to game the system. While students are not going to walk into a standardized test with a Babel Generator in their back pocket, he says, they will quickly learn they can fool the algorithm by using lots of big words, complex sentences, and some key phrases—that make some English teachers cringe.
“For example, you will get a higher score just by [writing] ‘in conclusion,’” he says….
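To make Perelman's point concrete, here is a toy sketch of a scorer that only counts surface features. It is not the GRE's algorithm or any vendor's; the function name `surface_score`, the feature choices, and the weights are all invented for illustration. It shows only that a counter of long words, sentence length, and cue phrases cannot tell 1812 from 1945.

```python
import re

# Hypothetical cue phrases; the story mentions only "in conclusion".
CUE_PHRASES = ("in conclusion",)


def surface_score(essay: str) -> float:
    """Score an essay from countable surface features only (toy weights)."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    # Share of "big" words (8+ letters), average sentence length, cue phrases.
    long_word_ratio = sum(len(w) >= 8 for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    cue_hits = sum(essay.lower().count(p) for p in CUE_PHRASES)
    # Arbitrary illustrative weights; nothing here checks facts or meaning.
    return 10 * long_word_ratio + 0.1 * avg_sentence_len + 1.0 * cue_hits


plain = "The War of 1812 began in 1812. It ended in 1815."
gamed = ("In conclusion, the precipitously ensconced War of 1812 "
         "began in 1945, notwithstanding multitudinous scrutinizations.")

# The padded, factually wrong sentence outscores the plain, correct one.
print(surface_score(plain) < surface_score(gamed))
```

Swapping 1945 back to 1812 in the second sample leaves its score unchanged, because nothing in the counting ever looks at the date.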
In places like Utah, where tests are graded by machines only, scampish students are giving the algorithm a run for its money.
“Students are geniuses, and they’re able to game the system,” notes Carter, the assessment official from Utah.
One year, she says, a student who wrote a whole page of the letter “b” ended up with a good score. Other students have figured out that they could do well writing one really good paragraph and just copying that four times to make a five-paragraph essay that scores well. Others have pulled one over on the computer by padding their essays with long quotes from the text they’re supposed to analyze, or from the question they’re supposed to answer.
But each time, Carter says, the computer code is tweaked to spot those tricks.
“We think we’re catching most things now,” Carter says, but students are “very creative” and the computer programs are continually being updated to flag different kinds of ruses.
As someone who devotes her life to writing and trying to communicate meaning, I reject the idea of robo-grading as insulting to the craft.
It demands garbage. It deserves to get what it asks for.
Anyone complicit in this betrayal of educational values should be ashamed.

I have thought for years that the way the essays are “graded” in Utah is ludicrous. The students can get their essay “scores” almost instantly–how does that show good writing? The ELA teachers just tell me that kids should be able to “write about any prompt, because we teach skill, not content.” There are up to five prompts for each grade, and they are randomly given to the students. Last year, the scores overall were so low that the state of Utah had to redo the rubrics so that the scores were higher. Interesting that the assessment person for the state doesn’t mention THAT.
California required a student essay that was part of the competency exam required for HS graduation to be graded by teams of trained teachers during the summer.
It was called the California High School Exit Exam (CAHSEE).
Prior to the CAHSEE, the high school exit exams in California were known as the High School Competency Exams and were developed by each district pursuant to California law. In 1999, California policy-makers voted to create the CAHSEE in order to have a state exam that was linked to the state’s new academic content standards.
The essay portion provides a question that will prompt the student to write a persuasive essay, a business letter, a biography, a reaction to literature, or an analysis on the subject of the question. For example, in 2002, one group of students was asked to write an essay that persuaded people not to leave trash on the school grounds. Essay questions change with each test date. The essay portion is scaled out of one to four (with zeros given in special cases, such as for off-topic or non-English responses).
Twice, during two summers, I was on one of those teams of teachers that evaluated the essays. We gathered at a location where we were trained and then sat in teams at tables. Every teacher read every essay and evaluated them using a rubric scale of 1 – 5. If we did not agree on an essay, a roving teacher/administrator would be called over to read the essay without seeing our ratings and then rate it. Their score was used as a tiebreaker.
Later, the state watered this method down because teachers cost too much (I think we were paid $15 an hour), so they started to hire college students to grade those essays the same way.
We usually met for one or two days and worked all day reading hundreds of essays. And the location I went to wasn’t the only one. They were all over the state: hundreds of evaluation locations where volunteer teachers were trained and rated student essays from districts that they did not teach in.
The only reason they are using computers is to lower the cost even more, as they keep cutting taxes and public services while increasing military spending, starting more wars, building more for-profit prisons, and growing the national debt to outrageous amounts. All of it is to ensure that corporations, millionaires and billionaires can continue to grow their bloated, cancerous wealth, which they use to buy more political power so they can grow their wealth even faster.
Could be used to help identify what needs revision, not scoring.
Not always. When I used to have my students write essays on a computer system similar to the one Utah uses to grade end of year exams, students would get “flagged” for all kinds of weird things, including being told to change sentences in ways that made them NOT grammatical. Sometimes, we weren’t able to tell what the so-called “problem” was that the program thought was wrong.
I read all of the essays myself, so it didn’t matter as much. We’ve gone back to hand-writing them now. Less chance of plagiarism, actually.
A few years back, a student got a great score from the computer. As soon as I read it, I knew it was 100% plagiarized. He’d copied all of his “essay,” word for word, from a website, and pasted it to the essay. But if I hadn’t read it (and some teachers don’t), he would have gotten a great score.
I know a lot of you hate AP, but at least they have live humans scoring their essays.
I believe you.
What a horrible experience. And I bet someone in an administrative position has never reached out to you to ask your opinion about this.
OF COURSE NOT! No one–least of all the administrators and the state board of education–ever asks actual teachers our opinions. In fact, we are not even supposed to criticize the test or even suggest that it doesn’t matter if the kids don’t do well on it. BUT, teachers have been allowed to go on the news and rave about how “wonderful” the tests are.
There are enough teachers in this state who think the tests are so awesome that the state can hear what it wants to hear. The rest of us are ignored.
I was one of those readers for AP for five years. That AP Language test is a fast track to mediocrity.
GregB,
“And I bet someone in an administrative position has never reached out to you to ask your opinion about this.”
I don’t call them adminimals for nothing!
Posted at https://www.opednews.com/Quicklink/Robo-Grading-Student-Essay-in-General_News-Diane-Ravitch_Facts_Language_Stupidity-180715-479.html#comment706457
Read my comment there with embedded links to Diane’s posts here.
I wrote:
Graduate kids who cannot write and this nation is on its way to utter failure.
First they took out thousands of the most experienced teachers, and replaced them with novice practitioners who follow mandates.
Recently, if you follow the real news about education, there is a move, like this one in NY, to lower standards for teachers.
Pennsylvania: Not One of 18 Cybercharters Meets State Standards
Go to my series, here at OpEd News and discover how public education is being decimated in the 15,880 separate school systems in 50 states!
https://www.opednews.com/Series/15-880-Districts-in-50-Sta-by-Susan-Lee-Schwartz-140921-34.html?f=15-880-Districts-in-50-Sta-by-Susan-Lee-Schwartz-140921-34.html
That is right: almost sixteen thousand systems, and thus the public has no idea of the destruction as the Trump circus plays 24/7 and, behind the curtain, our public schools are going down! With it goes income equality!
Read : Tom Ultican: How NCLB and Charters Killed the Public Schools of Inglewood, California
Steven Singer here explains why any public school, no matter how “bad,” is better than any charter school, no matter how “good.”
Are we using computers to teach kids or using kids to teach computers?
With the race to replace humans with proven flawed and biased artificially intelligent computers that are supposed to learn on their own, probably both. But what they learn will be filtered through the biased algorithm that guides how and what they learn.
“The Human Bias in the AI Machine
How artificial intelligence is subject to cognitive bias.”
“Like the human brain, artificial intelligence is subject to cognitive bias. Human cognitive biases are heuristics, mental shortcuts that skew decision-making and reasoning, resulting in reasoning errors. Examples of cognitive biases include stereotyping, the bandwagon effect, confirmation bias, priming, selective perception, the gambler’s fallacy, and the observational selection bias. The total number of cognitive biases is constantly evolving, due to the ongoing identification of new biases.”
https://www.psychologytoday.com/us/blog/the-future-brain/201802/the-human-bias-in-the-ai-machine
I wrote comments on the report cards of ELLs in Spanish, since none of my parents could read English. I sometimes checked my work against one of these AI translators. Some of the results were ridiculous. The AI routinely used wrong forms of words, and sometimes came up with the wrong meanings of words. Sometimes its translations made no sense.
retired teacher: Google translate is getting better. I don’t use it for my own writing/ planning, but I’ve been checking it regularly since I’ve had older tutees in Fr & Sp (in addn to my reg PreK enrichment work). (Because those families would use it instead of my recommended wordreference.com site.)
As recently as a decade ago it was only good for single vocab words. It now handles commonly used phrases/ expressions. It is beginning to get there on originally-composed phrases & sentences, but stumbles a lot. E.g., it recognizes some but not all Eng equivs for imperfect tenses (try “used to ___” & it resorts to “soler”). E.g., cannot manage complex subordinate clauses (can do “We could have done that”, but not “I wish we could have done that.”)
I think it needs another decade 😉
The latter. The promoters of adaptive testing, a component of (de) personalized learning, use student responses to train computers. The end game is a computer said to have “artificial intelligence,” meaning that it has algorithms that predict and respond to large datasets without human intervention.
Elon Musk’s latest Twitter outburst today led me to this NY Times opinion article, “What Elon Musk Should Learn From the Thailand Cave Rescue”. It could have also been titled, What Bill Gates, Pearson and Supporters of the Testing Industry Should Learn from the Thailand Cave Rescue. Please enjoy if you haven’t already (and it shouldn’t be behind a paywall, but if it is, the paywall will probably be taken down for a while if you clear your cookies): https://www.nytimes.com/2018/07/14/opinion/sunday/elon-musk-thailand-hubris.html
Link is so on point. To Zuckerberg/ Chan as well.
Many years ago, I was judging a high-school speech event. One contestant, a young woman, had decided to perform e.e. cummings’s “r-p-o-p-h-e-s-s-a-g-r”–that exploded poem–as a piece of scat singing of the kind done some wonderfully by Ella Fitzgerald. Brilliant. A perfect marriage of content and form. I gave her performance the highest score. The other two judges, having no clue what she had done, gave it the lowest score.
Here’s the thing: yes, these programs have become good enough at statistical analysis of compositions to predict with a fairly high degree of certainty what a grade given by human readers would be. However, they don’t do this using anything like applying human knowledge and skill to comprehension of the text. So, this is cheating. And, the algorithms can easily be fooled by gibberish. And the algorithms can’t recognize brilliant deviation.
I have often fantasized about how Rumi or Blake might have answered one of these ridiculous test prompts. LMAO.
But the bigger problem is with the designs of the tests. They don’t measure what they purport to measure. They are neither valid nor reliable. They are a scam.
And, ofc, they dramatically distort pedagogy and curricula in K-12 schools.
CX: of the kind done so wonderfully by Ella Fitzgerald.
Love that story of your judging experience! You should write a short story about it!
Yes, your anecdote is great. I mean, hi can AI right now judge judgment, when we humans often don’t do an even passable job. Look who we elected POTUS. Look how people often behave, even when relatively self-regulated, on blogs and Twitter. The heart and depth of the thinking in most excellent essays cannot be plumbed with accuracy by AI or most humans. What could current AI (forget about mere algorithms) really ‘know’ about what and how things would resonate with a human? What things may seem convincing while technically not in terms of strict logic, or vice versa? What things are of fundamental human interest and what are not? The implicit subjective phrases in the above questions, and this fragment?
I mean how can AI . . .
Look who we elected POTUS in 1980. Well, at least Reagan could speak coherently. But the oral and then written tradition of discourse went down the tubes with the proliferation of TV, then the Internet. These visual media are not intended to deliver thought; they are intended to entertain with images. Have you read Neil Postman’s books?
First there was the intrusion of TV into education; now it is computers, which are TV on steroids: you can read off the screen, you can watch video, you can type back your answer, and soon you will be able simply to speak back your answer. Have you seen the latest developments of Google Voice? It will all end up like in Idiocracy (it’s a movie).
Reagan looks like a stable genius compared to Dump.
Well, I obviously did not elect Reagan in 1980; I was too young…
THE TRULY scariest statement about a future we are passively falling into, that future where the tech boys control not only our schools but our businesses: “… the algorithms can’t recognize brilliant deviation.”
“They don’t measure what they purport to measure.”
They don’t measure anything at all. It’s all a big effin game of horse manure thrown onto the students.
It is garbage. And wrong. And harmful to students and teachers.
But hey-we actual writing TEACHERS know this, but why ask us? We are only the educated professionals who do this for a living.
What an insult. And could damage students and teachers.
I’ve been showing the Babel Generator to my high school students for 2 or 3 years now. One thing they notice from the Babel-induced essays is a heavy use of semicolons. My daughter subsequently wrote a semicolon-heavy essay for her SATs and she scored higher than expected. I can’t say the use of Babel helped her, but it sure didn’t hurt.
I, too, wrote, on this: https://onefleweast.net/2018/07/01/unhand-me-grey-beard-loon/
Wonderful essay. Thank you!
Get with the program! NPR is largely funded by the Gates Foundation. No more need be said. And see the series “Humans” on Amazon. It is about the conflicts and indignities produced by robotization.
At least NPR reported on the idiocy of machine graded essays. Just because they’re NPR doesn’t mean they’re always wrong.
NPR has historically favored charters over public schools. It’s not until the last year that they have been reporting more fairly on public education, and at best, their coverage remains imbalanced.
This is not a problem of a robo-grader. This is a problem of the essays not being graded at the school by the human teachers. The whole idea of testing companies is ludicrous in many other countries.
Every kid in Utah has to write two essays, starting in third grade, for each end of year test. At least that’s been the way it has been, but we’re changing tests (again) next year, so who knows what will happen.
Grading that many essays by teachers would take a while, and Utah is too cheap to pay teachers extra for things like that. Most of the time, we are asked to do everything for free, because, “It’s for the children.” You would be amazed how little we are paid here, for anything. Our base salaries are higher than some states, but we get paid for nothing extra (unless it’s sports, of course).
If a teacher is capable of teaching X students, then he should be capable of grading 2X essays per year. If this becomes an insurmountable problem, then I suppose the teacher cannot teach X students in the first place; I mean teach them in a classic school fashion, not in a college lecture fashion. To put it plainly: if a teacher has more than fifty students, then grading anything becomes a burden and cannot be done in time. This shows that the traditional American system, with middle schools collecting students from multiple elementary schools and high schools collecting students from multiple middle schools, does not work efficiently. A high school may have an Olympic-size stadium, but the teachers’ attention per student is much lower than in elementary school.
I have 300 students in grades 7-9. That is not a typo. Utah class sizes are ridiculously large. I would love to see you try to grade 600 essays.
On a related note dealing with algorithms, more than one artist has figured out how to deceive patented facial recognition systems. The software is being used as part of police and traffic surveillance systems, for in-store tracking of customers, and recently for activating computers and other devices. Techniques for disrupting recognition date back to WW1 camouflage for ships at sea, dubbed “Dazzle.” The patterns of paint on the ships resembled the much later Op Art style.
These are recent projects by artists.
https://ahprojects.com/projects/cvdazzle/
https://ahprojects.com/projects/megapixels-glassroom/
Apple has multiple patents for facial recognition. This is one https://techcrunch.com/2013/12/03/apple-patents-face-recognition/
This article is more technical but describes the system of facial recognition security used in Russia for the Olympics and now found in some airports.
https://eforensicsmag.com/biometric-facial-recognition-database-systems/
Technology is becoming more and more a part of our everyday lives, but we cannot let technology replace the subjective grading that humans are able to complete. Successful education is creating citizens that we can entrust our society with, not students that are just learning to work the technological system. We are not creating critical thinkers and learners in this sense. Michael Apple is an amazing theorist who questions where knowledge comes from, and in this sense there is no true knowledge on the part of our students. We are failing our students and our society by allowing this. We are not taking up the challenge of being facilitators of their journey in education. We are simply becoming mindless robots ourselves. We are dehumanizing the educational world and looking at everything as a bar code, a test score, a QR code of sorts. Just as Henry Giroux proposed, this takes all thinking, acting, reasoning, and judgement out of their education because it has no need and no place. Students will lose passion for learning and motivation will diminish if we sit back and allow this to take place. We must be agents of change and stop the madness that will come. What academic benefit will students gain from this movement? Language is the basis of so much, and it is not taken into context by a computer. Only the systematic use of grammar….
It makes me sad (& mad) we even have to have this conversation, it is such a no-brainer.
Once I worked in procurement for an engrg co. We designed & built power plants. Every engrg discipline had a team, so did procurement, & proj mgt, & we all worked in a matrix where your responsibility was equally to your dept & to the project – & you had to find a consensus on many contentious issues, move forward & get it done well. The kind of writing I had to do as a procurement supv to persuade the proj team to observe procurement reqts (such as truly competitive bidding) while meeting their respective goals was daunting, & took every ounce of writing skills I learned in K12 & college. I was a lit major who’d learned the tech on the job; it was easier for me than many.
My husband worked in the same field/ corp: tho trained as an engr, he was by nature a good listener & communicator, & had had a hi-qual well-rounded hisch ed, which led him to pursue lit-crit as well as tech courses in BS/ MS, & all that had much to do w/his becoming a mgr of engrs who were tech geniuses but needed much guidance & editing when it came to communicating their ideas.
Nothing has changed in that field as to communicative reqts. AI has made no inroads there whatsoever. My husb tho past retirement age is much in demand for these qualities. Their corp recruits hard for any young engrs they can find w/that combo of skills & usually can’t find them.
We do our K12 a terrible disservice in short-shrifting them w/stunted writing skills. This is a problem that goes back longer than the advent of phony AI-graded essays. That is just the latest & most ludicrous consequence. It goes back to when we decided ed – especially “the humanities” [Oops! Includes writing skills] was not so important- not worth the tax $ expended – & started cutting back on budget for staffing thus increasing class sizes, which decreased the time available for reviewing essays, which has led to that decline in hisch grad writing skills you have heard deplored by fresh coll (& comm coll) teachers for the last decade+.
One year Tennessee graded its required essay for a test with people from a retirement home. Another year, the Knoxville football players. If these rumors are true, we were victims of idiocy. Now computers.
The time has come to reject grades and tests and replace them with close personal relationships and intensive work. Bill Gates can fund it since he is so philanthropic. We need millions of teachers more than we have now. Sorry Hal, computers cannot think, only calculate. AI is BS.
AI is BS.
Love that, will have to start using it!
Given that administrations have framed machine approaches to writing as a replacement for human teachers, I understand the negative reaction. But what if machines were used to leverage the high context, expert knowledge of teachers, for example in teaching genre patterns and registers? Genre and disciplinary writing moves can be detected by machines, and that can help teachers focus on the specific needs of a given student. Being able to visualize those patterns in student writing is powerful and persuasive for the student as well.
https://www.textinsight.org/academic/
Do you worship machines and consider humans worthless?
It has been predicted that the US will lose 73 million more jobs to semi-intelligent, automated machines by 2030. That will happen in the next 12 years.
https://www.usatoday.com/story/money/2017/11/29/automation-could-kill-73-million-u-s-jobs-2030/899878001/
And more than 80 percent of the jobs that many people think were already lost to China and other 3rd world countries were really lost to automation (machines), and that production never left. U.S. factories produce more than twice as much today as they produced in 1980, while more than 40 million Americans lost their manufacturing jobs.
“Automation has transformed the American factory, rendering millions of low-skilled jobs redundant. Fast-spreading technologies like robotics and 3D printing will exacerbate this trend,” says Ms Solís.
https://www.ft.com/content/dec677c0-b7e6-11e6-ba85-95d1533d9a62
In 2017, about 125.97 million people were employed on a full-time basis. Subtract 73 million workers from that number and what’s left is almost 53 million people that still have jobs.
What are we going to do with all those out of work Americans and the people the workers support — send them to the gas chambers and get rid of them so they won’t become homeless and be a burden to the few that still work and pay taxes?
Creating jobs that pay livable wages for humans should be the priority — not machines and automation.
Hi Lloyd. No, I don’t worship machines and consider humans worthless. That’s the exact opposite of what I said–I think you may have misunderstood me 🙂
Please explain again what you meant but please keep it short.
Replacing human writing instructors: bad.
Helping human writing instructors: good.
And who decides what machine app will help human writing instructors?
Department chairs, writing center directors, teaching faculty, administrators, you, me.
I don’t think so. Not in today’s environment. Most if not all of the decisions are being made by billionaires like Bill Gates or Suckerberg.
Sad. Sad. Sad. Robo graders….
We seem to have forgotten WHY we ask students to write essays. Is it for real writing with real ideas or is it to produce structurally sound drivel about made up “facts” and nonsense?
In the end, we’ll get what we reward. Big words, complex sentences about vapid drivel that are spelled right and structurally correct.
Of course, students will quickly pick this up–and we’ll be off on another ineffective and costly tangent. Another education reform gone awry?
Let’s hope not. But these signs are worrisome.