Politico reports this morning:
PARCC says many states with Common Core-based assessments will use automated scoring for student essays this year. A spokesman says that in these states, about two-thirds of all student essays will be scored automatically, while one-third will be human-scored. As in the past, a spokesman said about 10 percent of all responses will be randomly selected to receive a second score as part of a general check. States can still opt to have all essays hand-scored.
This is another reason to opt out of the state testing.
Do you think that PARCC is unaware of the studies by Les Perelman at MIT that show the inadequacy of computer-graded scoring of essays?
Here is a quote from an interview with Professor Perelman, conducted by Steve Kolowich of the Chronicle of Higher Education:
“Les Perelman, a former director of undergraduate writing at the Massachusetts Institute of Technology, sits in his wife’s office and reads aloud from his latest essay.
“Privateness has not been and undoubtedly never will be lauded, precarious, and decent,” he reads. “Humankind will always subjugate privateness.”
Not exactly E.B. White. Then again, Mr. Perelman wrote the essay in less than one second, using the Basic Automatic B.S. Essay Language Generator, or Babel, a new piece of weaponry in his continuing war on automated essay-grading software.
“The Babel generator, which Mr. Perelman built with a team of students from MIT and Harvard University, can generate essays from scratch using as many as three keywords.
“For this essay, Mr. Perelman has entered only one keyword: “privacy.” With the click of a button, the program produced a string of bloated sentences that, though grammatically correct and structurally sound, have no coherent meaning. Not to humans, anyway. But Mr. Perelman is not trying to impress humans. He is trying to fool machines.
“Software vs. Software
“Critics of automated essay scoring are a small but lively band, and Mr. Perelman is perhaps the most theatrical. He has claimed to be able to guess, from across a room, the scores awarded to SAT essays, judging solely on the basis of length. (It’s a skill he happily demonstrated to a New York Times reporter in 2005.) In presentations, he likes to show how the Gettysburg Address would have scored poorly on the SAT writing test. (That test is graded by human readers, but Mr. Perelman says the rubric is so rigid, and time so short, that they may as well be robots.)
“In 2012 he published an essay that employed an obscenity (used as a technical term) 46 times, including in the title.
“Mr. Perelman’s fundamental problem with essay-grading automatons, he explains, is that they “are not measuring any of the real constructs that have to do with writing.” They cannot read meaning, and they cannot check facts. More to the point, they cannot tell gibberish from lucid writing.”
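The Babel generator described above produces grammatically correct but meaningless prose from a handful of keywords. Here is a toy sketch of that idea (hypothetical code for illustration only; Perelman's actual Babel generator is far more sophisticated):

```python
import random

# Toy illustration of the Babel idea: grammatical sentence templates plus an
# inflated vocabulary yield text that parses cleanly but means nothing.
TEMPLATES = [
    "{kw} has not been and undoubtedly never will be {adj1}, {adj2}, and {adj3}.",
    "Humankind will always {verb} {kw}.",
    "The {adj1} essence of {kw} {verb}s all {adj2} assumptions.",
]
ADJECTIVES = ["lauded", "precarious", "decent", "quixotic", "venerable"]
VERBS = ["subjugate", "promulgate", "obfuscate"]

def babble(keyword, n_sentences=3, seed=None):
    """Generate n_sentences of fluent-sounding nonsense around one keyword."""
    rng = random.Random(seed)  # seeded for reproducible output
    sentences = []
    for _ in range(n_sentences):
        template = rng.choice(TEMPLATES)
        sentences.append(template.format(
            kw=keyword,
            adj1=rng.choice(ADJECTIVES),
            adj2=rng.choice(ADJECTIVES),
            adj3=rng.choice(ADJECTIVES),
            verb=rng.choice(VERBS),
        ))
    return " ".join(sentences)

print(babble("privacy", seed=1))
```

A surface-feature scorer sees long words, varied syntax, and no grammar errors; only a reader notices there is nothing being said.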

Looks like somebody decided that it doesn’t matter if everyday people’s children are learning or thinking, that it doesn’t matter what you write as long as you write a lot of it. Do not pause. Keep pushing that pen. Keep pushing those buttons. Sit up straight. Keep your head down. Now, work. Learn grit. Learn stamina. Do not question. Do not speak. Work for me. Work for me. You all work for me!
“The Master Plan”
Initial step’s to break their will
The second step’s to tame
The final step’s to work the mill
With robots, all the same
My friend’s daughter came home with a 21 out of 40 score on a computer corrected essay. She’s in elementary school. So the mom went to try out the machine at the school. She typed randomly and got a grade of 18 points out of 40.
So basically a monkey can sit at this machine and score 50%. What the heck is the point of any of this? Why even think of something so insane? And what parent in their right mind would let their child be graded by a machine?
Imagine one million monkeys typing randomly for one million years on PARCC essay-measuring machines. And imagine one of them, by sheer chance, types:
Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm’d;
And every fair from fair sometime declines,
By chance, or nature’s changing course, untrimm’d;
But thy eternal summer shall not fade
Nor lose possession of that fair thou ow’st;
Nor shall Death brag thou wander’st in his shade,
When in eternal lines to time thou grow’st;
So long as men can breathe or eyes can see,
So long lives this, and this gives life to thee.
Then imagine said monkey hits the submit key after “thee.”
Fast-forward three months, and the monkey receives his score: 18/40
What parent in their right mind would let their child’s writing be scored by a $12.50/hour temp worker with no teaching experience?
Answer:
Every parent who does not opt their child out of PARCC testing!
It’s a heck of a choice, Joan.
Inaccurate/random machine scoring . . . or
. . . inaccurate/random human scoring.
Maybe they could try monkeys?
There was a lively debate over on Facebook among a number of computer science teachers on the merits of automated grading. I’ve also had numerous discussions with any number of Ed Tech entrepreneurs who insist they just want to offload some of the onerous grading so teachers can focus on teaching.
None of them could understand (the Ed Tech folk, but also some of the CS teachers) that a critical part of being a quality teacher is KNOWING your students. As much as grading, well, sucks, reading student work is an important part of knowing your kids, and the better you know them, the better prepared you are to help them.
Then again, to the outside world, testing is not supposed to actually improve instruction but rather to serve as a mechanism to punish…
Excellent comment. I hate grading papers, but when I read what my students have written I can hear their “voices.” I think it’s important that a real person is listening to them.
When it comes to standardized tests, though, whether computer or human graded, no one is listening to student voice. I never gave even a scantron test because as a special education teacher I was looking for patterns in their answers. As part of my job, I modified tests for mainstreamed students based on what I knew about how they could best demonstrate their knowledge. There is not a computerized test on earth that can do what I did.
“Pearsonal Test-taking Assistant”
When robots take the test
Our problems will be gone
Cuz robots are the best
And never ever wrong
Did you hear about the Microsoft bot on Twitter? Apparently, Microsoft bots are racist. And now, they’re not just tweeting, they’re deciding our futures by scoring our high stakes tests. Lovely.
Given the new MS CEO’s recent comments, I would expect the Microbots to be sexist too.
That is the stupidest thing I’ve ever heard.
What is the “that” in your statement?
Oh, sorry: having computers grade an essay. I taught English for many years, and I learned more about students and writing every year. To have a machine just grade for grammar and complete sentences is to deny everything writing is about.
“The Colemanbot”
Designed in a lab at MIT
The Colemanbot for SAT
Unequaled for the standard test
Can beat Commander Data’s best
While I hate to reference a TV show, the last episode of ‘The Good Wife’ is about this very issue. The daughter has written a college essay in which her work, scored by a computer, is declared plagiarized. She had referenced a line from the Bible without using quotation marks because it was a common, well-known line. Not only does the mother have a hard time getting anyone to answer her questions, she is shut down by a committee at the university. As she leaves the meeting, Alicia tells the committee to expect to hear from her, as she will be filing an enormous class action suit. Maybe this is a hint at our future. We will have to fight in court for our right to be heard.
Soon we will not need people. We will be replaced by computers.
We might as well be now.
Children, teachers are judged on data so data must be more important than people.
Right?????
67% machine-scored essays. 33% human-scored essays.
Can you spell inter-rater reliability?
If I may correct your question: Can you spell inter-rater UNreliability?
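For readers unfamiliar with the term: inter-rater reliability is typically summarized with a chance-corrected agreement statistic such as Cohen’s kappa. A minimal sketch (the `human` and `machine` score lists below are made-up numbers for illustration, not PARCC data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    n = len(rater_a)
    assert n == len(rater_b) and n > 0
    # Observed agreement: fraction of items both raters scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater assigned labels at their own base rates.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum((ca[label] / n) * (cb[label] / n)
                   for label in set(ca) | set(cb))
    return (observed - expected) / (1 - expected)

# Hypothetical 1-4 rubric scores for ten essays:
human   = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
machine = [3, 3, 4, 2, 1, 2, 4, 4, 2, 3]
print(round(cohens_kappa(human, machine), 3))  # 0.583
```

A kappa of 1.0 means perfect agreement; values in the 0.5–0.6 range, as in this toy example, mean the two raters disagree often enough that which scorer a student happens to get materially affects the score.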
Just testing you, DS.
Professor Perelman has perfect credentials for poking holes in this absurdity. It is as absurd as the Lexile scores for grade-band reading in the Common Core, developed by MetaMetrics.
Oh, I just wrote about the PARCC today. I can’t believe, given all that is happening in CPS right now, we are expected to give the PARCC test starting next week!!! http://mskatiesramblings.blogspot.com/2016/03/parcc-and-real-live-children.html
. . . “they ‘are not measuring any of the real constructs that have to do with writing.’”
They are not measuring anything at all, much less anything having to do with “real constructs” (whatever those are). There is no measurement in the teaching and learning process. There are assessments, counts, evaluations, judgments, etc., none of which have a true measurement component.
Another example of a false concept being used, probably without the author even realizing he is saying it; the measurement meme is so embedded in, so commonplace in, education discourse that it is, unfortunately, accepted carte blanche.
If the test is junk, does it make a difference who or what scores it?
“The Teacher” (after “The Lorax”, by Dr. Seuss)
Teacher: I am the teacher. I speak for the kids. I speak for the kids, for the kids have no tongues. And I’m asking you sir, at the top of my lungs – that thing! That horrible thing that you did! What’s that thing that you did with my beautiful kid?
Dunce-ler: Look, teacher, calm down. There’s no cause for alarm. I tested one child, I’m doing no harm. This thing is most useful! This thing is a “PARCC.” A PARCC, a fine something-that-glows-in-the-dark! It’s a test. It’s a count. It’s a score! It’s a VAM! But it has other uses, yes, for fraud and for scam. You can use it for firing, for hiring, for cheats, for charters, or covers for bicycle seats!
Teacher: Sir, you’re crazy. You’re crazy as Snark. There’s no one on earth who will buy that fool PARCC!
[Just then, NY Governor Andrew Cuomo drives up and purchases a million PARCC tests for NY schools]
The Dunce-ler: The birth of an industry, you poor, stupid guy! You telling me what the public will buy?
NJ still has PARCC. Did Christie’s son have computer-scored essays at Delbarton to prepare him for his student work at Princeton?
Delbarton offers a wide range of courses designed to give the student a comprehensive and in-depth background in the liberal arts in preparation for his college years. Most courses are required; some electives are offered in the eleventh and twelfth grades.
The requirements for graduation from the Upper School are: four years of English, Mathematics, and Physical Education/Health; three years of History, including Ancient, European, and American; three years of Science, comprised of Biology, Chemistry, and Physics; three years of a single foreign language; five terms of fine arts and music (three at an advanced level); two terms of Religious Studies each year for four years; two terms of Computer Science; one term of Language Arts and one term of Leadership. Students are required to take five full-year (three-term) courses each year. In the junior and senior year, students may choose electives to complete the five-course-per-year requirement necessary for graduation. Graduation depends upon successful completion of all the above requirements.
In the Middle School a student is required to complete successfully: two years of English, Mathematics, Language, Science, Humanities, and Physical Education/Health; one term each of Computer Science, Music and Art; and two terms each year of Religious Studies. Movement into the Upper School depends upon the successful completion of all these requirements.
NO MENTION OF COMMON CORE OR PARCC TESTING. I’M SHOCKED. ANOTHER REFORMER WHO SAYS, “DO AS I SAY NOT AS I DO.”
Perelman is a hypocrite. Now that he’s getting a piece of the action, he has changed his tune.
https://www.bostonglobe.com/ideas/2015/09/12/critic-second-thoughts-robo-writing/1IClHyQZClTZGTLFouyM6I/story.html
Perelman and the company CEO indicate that their software is meant to help students improve their writing and that the people who developed it did things the “right way.”
From what they have said, it does not seem that their software was designed to address Perelman’s main criticism of automated essay graders: that they can’t “read meaning and check facts” — “tell gibberish from lucid writing”.
I suppose we will see where they really stand when Pearson or some other testing company offers their company a boatload of money to use (or adapt) the software for grading essays. I’d say that such an offer is inevitable.
I am tired of opponents of automated essay grading citing the same argument. If you are smart enough to write a nonsense article to “trick” the algorithm, then you are probably a good writer. I agree that these assessments are worthless, and I too have opted my 9-year-old out of the Common Core assessments; I just find that these types of arguments diminish the overall discourse and do not get to the real issue.
Students will not have to be “smart enough” to trick the computer. They will be taught to game the system during test-prep, I mean class time.
“What was educationally significant and hard to measure has been replaced by what is educationally insignificant and easy to measure. So now we measure how well we taught what isn’t worth learning.”
Any mention of which states are using automated essay grading?
The following is from the Pearson UK paper “Preparing for a Renaissance in Assessment”
“This indicates a growing confidence in automated essay-scoring as means of enabling the assessment of a wider range of outcomes in the context of large-scale, high-stakes testing programmes.”
I think it’s fair to ask: who exactly has this growing confidence? How was it determined, which voices counted and which didn’t, and how was this decision made?
That said, I would like to see more information on exactly what is being done. Most of the automated scoring work that I’ve seen promotes using automated scoring as a supplement to human grading, as a way to flag cases to look at in particular, etc. Using software as the *only* measure of a student’s essay is educational malpractice.
https://research.pearson.com/articles/preparing-for-a-renaissanceinassessment.html
A barrier to the use of such assessments has been the difficulty and costs of objectively rating open-ended student responses. However, advances in artificial intelligence in combination with online delivery are helping to overcome some of these barriers. While it might at first seem implausible that a machine could mark an essay, several studies have indicated that automated essay-scoring systems employing artificial intelligence are capable of achieving levels of reliability equal to or exceeding that of trained human raters.9 Some widely used systems include Project Essay Grader™, Intelligent Essay Assessor™, E-rater®, IntelliMetric™ and Bayesian Essay Test Scoring System™.
All systems developed thus far have certain limitations, but so too does human rating.10 Currently, automated scoring of extended response questions is usually deployed in high-stakes testing contexts in conjunction with human rating (to provide a second rating or to quality-assure the human ratings, for example). As automated essay-scoring technologies improve, they can be expected to play a much more prominent role.
In the USA, the two federally funded assessment consortia, PARCC and Smarter Balanced, both intend to incorporate automated scoring into their common core state assessments, planned for implementation in 2014. This indicates a growing confidence in automated essay-scoring as means of enabling the assessment of a wider range of outcomes in the context of large-scale, high-stakes testing programmes.
Thank you for reading my mind and saying it much better than I could ever hope to.
How seriously sad for those who fail to opt their children out. How amazingly comical for those who believe in this rigor…but only for public school kids.
Todd Farley (author of Making the Grades: My Misadventures in the Standardized Testing Industry, 2009) warned us about this very thing on the Huff Post a few years after his book was published.
Read more about this (in the comments) in Diane’s post RE: Miss Katie’s Blog.
(BTW: in ILL-Annoy, parents cannot opt their kids out–the kids are supposed to opt themselves out–insanity, no?)