I posted earlier today about a new Xerox machine that is being marketed to “read” and grade student essays. Not to score bubble tests, but to grade essays. Granted, this is not a new idea. There are now different companies selling machines to grade student writing. I have seen demonstrations of this technology, and I can’t shake the feeling that this is not right.
Why? I am not opposed to technology. But here is the nub of my discomfort. I am a writer. The moment I realized I was a writer was when I discovered many years ago that I write for an audience. I think of my reader(s). If I am writing for a tabloid, I write in a certain style. If I am writing for the New York Times, I write in another way. If I am writing a letter to a family member, another style. If I am writing for a scholarly journal, something else. When I write for this blog, I have a voice different from the voice in my books. I don’t know how to write for a machine.
Robert Shepherd reminded me how important the audience is for a writer when he posted this comment about the Xerox grading machine:
“The slick piece of marketing collateral that Xerox produced for this product features, most prominently, a picture of a smiling teacher bent over to help a smiling student. But the promise of the product is precisely the opposite–that teacher feedback will be eliminated (automated).
“Clearly, it’s a fairly simple matter to create technologies that correct multiple-choice and other so-called “objective” tests. More troubling is the promise that the technology will score “constructed response” items (in non-EduSpeak, writing). Let’s be clear about this. There is no existing system that can read, as that term is understood when it is predicated of a human being. What creators of such software can do is to correlate various features of pieces of writing that can easily be recognized by software to outcomes assigned to those pieces of writing by human teachers.
“So, one might come up with some formula involving use in the piece of writing of terms from the writing prompt, average sentence length, average word length, number of spelling errors, number of distinct words used, frequency of words used, etc., that yields a score that is highly correlated with scores given by human readers/graders using a rubric. At a whole other level of sophistication, one might create a system that has a parser and that does rudimentary checking of grammar and punctuation. Some of that is easy–e.g., does each sentence begin with a capital letter? Some of it is rather more difficult (a system that correctly identifies all and only those groups of words that are sentence fragments would have to be a complete model of grammatical patterns for well-formed sentences in English).
“Who knows whether the Xerox system is that sophisticated. One cannot tell whether it is from the marketing literature, which is a concatenation of glittering vagaries. But even if one had a perfect system of this kind that almost perfectly correlated with scoring by human readers, it would still be the case that NO ONE was actually reading the student’s writing and attending to what he or she has to say and how it is said. The whole point of the enterprise of teaching kids how to write is for them to master a form of COMMUNICATION BETWEEN PERSONS, and one cannot eliminate the person who is the audience of the communication and have an authentic interchange.
“Since these writing graders first started appearing, I have read an enormous amount of hogwash about them from people who don’t understand that we don’t yet have artificial intelligences that can read. Instead, we have automated systems for doing various tasks that stand in lieu of anyone doing any reading.”
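To make Shepherd's point concrete, the feature-correlation approach he describes can be sketched in a few lines of Python. Nothing here "reads" anything: the program extracts surface features (average sentence length, average word length, distinct words, overlap with the prompt) and combines them with weights. Every feature and every weight below is a hypothetical illustration, not Xerox's or any vendor's actual model; real systems fit such weights by regression against a corpus of human-assigned scores.

```python
import re

def surface_features(essay: str, prompt_terms: set) -> dict:
    """Extract easily computable surface features of a piece of writing."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "distinct_words": len(set(words)),
        "prompt_overlap": len(set(words) & prompt_terms),
    }

def machine_score(essay: str, prompt_terms: set) -> float:
    """Combine surface features with weights to mimic a human-assigned score.

    The weights are invented for illustration; a production system would
    tune them so the output correlates with scores from trained graders.
    """
    f = surface_features(essay, prompt_terms)
    weights = {
        "avg_sentence_len": 0.05,
        "avg_word_len": 0.3,
        "distinct_words": 0.02,
        "prompt_overlap": 0.5,
    }
    return sum(weights[k] * f[k] for k in weights)
```

Note what the sketch cannot do: an essay that is eloquent nonsense and an essay that says something true and important can produce identical feature vectors, which is precisely Shepherd's objection.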
The solution is simple. Why doesn’t Xerox invent a machine to write the essays so the other machine can read them? Perhaps Bill Gates could also write to the machines, and leave the rest of us humans to care for and teach our children as we please.
Good one, Arthur.
Perfect solution Arthur.
Diane, I completely agree with you. There is no way a machine can understand the difference between writing styles, let alone recognize new ones. How could it? It is just a machine. People are different from machines; their minds are far more complicated. Machines are faster, but not more flexible or creative. If we want to ruin creativity, just use machines and take the human out of the equation. This is the utmost arrogance, the chasing after a new profit center in this insane world of corporatization, privatization, and dehumanization. That is why I am the Director of Policy for the Congress of Racial Equality of California (CORE-CA). We are not bought and sold; we have the political connections that come with that independence, and we work for all communities, including Beverly Hills, whose new mayor, John Mirish, grandson of the famous producer, asked me to open a Beverly Hills City Council meeting after CORE-CA and the Crenshaw Subway Coalition helped the city prevent the subway from running underneath their school, which would be hazardous because of the methane and hydrogen sulfide (H2S) there. Just so you understand: this was the black and brown community helping Beverly Hills, as they also help us with problems with the MTA. You must build coalitions to beat the big power. This was their worst political nightmare. We call this “Community.”
Would you have predicted that a machine could drive a car on roads with cars driven by people?
If machines can replace humans, at what point do we no longer have a reason to drive a car?
I look forward to the time when cars drive themselves. Beyond the personal conveniences, traffic accidents kill over 1.2 million people a year worldwide, and that number can be vastly reduced with self-driving cars.
Machines can do an amazing job of distinguishing writing styles. But they can’t read. I, too, am very much looking forward to the driverless car. Hell, I’m even looking forward to the emergence of true AI and hope I live long enough to see it. But we are not there yet. We have systems for looking for previously predicted regularities in writing products, not systems that can read. But such regularities are not hallmarks of good writing. They are hallmarks of mediocrity. Robert Frost wrote, “It’s a fake poem and no poem at all if the best of it is thought up first and saved for last.” His comment can and should be generalized. Good writing breaks rules. Systems programmed to look for regularities–to give an exemplary grade to the ideally structured five-paragraph theme–are to good writing what thalidomide was to morning sickness. It’s bad enough that we have new national standards calling for writing in three modes (as if it weren’t the case that most writing is narrative or multi-modal) that encourage this sort of five-paragraph theme production, but we are also to have algorithms for commending the creation of such pseudo-writing.
If you bring up the self-driving car as an analogy to the machine-graded essay, it’s a false one.
Cars drive in a pre-determined way, to a specific destination, according to a very narrow set of rules. Writing, while generally governed by rules, is a much more open-ended process, with far more open-ended aims, and can at certain times be improved by rule-breaking. However, it still takes a conscious, experienced human being to recognize and judge these things adequately.
Cars drive in a predetermined way only if one does not mind running over the occasional pedestrian and hitting a few fellow cars along the way.
I think that machine evaluation of writing can be very valuable in the early draft stage. It can free up the teacher to concentrate on listening to the student’s voice without the distraction of poor sentence structure, etc.
Come on TE, you can do better than that–the “that” being 1.2 million deaths per year. A quick search found that the most U.S. auto deaths in a year was 54,589 in ’72, and 32,367 in 2012.
That figure is from the Economist a couple of issues ago. It is from the special report on the automobile. It is the estimate of deaths worldwide.
Considering the book I read about the people that grade essays (Making the Grades), I think I would take my chances with the machine.
That’s the fault of Pear$on’s hiring practices. Look back at the Dec. 27th interview w/Todd Farley (who wrote Making the Grades) on this blog, & also find his commentary as to machine scoring (also courtesy of Pear$on!) on the Huffington Post (Feb. 2012, I believe, but easily accessed). Anyway, Pear$on’s doing both, so 6 of one, half dozen of the other…
Iheartdurham, how about having teachers grade their students’ essays, not Pearson or McGraw Hill?
I agree and the essay should be a topic picked by the teacher based on classwork and not part of a standardized test. However, the powers that be don’t seem to trust teachers and I fear we are stuck with standardized tests. (I am just a parent though and I don’t know if teachers think differently about these tests.)
Having machines evaluate student essays is a most dumb idea. I wonder what kind of data will be snatched up when the students take the repressive tests that label them and take the very breath out of our students and teachers. Learning is about relationships. Tell this to the FEDs and the rest of the DEFORMERS.
These essay grading machines cannot differentiate truth from fiction so it doesn’t matter to the machines what our kids write; it is only about how they write.
The real question is – what are we looking for when we teach our students to write? Are we looking for a fill-in-the-blanks, formulaic writing product, or are we looking for authentic expression of students’ critical and creative thinking?
Oh yes – I forgot. We are not supposed to give a s*** about what our students think.
“But even if one had a perfect system of this kind that almost perfectly correlated with scoring by human readers, it would still be the case that NO ONE was actually reading the student’s writing and attending to what he or she has to say and how it is said. The whole point of the enterprise of teaching kids how to write is for them to master a form of COMMUNICATION BETWEEN PERSONS, and one cannot eliminate the person who is the audience of the communication and have an authentic interchange.”
This is pretty circular. A “communication between persons” is not a “communication between persons” if the communication is not between “persons.” Yes, agreed, at least until it gets more complicated to define what forms a “communication” may take (how direct must it be, may it be oral or can it be written, what technology can be interposed between the speaker and the listener), or, one day, what a “person” is.
If you don’t load your terms beforehand, it’s not clear to me that a sophisticated enough machine could not teach someone how to write. One could argue that a machine that sophisticated will never be created, and maybe that’s right. Or one could denounce the machines on principle. But if a machine were sophisticated enough, why couldn’t it teach someone how to write? It used to be conventional wisdom that a machine couldn’t beat a man at chess. Then it could play, but it couldn’t beat the greatest human players, because it thought like a machine, not a man. Now the machine can beat anyone, and the number of people who would argue that the machine can’t teach a man how to play chess is probably small and dwindling. “But language is more complex.” And machines get more complex.
This also reminds me of something someone said, I can’t remember who, about how artificial intelligence engineering never gets any credit, because when there’s something that computers can’t do, the response is that “it’s just a computer, all it does is computations, it can’t actually think.” Then when the computer can do that thing, the goalposts move, and the thing that used to be regarded as “thinking” gets called “mere computation.” And so on.
I’m glad that you recognized that this is circular. That’s my point. Break that circle, and you don’t have writing. You have something, to be sure, but it’s not writing.
And my point was not, of course, that we shall never develop machines that can read. I happen to think that we shall, in fact, do that. Many years ago, I designed a system for producing folktales based on Vladimir Propp’s formalist description of the genre, which lends itself to automation, to the creation of an algorithm. My point was, of course, that we haven’t produced such a machine. We have produced software that correlates certain features of pieces of writing for standardized testing prompts to outcomes assigned by trained graders. Confusing those systems with ones that “read essays” is laughable, but the creators of these systems have, of course, a vested interest in people’s not understanding why that is.
Breaking that circle requires you to make an assumption that’s arbitrary. Not that there’s anything wrong with that; sooner or later we have to make arbitrary assumptions. As long as we know when we’re doing it.
Agreed, this is just Xerox marketing. This isn’t a quandary we have to face yet.
Everyone should understand that if machines cannot grade essays and short answer questions the whole Common Core thing will fall apart. Implicit in the whole concept of CCSS is that machines will grade a majority of the things students do. If you read carefully the materials put out by PARCC and Smarter Balanced you’ll see references to the extensive use of computer grading to keep costs manageable.
I often tell my students there are times you have to write from their hearts. Last time I was deep in the bowels of our copy machine clearing yet another jam, I didn’t see a heart or a brain.
rratto–we missed you! Especially as you always make insightful comments, such as the one tonight. Welcome back!
As we confront this threat, I think it would be best to properly set the bar. The question should not be, “Can machines grade writing?” It should be, “Can students improve their writing by simply responding to machine given grades?” The answer to the latter question is almost certainly no. To improve, students will need the guidance of a human being, one who will need to read the student’s writing to offer meaningful help. Once we acknowledge this, though, it becomes obvious that machines do not save educators (educators who are genuinely interested in training skilled and subtle writers) any time. The time consuming part of grading writing is not in the assigning of the grade; it is in the careful reading and commenting. Machines thus add no real efficiency even if (implausibly) they can churn out grades that roughly match the ones a trained human being would give. There is no educational problem a machine grader solves even if it ‘works.’ At best it solves a business plan problem.
Writing has many technical aspects that can be assessed; however, good writing is truly an art form. When done properly, it can touch not just the mind, but the heart and soul. Machines will never replace that.
What about the first draft?
The “heart and soul” are to artificial intelligence as “God” is to science. God is the ever-shrinking space of our ignorance (if we don’t understand what makes something happen, that means God made it happen). The soul is the ever-shrinking gap between what men and machines can do.
Interesting FLERP, interesting. I’m glad I’ll be dead before mankind meets its doom from making a “thinking” machine. Once that happens mankind won’t stand a chance.
First statement… writing has many technical aspects that can be assessed.
“The ‘heart and soul’ are to artificial intelligence as ‘God’ is to science. God is the ever-shrinking space of our ignorance (if we don’t understand what makes something happen, that means God made it happen). The soul is the ever-shrinking gap between what men and machines can do.”
We do not know, FLERP, that we shall be able to produce sentient machines. It seems likely, given our current understandings, that this is so, but if history teaches us anything, it is that current understandings often prove to be wrong. I’ve noticed in recent reading a widespread tendency to overstate what we “now know scientifically.” This sort of certainty mistakes the nature of science, which is, AT ITS HEART, provisional.

There is good reason to believe that there are causal aspects of nature involved in conscious experience to which we have no access now, given our current technologies (those prostheses that extend our access). There is good reason to believe that we are as clueless about essential causal forces involved in conscious experience as Homo heidelbergensis was about the existence of gamma rays. And those reasons have nothing to do with mysticism but, rather, with the simple observation that our access to nature is limited by our perceptual and cognitive apparatus.

Consider a tick. It cannot see. It cannot hear. It can detect (“smell”) butyric acid. It can detect temperatures in a narrow range around 35 degrees centigrade. It has some tactile senses. But most of the universe that we perceive does not exist in the universe to which the tick has access. Well, we are like ticks. We are great apes with a particular perceptual and cognitive access, which we keep extending via technological prostheses, but it’s silly to think that we great apes, with minds beautifully adapted to life on the savannah, have all the necessary access to solve, at present, or even in the near term, all the problems that present themselves, such as the problem of accounting for sentience, much less creating sentiences. That I believe this does not make me think that artificial sentience is impossible. It does make me suspect that it’s a problem not solvable with current understandings.
You remember, I assume, the possibly apocryphal story about Lord Kelvin giving the lecture to the Royal Society in 1905 about how the future of physics was all about pushing calculations out a few decimal places. At the very time when the lecture was being delivered, Einstein was writing the papers on the photoelectric effect, on Brownian motion, and on special relativity that would completely alter the science. I suspect that there are aspects of consciousness that we know nothing about, today, because we don’t have the access to nature that is necessary for such understanding. And I also suspect that this is not a solvable problem in our current form. Read Donald Hoffman. He’s a cognitive psychologist who studies perception. Doing so will disabuse you of your certainties about how shrunken that gap between us and our machines is.
I have no earthly idea how large the gap is, much less what’s in the gap. The only thing I’m certain of is that it’s getting smaller.
This also connects to a blog entry by Diane from several days ago about the disappearing university… what was missing with the online learning was the human interaction and exchange of ideas. Learning to write can be joyful when you know the teacher is reading what you write and really HEARING you. When there’s even a brief response on the page, there’s something that’s deeply satisfying about the interaction that’s irreplaceable. I think these issues are strongly linked.
Has anyone looked at the appalling writing topics that are being given to students on these standardized exams? They have to choose topics that ANY kid can write about, and so the topics tend to be extraordinarily vague and dull. Write about a time when you met a challenge. Write about what you like to do after school. That kind of crap.
Good writing is idiosyncratic. It’s an act of communication of something significant, something complex and particular, to a particular audience. Good writing requires a lot of knowledge and a lot of lived experience. Good writing isn’t, typically, done off the top of one’s head about nothing in particular of any significance to anyone. And it isn’t done for some vague audience (or, god forbid, for some machine) but for a particular audience addressed in a particular way.
No one who understands anything at all about writing could possibly think that these writing prompts on these standardized tests are good methods for testing writing ability.
Except the writing test prompt last year for 8th graders in Utah, which was strongly biased by class. It asked students to discuss whether schools should replace textbooks with tablets. The problem was, the prompt didn’t define what a tablet was. No sweat for wealthier kids, but a HUGE problem for our ELL and poorer kids, who had no idea WHAT the question was asking, and just had to write a guess. Ridiculous!
Yup. A quarter of all American children under 18 live in poverty, according to recent stats. About 40% of all black children under 18 and 35% of all Hispanic children live in homes below the poverty line. About a fourth of American schoolkids have no computer at home. About a third have no Internet access at home. As of March of 2010, only 22% of girls and 18% of boys had cell phones. As of this last March, 47 million Americans and 22 million households were receiving food stamps.
LA Purchase: Oops! Yet another Pear$on error, or a different company? Please let us know!
Teaching is cultural transmission. So is writing. Both are interchanges between individuals, and for the most part, the more idiosyncratic and individualized these interchanges are, the more the participants, teacher and student, writer and reader, get out of them.
One of the first things that I would teach any writing student is “Don’t write about vague, general crap like “meeting challenges” and “fun things to do after school.” The whole approach to writing taken by the designers of standardized tests is antithetical to the production of readable writing, and I am frankly horrified that the writing teachers of the country haven’t, long ago, said to the test makers, “You can’t be serious. You want people to write about that kind of crap? This flies in the face of everything that I try to teach to my students about writing well.”
One can’t write well without having something significant to say about the topic, and the creators of writing prompts on standardized tests (and before them, those who designed the testing system and instruments) don’t seem to have understood that. I’ve seen dozens of lessons and books for teaching students how to write research papers that begin with “Choose and narrow a topic.” But that’s nonsense, of course. One has to do a LOT of learning, a lot of gathering information, before one can begin to see what a good topic might be. The topic and thesis should grow out of one’s learning about some subject of interest. The textbooks commonly get this precisely backward. (I used to tell my students that a good rule of thumb is to spend 80% of one’s time when doing a research paper learning about something and deciding what the thesis to be argued will be.) Well, the standardized testing folks make the same mistake. They fail to recognize the central role of having something specific that one has learned and thought about carefully to write about. That’s 80% of the battle. Unfortunately, the testers of writing ability don’t understand writing. They seem to think that it’s about being able to produce a string of sentences in a five-paragraph theme format.
And so it is darkly amusing that the testers of writing ability don’t understand anything at all about what they are testing.
Writing that is compelling to a reader will almost always be work that the writer felt INTRINSICALLY compelled to produce. Given what I know, I HAVE to say this. And to this audience. The writer should be like Coleridge’s Ancient Mariner, with no choice but to get whatever it is off his or her chest. This is true for almost every kind of writing, from the writing of a mathematical proof to the writing of a short story or novel.
…to the writing of a blog post? Your writing is convincing and compelling.
thank you
Agreed on all points, but I also would add something far more sinister: The transformation of education from an activity focused on the development of our intellects–the core of our humanness and therefore humanity–into a normalizing of our behavior according to our wanna-be corporate masters.
The goal of writing is to express one’s thoughts in a way that enables others (i.e., humans) to at least grasp, if not intimately “share”, those thoughts. Writing, like speech, is a fundamental human activity that is critical for civilized societies to exist. To teach writing, then, is to help a student develop their skills to understand, organize, and express their thoughts. Traditionally, this was done by teaching the Trivium (logic, grammar, and rhetoric) to enable students to think clearly, express themselves accurately, and argue convincingly.
Since the goal of writing is to extend one’s internal understanding to another through written expression, students need skilled and experienced human colleagues to hone their skills. At best, a computer can only offer a program that executes some sort of pattern analysis and scoring according to a correlative model that is based on someone’s (or some group’s) view of what “good” writing looks like. In other words, since computers don’t “think” (a minor point that seems lost on most journalists, business types, and teachers), the computer can only try to find correlative similarities between submitted writings and some aggregate model of writing. To the computer, the submitted writing is just a string of encoded electrical impulses that are manipulated according to some algorithm to produce an encoded output that is displayed in a form that we can perceive. But to call that “understanding” is a travesty.
In fact that defeats the very purpose of writing, since there can be no hope of establishing an understanding–and hence communication–between the writer and the reviewer. Computers don’t understand anything; hence any output from the computer can’t convey any understanding of the subject essay. The teacher who can only read the computer’s assessment thus can’t understand the writing. How then, can this serve the purpose of training and coaching good writers?
Some argue that computers eliminate bias in grading. The issue in grading fairness isn’t bias, it’s prejudice. Yes, we all have biases; so what? Good teachers, like all good readers and writers, understand their biases and consciously keep them in check. There are graders out there who are fair, and that’s the goal. And since writing is about communication between human beings, and not some rule-based, formalized, abstract exercise like mathematics or symbolic logic, fairness is what matters.
Worse, who says the computer (really the underlying algorithm) isn’t biased itself? The algorithm has to be based on a model of “good” writing. But machines can’t judge writing; so, the model has to be based on “biased” human beings! And bringing in extraneous information about social sites, credit scores, etc. only shows that the bias of the computer will be far more extensive than any human teacher.
So, if the idea of computer grading actually defeats the purpose of writing, and if the very programs that do the “grading” are likely to be at least as biased if not more biased than human graders, then what’s the point? Well, profits for companies that make the software are important. But there’s much more at stake. Replacing writing teachers is just another nail in the coffin of humanities education. Education will become still more like technical training. Worse, as you noted, Cathy, by forcing students to “write to the model”, the owners of the code, like Bill Gates and Rupert Murdoch, will actively mold the minds of the young to their own models of humanity. And we’re talking about some pretty withered human beings here.
Time for home schooling.
“The issue in grading fairness isn’t bias, it’s prejudice. Yes, we all have biases; so what? Good teachers, like all good readers and writers, understand their biases and consciously keep them in check. There are graders out there who are fair, and that’s the goal.”
Grading and grades are abhorrent educational malpractices. Sorting and separating students should be anathema to public education. Assessing a student’s work in conjunction with the student should be the goal.
There are different reasons for driving a car. If all you want to do is get from point A to point B, then by all means a self-driving car may work for you. Better yet, take public transportation.
However, if you enjoy driving, and relish the sense of freedom and movement as well as the sounds of the engine as you manually work through the gears, as you downshift as needed, then sitting back while some machine transports you will be torture.
Machines don’t have hearts and souls. Algorithms can not measure what they can not feel.
We must all think the same. We must all write the same. Just answer “Yes” when told and “No” when told. We must all read the same things. Just keep saying these things over and over. History will repeat itself. “Fahrenheit 451,” here we come.