Automating Reading and Writing: Computerized Essay Grading

The past few weeks have seen a lot of discussion over computerized essay grading. Some people admire its labor-saving potential (because who really likes grading a huge pile of student essays?), while the louder crowd argues that, among other things, a computer can’t read. I see this discussion as part of a broader trend that stretches back into the early days of the modern computer.

The Recent Discussion

To give an overview: Can a Computer Grade an Essay? is a 30-minute Radio Boston episode about computer-graded essays that introduces the major sides of the issue. Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment and Professors Angry over Essays Marked by Computer represent, as their titles suggest, the anti-computer grading faction. From the latter:

The group’s petition [against computerized grading] says: ‘Let’s face the realities of automatic essay scoring. Computers cannot ‘read.’ They cannot measure the essentials of communication; accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity and veracity, among others.’

Here is an article from The New York Times that takes a much different attitude towards computerized grading by addressing its labor-saving potential.

These Discussions Aren’t New, But…

John Glenn “piloting” the Mercury capsule. Image courtesy of the Ohio State University John Glenn Audio Visual Collection.

This issue has been of particular interest to me because of my dissertation research in American studies, in which I write about representations of automation in the U.S. in the mid-twentieth century. Machines represented as taking on characteristics of human thought, like computers, tend to provoke different anxieties about labor than humanoid machines, like robots, do.

Mechanized thought challenges our notion of what it means to be human, since we see our cognitive abilities as what separates us from other animals. An example: In the late 1950s and early 1960s, the development of NASA computer systems that could calculate the trajectory of an object through space led to automated spacecraft that appeared to fly themselves. The spacecraft of the first “manned” space program, Project Mercury, could fly without human occupants. This led NASA officials, the American public, and the Mercury 7 astronauts themselves to wonder: what exactly was the point of a Mercury capsule “pilot” if a computer was doing the piloting?

As computer scientists developed computers that could perform ever more complex tasks mimicking the capabilities of the human brain, the way we define what makes us human changed as well. The idea of a computer grading an essay must seem as far-fetched and untenable as an automated spacecraft once did. We surround ourselves with arguments meant to define what it is that humans can do that the computer cannot, just as people at NASA once did. As John Glenn, the first American to orbit the Earth, put it in a Life magazine article after his historic flight in February of 1962: “Man seems to be the best computer you have there in the capsule … We can plug man into the system and make him part of the system we rely on.” For Glenn, the human becomes the machine.

But Computers Can’t Read!

For John Glenn, the astronaut’s job aboard the Mercury capsule demanded a comparison between “man” and machine, and he chose to liken himself to a machine. The opponents of computerized grading, faced with the same comparison, refuse it: computers could not possibly do what humans who grade essays can. The reason, I think, is that one can always argue that evaluating language, and student essays, is a “subjective” task, whereas the trajectory and fuel usage of a rocket or an engine is not. Glenn can be the “best computer” aboard the Mercury capsule, but something about the act of writing resists our notion of the cold, calculating computer.

For example, those against computerized essay grading like to point out that if you fed James Joyce’s novel Ulysses into an essay grader, it would not receive a very good score. One article reported that the Gettysburg Address, run through an essay grader, scored only a two or three out of six. I don’t need to tell you that students, for the most part, aren’t handing in Gettysburg Addresses. Even if they are phenomenal writers, the essay assignments I give would never be so open-ended as to get me something like Ulysses. And if someone did hand in something that read like Ulysses, they probably wouldn’t get a very good score from me, either.

I’m only being a little facetious here, mostly because students turn in papers that are highly formulaic. That’s not on them, either: I create assignments that ask students to produce a certain type of document, so that I can grade it in a certain type of way. I usually use a rubric to grade student essays, and I give that rubric to them along with the assignment, so that they know the criteria on which they’ll be graded. When I’ve assigned more “creative” writing projects, the papers students turn in immediately become much harder to grade. Why? Because they no longer follow as strict a formula, and assessment becomes difficult, if not impossible.

Assessing a formulaic student-written essay is not the same as appreciating “art” or works otherwise considered to have literary or historical merit. So, Gettysburg Address and Ulysses, don’t worry, we know that you’re special beyond what a computer thinks.

Academic Writing Is Formulaic (i.e. “Mechanical”)

To that end, I’ll suggest that writing, especially for an academic audience, is formulaic, and has rules and guidelines that, if followed, can and do produce “good” writing. Evaluating writing at the college level is not entirely subjective, just as writing it isn’t. I think that a computer could get to be pretty good at grading an essay, and I know that I sometimes feel a bit mechanized from going through a pile of student essays.

Students might very well learn how to write for the computer program, but as it is now, students write for their professors. Students know that faculty have different grading styles, and that while one professor might lean more heavily towards the formation of a strong thesis statement, another might have more of an eye for grammar and style. Students adapt their writing in ways that will get them the best grades from a particular professor.

But What About Our Jobs?

Of course, there are other considerations here beyond whether a computer could do well at grading student essays — a number of folks opposed to computerized grading seem worried about jobs. (The existence of MOOCs also feeds that anxiety.) What if colleges start replacing people who assess student work with computers? This isn’t a problem with technology; it’s a systemic problem that already goes deeper than whether computerized essay grading will happen.

We’ve all heard that low-paid adjunct labor accounts for a large percentage of the courses taught across the U.S., and that graduate students and other hired help do much of the grading, especially at larger schools. In many places, the labor of grading has already shifted away from professors, and in the places where it hasn’t, wouldn’t you rather spend less time grading and more time developing course materials and assignments that could help students learn better? I know I would.

I, for one, welcome our essay-grading computer overlords!

About Kim Mann

Kim Mann is the editor and a writer for the Academic Technology Blog. She earned her BA in English from the University of Minnesota in 2003 and her MA in American Studies from William & Mary in 2009, and her PhD in American Studies at the College in 2014. Her research is on technology, the interface, and the body in mid-twentieth century science fiction.