AI Detectors: No, the solution to issues created by AI is not more AI

In 1950, Alan Turing, the English mathematician and pioneering computer scientist, proposed the parameters of his “Imitation Game”, more commonly known as the Turing Test, in his paper “Computing Machinery and Intelligence”. The premise is as follows: a human evaluator holds a text conversation with two interlocutors, one human and the other a machine, and is then responsible for guessing which is which. To date, no machine has passed the test with a perfect score, but last year, ChatGPT passed with 60%, one of the best results ever recorded.

Cue the pandemonium. It is the death of the essay – students will no longer write.

I will propose, however, as many have before me, that the traditional essay as we know it has been dead, buried, and decomposing in the depths of academia for a good while now. Students with a propensity to cheat were not waiting for AI to start cheating.

Recently, while scrolling on any social media platform, I have been getting an ad every few minutes for some new AI “assignment assistant”. Back in 2022, those same slots were filled by ads for paper mills and “tutors”. AI did nothing but democratize access a little further, and it is doing so at a speed that academia, with all of its bureaucracy, its rites, and its historically slow adaptability to market changes, has been struggling to keep up with.

The speed of change is unprecedented, and given that the technology is developing at an exponential rate, “unprecedented” is by now a given; there is no digest you can wait for at the end of every week that will tell you how to navigate it. Between the time I began writing this piece in mid-May and its publication this week, and among a host of other AI releases, OpenAI unveiled GPT-4o, a model far more advanced than its predecessors, and one we have yet to catch up with.

Here is another bitter truth: AI detectors are just not reliable.

It is like playing a game of whack-a-mole with a few seconds of delay.

Here is how I always test the new detectors that crop up:

I keep three pieces of writing at hand: one fully written by me, one fully AI-generated, and a third that is AI-generated and then “humanized” (paraphrased) by bypass.hix.ai. Most good detectors will correctly judge the first two pieces but, more often than not, fail on the third, “humanized” piece.
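For anyone who wants to run this spot-check more systematically, here is a minimal sketch in Python. Everything in it is a stand-in: the toy_detector function is a random placeholder for whatever real detector you are evaluating (each commercial tool exposes its own, usually paid, API), and the sample strings are placeholders for your own three pieces.

```python
import random
from typing import Callable

# (label, text, is_actually_ai): replace the placeholder strings with
# your own three samples: one human, one AI, one AI-then-"humanized".
SAMPLES = [
    ("human",     "A piece fully written by me ...",               False),
    ("ai",        "A piece fully generated by a chatbot ...",      True),
    ("humanized", "AI output run through a paraphrasing tool ...", True),
]

def toy_detector(text: str) -> float:
    """Placeholder only: returns a random 'probability of AI' score.
    Swap in a call to the actual detector you are testing."""
    return random.random()

def run_spot_check(detector: Callable[[str], float], threshold: float = 0.5) -> None:
    """Score each sample and report whether the detector got it right."""
    for label, text, actually_ai in SAMPLES:
        p = detector(text)
        flagged = p >= threshold
        if flagged == actually_ai:
            outcome = "correct"
        elif flagged:
            outcome = "WRONG (false positive)"
        else:
            outcome = "WRONG (false negative)"
        print(f"{label:>9}: p(AI)={p:.2f} flagged={flagged} -> {outcome}")

run_spot_check(toy_detector)
```

In my testing, the pattern is consistent: the first two rows come back correct, and the third, “humanized” row comes back as a false negative.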

What this means is that detectors produce false negatives, and that the students caught using AI are often simply the ones not savvy enough to cover their tracks. If that is the case, then it is simply not fair to students for detectors to be used. I personally think that a false negative is a false result, and indication enough that such tools are unreliable and their use unfair; other educators put more weight on false positives, and the fact that the false positive rate is usually lower than the false negative rate bolsters their trust in AI detectors as indicators of AI use.

In the case of false positives, I do admit that in my personal testing they are more arbitrary and less consistent than false negatives; they do, however, exist. On the 16th of March, 2023, Turnitin released a statement emphasizing that the false positive rates for their AI detector were quite small, and that their “efforts have primarily been on ensuring a high accuracy rate accompanied by a less than 1% false positive rate”. A week later, on the 23rd of March, they released another statement admitting that their false positive rate was higher, at roughly 4% on the sentence level for essays with less than 20% AI use.

Let’s run the numbers on their advertised best-case scenario: a 1% false positive rate means that of the “800,000 academic writing samples that were written before the release of ChatGPT” that Turnitin used to test their detector, 8,000 samples were falsely flagged.

This is when we need to remember that while Turnitin is a corporation selling a product, and deals with these statistics as just that, in a university these numbers denote actual students. There are around 6,000 students at AUC; if each submitted one essay to Turnitin, a 1% rate would falsely flag 60 of them. That is 60 students who never used AI in an unauthorized manner, 60 honest students wrongly accused of being dishonest.
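To make that arithmetic explicit, here is the same back-of-the-envelope calculation as a few lines of Python; the rate is Turnitin’s own published figure, while the one-essay-per-student assumption is mine.

```python
# Expected false positives = number of submissions x false positive rate.
def expected_false_positives(submissions: int, fp_rate: float) -> float:
    return submissions * fp_rate

# Turnitin's pre-ChatGPT test corpus at their advertised <1% rate:
print(expected_false_positives(800_000, 0.01))  # 8000.0 falsely flagged samples

# A campus the size of AUC (~6,000 students), one essay each (my assumption):
print(expected_false_positives(6_000, 0.01))    # 60.0 honest students flagged
```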

Another study, published by Stanford researchers, showed that writing by non-native English speakers is more likely to be flagged as AI. While Originality.AI, a software company that advertises itself as offering “the most accurate AI content detector producing the fewest false positives”, published a rebuttal, they still maintain that “It’s important to note the differences in false positives in terms of marketing versus academic use. We have repeatedly emphasized that Originality.AI is not for academic use.”

Currently, Vanderbilt University, Michigan State University, Oregon State University, the University of Pittsburgh, and the University of Texas at Austin, among others, have issued statements disabling or declining to endorse AI-detection software. The University of Texas at Austin adds: “Centering a required process of drafting, feedback, and revision, as all Writing Flag (similar to RHET at AUC) classes should do, protects against the possibility that AI-generated text will be submitted as a student’s own work. In addition, emphasizing learning objectives focused on developing skills and practices as writers can reduce the incentives to use AI-generated work. The incentives to plagiarize and cheat (with or without AI) increase when students perceive that performance in a final submission is the only thing being rewarded in grading.”

What can be done, then?

Different disciplines and schools of thought have been dealing with the problem in different ways. Some advocate a return to pen and paper, which, in my personal opinion, have been unjustly stigmatized in our push for modernization and digitization. Others, wherever they lie on the spectrum of allowing or disallowing AI use, propose integrating AI into syllabi and assessments (find CLT’s curated resource here). CLT also has a dedicated webpage with guidelines on the use of AI detectors that can help you navigate the new landscape.

There is no one correct answer, and none of the proposed ones are easy; change never is. But more than ever, instructors need to be proactive in keeping up with how this new technology will impact their fields of study, whether positively or negatively. In ten years, we will probably have a clearer idea of the effects AI is having on education, but for now, no one can definitively tell you what to do to come out the other end unscathed.

I can, however, tell you what not to do: don’t fight AI with more AI, and instead focus on humanizing your class.

If you want to dig deeper:
