Correcting the Record

This marks my fifteenth year writing this blog. Since the early days of this space, I’ve tried to maintain a few core principles about the things I write (and have written). While I won’t get into all of those principles in this post, the one I’ve honored since the start is “Don’t rewrite history.” When I write a post, it captures my thoughts, emotions, and working environment at that moment. Every post represents a snapshot in time. If I revisit a post a year or two down the road, I avoid rewriting or editing it. Sure, I may correct a spelling or grammar mistake I missed earlier, but I won’t rewrite the content or change the meaning or intent of the original post. By adopting this principle, I wanted to honor my past self (and my thinking at the time) and give my future self something to look back on. In fifteen years of blog posts, I can’t remember revising the intent or meaning of a single one.

Today, I’ve decided to revise two posts from the past. Obviously, I’m doing this after a great deal of reflection, but I believe it is warranted. Looking through a few old posts on ChatGPT and other generative artificial intelligence (GenAI) tools, I found claims about GenAI detectors that I’d like to revise. I don’t want someone coming across one of those posts and letting it inform their practice. So, I’ve added disclaimers to a couple of old posts. Here’s the issue.

I’ve stopped using AI detectors because I don’t trust them. I’ve read a number of reports showing that the detectors produce false positives, meaning they incorrectly flag a person’s writing as AI-generated. While that is concerning in itself, more concerning is that the tools disproportionately flag writing from non-native English writers as AI-generated. Take this study from researchers at Stanford University. The researchers collected 91 human-written essays from a Test of English as a Foreign Language (TOEFL) repository. They report:

“(The detectors) incorrectly labeled more than half of the TOEFL essays as ‘AI-generated’ (average false-positive rate: 61.3%). All detectors unanimously identified 19.8% of the human-written TOEFL essays as AI-authored, and at least one detector flagged 97.8% of TOEFL essays as AI-generated. Upon closer inspection, the unanimously identified TOEFL essays exhibited significantly lower text perplexity.” (Liang et al., 2023, p. 1)

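That last point deserves a quick unpacking. Perplexity measures how “surprised” a language model is by a piece of text: predictable, formulaic prose scores low, while varied, idiosyncratic prose scores high. Because non-native writers often work with a more constrained vocabulary, their writing can look “too predictable” to a detector. For the curious, here’s a minimal sketch of how perplexity is computed, using the small open GPT-2 model via Hugging Face’s transformers library. This is my own illustration of the concept, not the detectors’ actual code, and the example sentences are mine:

```python
# Illustration only: commercial detectors use their own models and thresholds.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity for `text` (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the tokens in as labels makes the model return the average
        # per-token cross-entropy loss for predicting each next token.
        out = model(enc.input_ids, labels=enc.input_ids)
    # Perplexity is e raised to that average loss.
    return torch.exp(out.loss).item()

# Formulaic phrasing tends to score lower than idiosyncratic phrasing.
print(perplexity("The results of the study show that the method is effective."))
print(perplexity("Frankly, the gizmo wheezed, coughed twice, and quit on us."))
```

A detector built on this signal will, almost by construction, treat simpler, more formulaic human writing, including much of the writing produced by English learners, as suspicious.
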
In response to these findings, I’ve joined other institutions (like Vanderbilt University) in advising my colleagues to refrain from using AI detectors to examine student writing for AI-generated text. I’ve also added disclaimers to any previous posts that suggest AI detectors can be used in this way. It’s a deviation from my past practices and from a core principle of this blog, but I believe it is warranted.

The next logical question some folks may have is, “What do I do if I suspect a student is presenting AI-generated text as their own work? If I can’t use an AI detector, what can I do?”

I’ll dig into that topic in next week’s post.

References:
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7).
