Teachers still can't trust AI text checkers

Illustration: Natalie Peeples/Axios
As kids of all ages head back to school, educators are still struggling to spot students who are letting chatbots write their reports for them.
The big picture: Commercial AI text detection tools — even those claiming high accuracy — still have some big flaws.
Catch up quick: After the release of ChatGPT, teachers quickly realized that the plagiarism detection software they'd used before failed to work on student submissions that were generated by an AI system.
- Academics, startups and even OpenAI itself began releasing genAI text detectors, but none of those tools were very effective either.
- And the problem has gotten worse.
- "As the technology to detect machine-generated text advances, so does the technology used to evade detectors," says University of Pennsylvania computer and information science professor Chris Callison-Burch. "It's an arms race."
Driving the news: Callison-Burch and a team of researchers created a system for benchmarking the tools that claim to detect machine-generated text and found that many of the claims made by text detectors are "too good to be true."
- Using a tool they called RAID, Callison-Burch and his team found that current detectors don't work as well as they claim, and can easily be fooled.
- Beyond failing to flag AI-generated text, many of the tools also flagged content that was actually written by a human.
It's a conundrum, Callison-Burch told Axios, because as a professor he doesn't want to falsely accuse any student of cheating with ChatGPT.
- But he found a tradeoff: tuning a text checker for a low false positive rate also makes it miss more of the AI-generated text it's supposed to catch.
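The tradeoff Callison-Burch describes can be seen with a toy example: a detector emits a "machine-likelihood" score for each document, and the threshold that decides "AI-generated" trades false accusations against missed detections. The scores below are invented purely for illustration.

```python
# Hypothetical detector scores for five human-written and five
# AI-generated documents (higher = more "machine-like").
human_scores = [0.10, 0.25, 0.40, 0.55, 0.62]      # real student writing
machine_scores = [0.45, 0.58, 0.70, 0.85, 0.95]    # chatbot output

def rates(threshold):
    """Return (false_positive_rate, true_positive_rate) at a threshold."""
    fp = sum(s >= threshold for s in human_scores) / len(human_scores)
    tp = sum(s >= threshold for s in machine_scores) / len(machine_scores)
    return fp, tp

# A permissive threshold catches most AI text but accuses real students...
print(rates(0.5))   # -> (0.4, 0.8)
# ...while a strict threshold avoids false accusations but misses AI text.
print(rates(0.9))   # -> (0.0, 0.2)
```

Lowering the false positive rate to protect students necessarily drags down the detection rate, which is why a professor-friendly setting makes the tool easier to evade.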
Between the lines: Earlier this year, Google announced a new technique for watermarking text so that it can later be identified as AI-generated, but there hasn't been an update to the tool since then. Google did not respond to a request for comment.
- Callison-Burch thinks watermarking is an "excellent idea," but it's an insufficient tool against student plagiarism since it requires widespread adoption by AI companies.
- Sophisticated users could also download open-source AI software that will let them generate text without watermarks, Callison-Burch told Axios.
- OpenAI has also developed a text watermarking method, but has not released it yet. A spokesperson told Axios its tool is "technically promising," but also has "important risks" that the company is weighing while researching alternatives.
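Neither Google nor OpenAI has published the details of its watermarking method, but a minimal sketch of one well-known approach from the research literature, the "green list" statistical watermark (Kirchenbauer et al.), shows the general idea: at generation time the model is nudged toward a pseudorandom half of the vocabulary derived from the preceding token, and a verifier who holds the key can count "green" tokens without rerunning the model. Everything here (the key, the hash-based split) is an illustrative assumption, not either company's actual scheme.

```python
import hashlib

def is_green(prev_token: str, token: str, key: str = "secret") -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the
    watermark key and the previous token (illustrative construction)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0   # roughly half the vocabulary is green

def green_fraction(tokens: list[str], key: str = "secret") -> float:
    """Fraction of tokens on the green list: near 0.5 for ordinary text,
    noticeably higher for text sampled with a green-token bias."""
    hits = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

Detection is then a statistical test on that fraction, which is also why the caveats in the story bite: the scheme only works if the generator applied the bias in the first place, so open-source models without the watermark, or heavy paraphrasing, leave nothing to count.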
State of play: As teachers start their third school year contending with ChatGPT-generated text, many are rethinking their genAI abstinence policies.
- The popular AI writing assistant Grammarly is trying to solve the cheating problem by making it easier to disclose the use of genAI in writing.
- The company says it's launching a beta of a tool called Authorship for all Grammarly customers later this month. But don't call it an AI text detector, Authorship's product marketing manager, Cliff Archey, told Axios.
- Instead, Authorship allows students to "show their work" as math teachers have been asking for since the dawn of the calculator.
- The tool labels sections of a document that were typed by a user and those that were cut and pasted from ChatGPT, other chatbots or other sources.
What's next: Machine learning is going to get better — not worse — at generating text that's indistinguishable from what a human can write.
- This means educators will need to evolve the way they teach kids how to write and think, just as they have since the invention of spell check, the internet and Wikipedia.
- "Our view about writing in the current era we're living in is that everything is contextual," Archey told Axios.
- "In a first-year creative writing course, there's an argument that generative AI should not be used really at all, beyond potentially just helping with the brainstorming phase," says Archey. But in a business school communications class, for example, "there could be very good reasons why you would have generative AI folded in."
- Callison-Burch agrees that disclosure of genAI is a good middle path. But "what's the level at which you should have to disclose?" he asks. "I think that's still in question."
