OpenAI Has Kept Secret an Accurate ChatGPT Detector for Two Years


According to The Wall Street Journal, “OpenAI has a method to reliably detect when someone uses ChatGPT to write an essay or research paper” but hasn’t released it yet, despite concerns about widespread student cheating on assignments with it, as well as other illicit uses.

The WSJ writes:

The project has been mired in internal debate at OpenAI for roughly two years and has been ready to be released for about a year, according to people familiar with the matter and internal documents viewed by The Wall Street Journal. “It’s just a matter of pressing a button,” one of the people said.

Why the delay?

In trying to decide what to do, OpenAI employees have wavered between the startup’s stated commitment to transparency and their desire to attract and retain users.

The detection technology is a watermark system that is reportedly 99.9% effective. Here’s how it works:

ChatGPT is powered by an AI system that predicts what word or word fragment, known as a token, should come next in a sentence. The anticheating tool under discussion at OpenAI would slightly change how the tokens are selected. Those changes would leave a pattern called a watermark. The watermarks would be unnoticeable to the human eye but could be found with OpenAI’s detection technology. The detector provides a score of how likely the entire document or a portion of it was written by ChatGPT. The watermarks are 99.9% effective when enough new text is created by ChatGPT, according to the internal documents.
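
The Journal doesn’t spell out OpenAI’s exact scheme, but the description matches published “green list” watermarking approaches, in which the sampler is quietly nudged toward a pseudorandomly chosen subset of the vocabulary at each step. Here is a minimal sketch of that general idea; the key, bias values, and function names are purely illustrative and are not OpenAI’s actual method:

```python
import hashlib
import math
import random

SECRET_KEY = "example-provider-key"  # hypothetical key held only by the model provider
GREEN_FRACTION = 0.5                 # share of the vocabulary favored at each step
BIAS = 2.0                           # logit boost quietly added to "green" tokens

def green_list(prev_token: str, vocab: list[str]) -> set[str]:
    """Pseudorandomly partition the vocabulary, seeded by the previous token and the key."""
    seed = hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * GREEN_FRACTION)])

def watermarked_sample(logits: dict[str, float], prev_token: str) -> str:
    """Sample the next token after nudging the distribution toward the green list."""
    greens = green_list(prev_token, list(logits))
    adjusted = {tok: score + (BIAS if tok in greens else 0.0)
                for tok, score in logits.items()}
    total = sum(math.exp(s) for s in adjusted.values())
    r, cumulative = random.random(), 0.0
    for tok, score in adjusted.items():
        cumulative += math.exp(score) / total
        if r <= cumulative:
            return tok
    return tok  # fallback for floating-point rounding

# Toy usage: uniform logits over a tiny vocabulary
vocab = ["the", "a", "cat", "dog", "sat", "ran"]
print(watermarked_sample({tok: 0.0 for tok in vocab}, prev_token="the"))
```

Because only someone holding the key can reconstruct the green lists, the resulting pattern is invisible to readers but checkable by the provider, consistent with the “unnoticeable to the human eye” description above.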

Apparently Google has developed a version of the watermarking technology that can detect text generated by its LLM, Gemini AI (the detection technology “is in beta testing and isn’t widely available”).

The detection technology does not appear to degrade the quality of the text generated by LLMs (one concern voiced by opponents of releasing it). OpenAI’s own survey data show that people favor releasing the technology by a margin of four to one. What has kept it under wraps, it seems, are worries about how it would affect OpenAI’s profitability: “nearly 30% [of surveyed ChatGPT users] said they would use ChatGPT less if it deployed watermarks and a rival didn’t.” Are there reasons to think this is an especially intractable collective action problem?

The WSJ report, paywalled, is here.

(via Ian Olasov)

Jason Kay
1 day ago

I hope that the government compels AI companies to develop and make publicly available such watermark-detection software. After all, there’s a very clear precedent for this.

Major printer companies like Canon and Xerox have voluntarily included invisible watermarks on printed material for decades. These watermarks, usually patterns of extremely small yellow dots, often encode the date and time of printing, the model of printer used, etc. Similar patterns of yellow dots are used on currency to prevent counterfeiting.

Fun fact: most printers refuse to print images in which they recognize the distinctive pattern of yellow dots governments place on their bills.

https://en.wikipedia.org/wiki/Printer_tracking_dots
https://en.wikipedia.org/wiki/EURion_constellation

I suppose that Canon and Xerox could adopt OpenAI’s brilliant argument here. Unabombers, terrorists, and blackmailers would be less likely to use their printers to print manifestos, ransom notes, and the like, if those printers included watermarks in their output. I mean, do we really want government agencies to be able to verify whether some video of Obama is genuine? That could seriously hurt profits!

Michel
1 day ago

I would post something, but alas I have a pile of robot essays to mark.

Tim
1 day ago

A question for those more informed than I: Going forward, am I correct in saying that, at this point, there are so many other competitors that watermarking won’t matter for long, since students can simply use another model which does not include it?

CDKG
Reply to  Tim
1 day ago

Yes, this is basically correct as far as I know. Even if there is a successful regulatory push for labs to include watermarks in the outputs of their models, the fact that many models are being open-sourced means that people will always be able to fine-tune them relatively easily to remove the watermarking.

Ian
Reply to  CDKG
1 day ago

while this is basically correct, there are two issues.

first, if openai begins to watermark ChatGPT output, the other *major* ai companies (and there are only really a few) will probably do so as well sooner or later.

second, at this point i’m not really worried about the “fine-tuning” issue. i am curious if anyone here has any experience with students fine-tuning their ai papers. the question could be seen as silly (the fine-tuned papers are the ones we’re a lot less likely to flag as ai generated, at least putatively), but it’s been my experience that the ai stuff is painfully obvious. like it’s not even a question.

it follows, or likely follows, that, broadly speaking, cheating students don’t bother to fine-tune ai material. whether that’s laziness or ignorance of how to do so is worth looking into, but at a certain point the fine-tuning ends up taking almost as much time as writing the damn paper oneself would.

if the equation is minimizing effort while maximizing output—as I think it is for ai cheating—it remains in doubt whether widespread fine-tuning will occur.

An adjunct
1 day ago

daily nous is serving chatgpt ads right this second.

Simon
17 hours ago

I think it is worth distinguishing two different groups of users who don’t want watermarks. One group is students who are cheating. Another group is business users who are generating value with the product, but in an environment where stigma against AI outputs would undermine that value. Think of someone writing a government report that is just as good (if not better) for having been written with AI, but that would be rejected if it were public knowledge that it was generated in part using AI. I think there is social value in stopping the cheaters, but there is social disvalue in stopping business users, because it will make workers less productive. It may be hard to evaluate whether the benefits of watermarking outweigh the costs.

Jason Kay
Reply to  Simon
10 hours ago

You’re right, but I think there’s no social disvalue in watermarking AI-generated material. Besides the students using AI to cheat, there is a far larger group of users attempting to pass off AI-generated content as their own, whether it be visual art, a short story, a piece of music, or whatever. These users are just as aware as you are that there is a premium on human-created work, and they seek to profit from it by misrepresenting the origin of their work.

If watermarks take this premium away, that’s not a bad thing. People passing off AI-generated art as their own ought not to get that premium in the first place. Today, many sellers on Etsy and similar websites are in the business of generating AI art and then incorporating it into the items they sell: mousepads, posters, t-shirts, stickers, and the like. That’s a legitimate line of work. But it should be illegal to misrepresent such products as human-made, whether we’re talking about AI or any other industry.

christine
8 hours ago

It’s not the most entertaining way of doing things, but one can generate the content, then use a tool to humanize it and ensure it goes undetected.

Jason Kay
Reply to  christine
5 hours ago

Even if that’s true, watermarking policies would still decrease cheating behavior by attaching additional costs to cheating. Rather than talk to ChatGPT for 30 seconds, a student might need to spend their afternoon adding grammatical mistakes, and those mistakes might not be believably human. (Just as humans can’t credibly produce strings of random numbers, I suspect that students cannot reliably produce the errors, mistakes, and misunderstandings that honest students actually make.)

Justin Fisher
3 hours ago

The “99.9% effective” claim should be taken with a huge grain of salt (and you should probably be admonished for headlining it). As reported, this is “99.9% effective when enough new text is created by ChatGPT,” which basically means that if you generate a large enough volume of text, perhaps many books’ worth for all we know, there will be statistical patterns in the words that would have been extremely unlikely to occur via other ways of producing text, among whatever set of alternatives they considered.

That by itself doesn’t show that the detector would be nearly so effective on fakes the length of student papers. Nor does it tell us how effective it would be on text that had been lightly edited or disrupted, whether by students themselves or by third-party “dewatermarking” apps that would surely flourish if ChatGPT ever did roll this out. At best, we have the claim that huge volumes of unaltered ChatGPT output can be statistically recognized as (likely) being that; it tells us nothing about how hard it would be to recognize the smaller volumes of lightly altered ChatGPT output our students would be likely to submit.
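
To make the length-dependence concrete: in a “green list” style scheme (a toy sketch, not necessarily how OpenAI’s detector works), the detector’s score is essentially a z-score on how many tokens fall on the key-determined green list, and that score only becomes decisive as the amount of unaltered text grows.

```python
import math

def watermark_zscore(green_hits: int, total_tokens: int,
                     green_fraction: float = 0.5) -> float:
    """How many standard deviations the observed green-token count sits above chance."""
    expected = green_fraction * total_tokens
    stddev = math.sqrt(total_tokens * green_fraction * (1 - green_fraction))
    return (green_hits - expected) / stddev

# A short or lightly edited passage yields weak evidence...
print(round(watermark_zscore(green_hits=35, total_tokens=60), 1))      # ~1.3
# ...while a long, unaltered passage yields overwhelming evidence.
print(round(watermark_zscore(green_hits=1200, total_tokens=2000), 1))  # ~8.9
```

Editing or paraphrasing shrinks the green-token excess, which is why light rewriting or “dewatermarking” tools could push even longer documents back toward the statistical noise.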

Nate Sharadin
1 hour ago

Claims to have built useful detectors of LLM-generated text continue to be bullshit in one of several ways. Either they don’t work (like the first round of text-based watermarks), and so are useless, or (like the most recent round just announced) they can be defeated simply by asking another capable model to paraphrase, and so are also useless. And in any case, it’s a bait and switch: rather than producing tools that can definitively say that some sample of text wasn’t produced by *a* model, large model developers can, as a technical matter, only ever give us tools that say a sample of text wasn’t produced by *their* model. And while that might protect them from liability, it doesn’t help with any actual problems.