The AI-Immune Assignment Challenge


AutomatED, a guide for professors about AI and related technology run by philosophy PhD Graham Clay (mentioned in the Heap of Links last month), is running a challenge inviting professors to submit assignments that they believe are immune to effective cheating with large language models.

Clay, who has explored the AI-cheating problem in some articles at AutomatED, believes that most professors don’t grasp its severity. He recounts some feedback he received from a professor who had read about the problem:

They told me that their solution is to create assignments where students work on successive/iterative drafts, improving each one on the basis of novel instructor feedback.

Iterative drafts seem like a nice solution, at least for those fields where the core assignments are written work like papers. After all, working one-on-one with students in a tutorial setting to build relationships and give them personalized feedback is a proven way to spark strong growth.

The problem, though, is that if the student writes the first draft at home — or, more generally, unsupervised on their computer — then they could use AI tools to plagiarize it. And they could use AI tools to plagiarize the later drafts, too.

When I asserted to my internet interlocutor that they would have to make the drafting process AI-immune, they responded as follows…: Using AI to create iterative drafts would be “a lot of extra work for the students, so I don’t think it’s very likely. And even if they do that, at least they would need to learn to input the suggested changes and concepts like genre, style, organisation, and levels of revision.”…

In my view, this is a perfect example of a professor not grasping the depth of the AI plagiarism problem.

The student just needs to tell the AI tool that their first draft — which they provide to the AI tool, whether the tool created the draft or not — was met with response X from the professor.

In other words, they can give the AI tool all of the information an honest student would have, were they to be working on their second draft. The AI tool can take their description of X, along with their first draft, and create a new draft based on the first that is sensitive to X.

Not much work is required of the student, and they certainly do not need to learn how to input the suggested changes or about the relevant concepts. After all, the AI tools have been trained on countless resources concerning these very concepts and how to create text responsive to them.

This exchange indicates to me that the professor simply has not engaged with recent iterations of generative AI tools with any seriousness.

The challenge asks professors to submit assignments, from which AutomatED will select five to be completed both by LLMs like ChatGPT and by humans. The assignments will be anonymized and then graded by the professor. Check out the details here.

 

Marc Champagne
1 year ago

Of course, in all possible worlds, it means more work for us profs. I love progress.

worried
1 year ago

I’ve become convinced there’s no good solution. This term, I asked students to confirm their essay topic with me. A student sent me a detailed outline of her essay topic. I was impressed, said it looked good, and gave a few pointers. She then submitted an essay entirely written by ChatGPT. The outline, I later found out, was also AI generated.

One way to combat this is to give topics or questions that are difficult for ChatGPT to answer (though this will become less viable as it gets better). In one class, one of the essay prompts was about a particular objection, raised in a paper we read, to a prominent theory. A student submitted a ChatGPT essay on this, which wasn’t even close to correct, because ChatGPT wasn’t familiar with this particular objection. But I still need to grade the essay as a human-written essay, since the student doesn’t admit it, and currently there is no way to give conclusive proof.

The only way to avoid this, it seems to me, is to have in-class assignments. There are downsides to that as well, but I’m not sure what else can be done.

V. Alan White
Reply to  worried
1 year ago

During my career I always gave a short in-class writing assignment to begin every class that involved papers. I returned this writing to my students only at the end of the semester (after allowing them to see my comments and then having it handed back in, in part so that they could see at the end of the semester whether their thoughts on the topic had changed). This afforded me a sample of writing that I could later check papers against, to see whether their quality was more-or-less appropriate to their later formal work. It did help in spotting obvious plagiarism.

Graham Clay
Reply to  worried
1 year ago

I am sorry to hear about your student plagiarizing her outline and essay. I have had several similar issues this semester with my students, too. It is very demoralizing.

The second situation is going to become less common, since GPT and other LLMs are accepting larger and larger prompts, so students can simply upload the essay prompt, the paper containing the novel/lesser-known objection, etc. This is an instance of a broader problem: with fewer limits on prompt size and modality, students can use AI tools to complete more complicated assignments, including iterative assignments and ones that rely on novel course materials.

However, there are a variety of ways to make even take-home written assignments AI-immune. The most general advice I have is for professors to experiment with their own assignments and the publicly available AI tools in order to learn how to make the assignments AI-immune. I discuss this strategy and its justifications at greater length here: https://automated.beehiiv.com/p/conceptualizing-solutions-ai-plagiarism-problem. More specific advice/strategies will come out of the challenge Justin highlights. We have already received many assignment submissions to the challenge that are very creative, and I think we will learn a lot from trying to crack them. I will be writing about the results in the coming weeks and months in AutomatED.

Eric Steinhart
1 year ago

I’m not entirely convinced that there’s a problem. Why not encourage students to use GPT as much as they want, to craft the very best and most interesting writing they can? Of course, they would have to explicitly say that they used GPT. GPT is a tool, just like writing. Why (to paraphrase Tyler Cowen) would we object to more intelligence?

cecil burrow
Reply to  Eric Steinhart
1 year ago

OK, so someone writes an essay and tells you they used GPT. You have no idea whether they did 99% of the work and ChatGPT the remaining 1%, or the other way around. The essay is pretty mediocre and misses many obvious points, but it is not awful. What grade does it get?

Eric Steinhart
Reply to  cecil burrow
1 year ago

I don’t know how these things will work out, except that I know that we (professors) aren’t going to stop students from using these technologies. If you want to enter that race, prepare to lose.

Graham Clay
Reply to  Eric Steinhart
1 year ago

Well, it’s not clear that we cannot design assignments that are AI-immune, hence the challenge in the OP. I think there will be quite a few categories/kinds of assignments that are AI-immune, despite being take-home writing assignments. We will have more evidence about whether I am right in the next few weeks, and we will continue to update our readers in the coming months as the AI tools develop. Likewise, I think there are ways of pairing assignments that are not AI-immune with other assignments that are AI-immune that encourage students to complete the former without the help of AI tools (as I discuss here: https://automated.beehiiv.com/p/conceptualizing-solutions-ai-plagiarism-problem).

As for your general point about encouraging students to use GPT as much as they want, I have written a brief rejoinder to this sort of view here, if you’re interested: https://automated.beehiiv.com/p/preventing-ai-plagiarism-leads-skilled-ai-users.

Dustin Locke
Reply to  Graham Clay
1 year ago

Eric, why prepare to lose? Supervised writing is an obvious (but admittedly resource-intensive) way to win.

Eric Steinhart
Reply to  Dustin Locke
1 year ago

Sure, there are lots of ways – just do oral exams, police their writing. But why? Why would we want to do that? What ends does it serve to remain wedded to the past?

Cecil Burrow
Reply to  Eric Steinhart
1 year ago

To distinguish those actually doing the work from those not actually doing the work?

Eric Steinhart
Reply to  Graham Clay
1 year ago

Yes, I agree that students will need some “traditional” skills to drive AIs; but I think they will get them by driving the AIs, much like chess players learn from AIs today.  

And yes, there are ways to AI-proof assignments: just give oral exams, return to Socratic purity, stop writing.  Or become a tyrant in the classroom, and do draconian policing of their writing.  Why would anybody want to teach like that?  What values does that serve?  

But why would philosophers want to prevent students from using AI to do philosophy?  I fear we are wedded to an old and regressive way of teaching and doing philosophy, at a time when philosophy departments are closing left and right.  Other fields (e.g. math, physics, chemistry, biology, etc.) are making every effort to integrate AI. They have problems they want to solve.  Either we embrace this new future, or we die in it very soon.

I think of the philosopher of the future as a centaur (a human-AI team).  Philosophers using text-analysis AI to study old texts, philosophers coding their theories in Isabelle and Lean.  Philosophers fine-tuning their theories into AIs instead of writing books or articles. The opportunities are endless.

Cecil Burrow
Reply to  Eric Steinhart
1 year ago

> But why would philosophers want to prevent students from using AI to do philosophy?

Because getting ChatGPT to write your paper isn’t ‘doing philosophy’.

Eric Steinhart
Reply to  Cecil Burrow
1 year ago

If you’re trying to tell me that LLMs are going to disrupt your old-fashioned way of “doing philosophy”, then I agree: you’re right. So you better figure out a new way. You’re not going to stop this technology.

Justin Kalef
Reply to  Eric Steinhart
1 year ago

It would certainly disrupt our teaching if, say, students could just earn grades by bypassing all the thinking and reading and writing we’ve all learned, and instead copy and paste our prompts into an app and then copy and paste the answer back to us.

If something like that is what will count as a new way of ‘doing philosophy’ merely because people find it ‘draconian’ and ‘old fashioned’ to ensure that students are not cheating in that way, then I find it hard to see why anyone should bother to ‘do philosophy’ in that new way.

It doesn’t seem to teach anything important any longer, on that model. Those who might want to hire someone with the relevant skills would save money by simply pressing the button themselves. This sounds like a wave-the-white-flag option for the profession.

Cecil Burrow
Reply to  Eric Steinhart
1 year ago

No-one is trying to stop the technology; all we are trying to stop is students with no grasp of the material passing the class by generating passable papers in a few seconds with a few mouse clicks. This is easily stopped by going back to traditional in-class exams, which is what I imagine many will end up doing.

Eric Steinhart
Reply to  Cecil Burrow
1 year ago

Old way of doing philosophy: (1) Student reads text. (2) Professor lectures on text. (3) Student writes paper on text. (4) Professor evaluates whether student understood text.

Just stop doing this. Figure out some other way to teach and do philosophy.

David
Reply to  Eric Steinhart
1 year ago

>Yes, I agree that students will need some “traditional” skills to drive AIs; but I think they will get them by driving the AIs, much like chess players learn from AIs today. 

Chess players do not learn to play chess well by ‘driving’ AIs that do the bulk of the playing for them. People learn to play chess well by getting *feedback* from chess AI on their games and by studying AI suggestions in advance of playing. To my knowledge, nothing like students leaning heavily on ChatGPT to do their writing for them is practiced in chess as a way to get better at chess. The closest equivalent might be something like AI-human pair tournaments, where humans play against each other with the aid of AI evaluation (I believe there was one such tournament in Go a year or so after the advent of AlphaGo).

And this is all with AI that has been trained with clear objective conditions of success (win/loss rates) rather than whatever ChatGPT is trained off of (humanlike utterances? what users click the thumbs up on?).

Eric Steinhart
Reply to  David
1 year ago

So why can’t students use GPT in exactly the ways you said chess players use GPT? They can.

David
Reply to  Eric Steinhart
1 year ago

Sure they can*, but that misses the problem we are trying to grapple with: there is no easy way, short of policing or designing AI-immune assignments, to prevent students from using GPT in *other* ways. That’s the problem.

I could tell students “Please only use ChatGPT to give you feedback on your already completed essays which you yourself wrote,” but if it’s a take-home assignment there is no way with a standard essay to ensure that the student is actually doing this, rather than feeding ChatGPT the prompt to get an essay, then feeding ChatGPT its own essay so it can give feedback to itself.

* It’s not equivalently valuable, though. Chess AI is objectively, measurably better than any human at evaluating the game. There is no evidence that ChatGPT is this way, and it has not been trained to track truth the way chess AI has been trained to track winning.

Eric Steinhart
Reply to  David
1 year ago

I agree – that’s why we need to figure out new ways. The old ways of teaching and assessing philosophy won’t survive these new technologies.

Josh
1 year ago

Does anyone else feel like it’s actually pretty obvious when students are using ChatGPT? My colleagues and I have had a few known cases (confirmed by different online detection mechanisms without exception), and each time it was rather easy to see what was going on. I suppose it’s possible that they’re getting it past us, but as far as I can tell it’s not exactly a secret (the bizarrely formulaic style, the not-quite-on-topic-but-almost ideas, remarkably good grammar, etc.). Maybe this is something that could get trickier as students get more prompt-savvy, but I’d be curious if others have a similar take.

worried
Reply to  Josh
1 year ago

I agree, it’s pretty easy to detect (at least right now), and your description is spot on. One of the problems, though, is what do you do? If the student denies using ChatGPT it seems there’s not much you can do, since the university (at least mine) wouldn’t accept online detection tools or the instructor’s ability to tell as admissible evidence of cheating.

Derick Hughes
Reply to  worried
1 year ago

It’s unfortunate that your university doesn’t accept online detection tools.

Maybe that will change soon. I learned yesterday that Turnitin has AI writing detection now. As someone who is finding a larger than normal amount of ChatGPT responses in an asynchronous online course, I think we should be willing to push administrators to revise their policies. At least, I’ll soon be making that push.

And of course, I have to take a hard look at myself and the assignments I’ve chosen to make. I have to face the terrible irony of this happening in my Ethics and Information Technology class…

Brendan O
Reply to  Josh
1 year ago

I agree. The easiest thing to spot, for me anyway, has been that the writing is too good. Complex ideas deftly expressed. Clean grammar.

But if the LLMs have, or could have, a setting that ‘dumbs down’ the writing or content, producing a B or C paper, then it would be harder to spot.

Graham Clay
Reply to  Brendan O
1 year ago

As noted above, the “dumbs down” setting is easy to access: students just need to instruct the AI to write in a way that contains errors or is typical of a certain reading level. If you look at YouTube and TikTok tutorials on how to plagiarize with AI tools, they explain how to do this effectively.

Graham Clay
Reply to  Graham Clay
1 year ago

Or I guess I should have said “as noted below.”

Derick Hughes
Reply to  Josh
1 year ago

I agree. It has me a little more worried about the future. I had a student use ChatGPT without my knowing for four or five consecutive assignments. So it had me fooled for a little while. That was OK in the end, though, because I felt I had collected more evidence with which to make a fair judgment.

Josh
Reply to  Derick Hughes
1 year ago

Interesting, okay. Can I ask what sort of assignment this was?

Graham Clay
Reply to  Josh
1 year ago

In short, I think it’s very risky to believe of oneself that one can easily detect the use of AI tools. There are many reasons why I encourage caution here. One is that online detection tools are completely unreliable, especially given that students can combine generative AI tools with paraphraser AI tools like Quillbot, instruct their generative AI tools to make errors or to write at a lower reading level, and intersperse their own modifications in the outputs of generative AI tools. Another reason is one you mention: we should expect students to get a lot more prompt-savvy very quickly over the next few months, just as all of us have improved radically in our ability to use technologies like search engines.

Josh
Reply to  Graham Clay
1 year ago

This definitely seems plausible to me.

Dustin Locke
Reply to  Josh
1 year ago

Imagine a doctor with no independent way of checking whether they are missing any cases of disease X saying: “I feel like cases of disease X are easy to spot. I suppose it’s possible that I’ve missed some, but in all the cases I’ve spotted it was really obvious.”

Josh
Reply to  Dustin Locke
1 year ago

Yes, obviously I agree that would be bad. Was not remotely suggesting that we should not use detection mechanisms, or something. Was merely observing that (surprisingly, I think) the “eye test” has worked quite well for us thus far (according to detection mechanisms we’ve been using, at least), and so I was wondering if others have had the same experience.

H. O
1 year ago

Isn’t the first and most obvious answer oral exams? Conducting some sort of talk-examination might sometimes even be quicker than working through a student essay. Or, if it’s imperative for essays to be written, they can be discussed or defended to prove authorship and/or comprehension.

Something I’m contemplating in my teaching is bringing back the disputation. This was a structured type of debate based on a written work, often a set of theses (this is the genre of Luther’s 95 theses, btw). But in the academic setting, it was often the student’s instructor who wrote the text, since (the student would usually pay the instructor handsomely and) comprehension was thought better proved by an oral presentation and debate.

The disputation is basically the great grandfather of our contemporary thesis defence, so it shouldn’t be too difficult for anyone to develop a variant of this for the classroom.

Brendan O
Reply to  H. O
1 year ago

Oral presentations would do it. But are they scalable? Time-wise, maybe it’s a wash with grading. But scheduling all the 20-30 minute presentations would be, err, difficult.

A variant. The prompt includes a requirement that you may be called to explain some portion of your essay to the class. You pick 4 or 5 essays and a paragraph in each. Give them 3-5 minutes to explain it.

Upside. Models good texts, or bad texts. Allows for discussion of the content or manner of presentation. May give students pause about flat-out using ChatGPT. Provides practice with presenting ideas. Teacher could zero in on likely ChatGPT’d essays.

Downside. Requires the use of class time. Some students may be petrified of public presentations. Not every student would do it, so maybe fairness concerns. Other downsides I’m not thinking of.

I do believe ChatGPT will force us to consider what we want our students to know/be able to do. And rethink how we assess whether they do know it or are able to do it.

John Keller
Reply to  Brendan O
1 year ago

I gave an account of a *relatively* scalable way to do oral exams here: https://dailynous.com/2022/12/07/oral-exams-in-undergrad-courses/#comment-437164

Graham Clay
Reply to  H. O
1 year ago

I do think oral exams will become much more central, even though there are accessibility and other issues. Still, there are many pedagogically appropriate assignments that are not oral exams and that may achieve results that oral exams cannot, so it would be good to know what it would take to make them AI-immune. Likewise, if you teach online and asynchronously, as many of us do (including me), you need other options.

Justin Kalef
1 year ago

This is an excellent project, and a fine way to let these professors see for themselves that their methods are probably completely ineffective at stopping the tidal wave of academic dishonesty that threatens the entire educational system.

Nothing I have seen so far tells me that there is a good, scalable way to avoid this problem for any work students take home with them.

Fortunately, take-home assignments are not necessary for the assessment of learning, and they are a relative latecomer to the scene. There are all sorts of good examinations and in-class activities that can be made AI-proof, provided that one takes care in designing them and (perhaps most important) gets serious about security during exams. It must be ensured that students not get access to their electronics at any point during the exam or other in-class assessment, and that there are no other shortcuts to success on the exam, quiz, or activity. That way, rather than engaging in a dubious arms race with AI or allowing college to have the stuffing torn out of it by forcing students to choose between regular academic dishonesty or falling behind their cheating peers, we can avoid the issue entirely and retain the integrity of our grading.

Graham Clay
Reply to  Justin Kalef
1 year ago

Thanks for the encouragement. One of our secondary goals with this challenge is to show professors the seriousness of the “tidal wave” you mention, as we have encountered a lot of denialism and ignorance about AI in recent months.

While I disagree on the viability of the “arms race” you mention (and think some take-home assignments are AI-immune), I will let the results of the challenge and our related content speak for themselves. But I will say here that your proposed solution isn’t an option for those of us who teach online and asynchronously. That segment of the higher ed market is large and growing — and for good reason — so we need solutions for its sake, even if we grant your view in other domains.

Justin Kalef
Reply to  Graham Clay
1 year ago

Good point. I have long suspected that the best solution for students in online asynchronous courses is a general agreement among universities to support one another’s students through on-campus testing centers. On that model, a student in a faraway country could take the course asynchronously and then make plans to attend the nearest university for the final exam. I would send my exam to that university’s testing center with instructions, and the university would ensure that tight levels of security are enforced, so that I can trust the result of the exam my student there takes when it’s sent back to me.

These exam centers could be open 24 hours per day, to allow for people who work graveyard shifts, etc. They could pay for themselves quite well if enough people start taking online asynchronous courses. It would be worth a small surcharge to have the convenience of taking courses that way. Perhaps the cost would be borne by the universities themselves, for the benefit of attracting such students: perhaps most of the people who use the facilities would take the courses at that college, anyway.

While this wouldn’t be ideal for oral exams, written exams seem to be a secure, AI-proof alternative so long as one can be sure that the student didn’t have access to electronics at any point between getting the question and submitting the answer.

Simon-Pierre
1 year ago

I ask the students to come for five minutes in my office and explain their paper to me and respond to my objections. If they’ve not written it, that will be trickier (though not impossible of course). And it will help me to have good evidence that they cheated if they did. Otherwise, I do oral exams (they show up thirty minutes before, choose two questions out of four, prepare them, and deliver their answers during a 20 min session). Oral exams are pretty good for teachers like me who tend to procrastinate grading!

Graham Clay
Reply to  Simon-Pierre
1 year ago

The sort of strategy you describe at the outset is one I recommend. In general, I think professors should consider the method by which they would determine whether a student used AI to plagiarize an assignment, and then simply incorporate this method into their assignment design by pairing it with the assignment that is susceptible to AI plagiarism. Rather than respond to suspected plagiarism — which is fraught with all sorts of difficulties — we should structure our assessments from the start in ways that incentivize honest work. I discuss this further here: https://automated.beehiiv.com/p/conceptualizing-solutions-ai-plagiarism-problem.