Students Submitting AI-Authored Papers: Poll & Discussion


Did your students use ChatGPT or other LLM tools to cheat on their assignments? If you taught a philosophy course this term, please respond to the one-question poll below.

How many of your students did you notice this term using ChatGPT or other LLMs or AI tools to illicitly write papers or complete assignments?

It would be useful to hear from professors about their experiences with this. Here are some further questions for discussion:

  • If you did not detect any of your students cheating using ChatGPT or the like, was that because of something you said to or did with them, because you intentionally crafted your assignments to resist their use, because you just got lucky, because you didn't bother trying to detect their use, or something else?
  • If you did try to craft ChatGPT-resistant assignments, what were they?
  • Did you use AI-detection software to find instances of cheating, and if so, which did you use, and how well did it work, in your opinion?
  • Did you spot the AI-written work without the assistance of detection software, and if so, what did you look for?
  • Did you report students who cheated using this technology to official academic misconduct offices at your university or college, and if so, how receptive to (and prepared for) this kind of cheating were they?
  • As more and more students learn of these tools and how to use them, do you think your methods for handling the issue this term will work next term?
  • What would you do differently?

Thanks for sharing your thoughts on this.


41 Comments
Dr EM
11 months ago

I used the software and looked closely at the essays it flagged. In some, I couldn’t see anything obvious (so they could have been AI or just very well written), but two of the essays had the typical AI false citations. One of them said that Kymlicka thought multiculturalism was dangerous, and another cited Okin as a defender of female genital mutilation. Other citations were also fake, but not as obviously so.

Another tell is very short paragraphs, and paragraphs that are entirely quotes. It’s not enough on its own, but it lends weight in conjunction with other evidence.

C Bressler
Reply to  Dr EM
11 months ago

May I ask which software you are referring to?

Eric Steinhart
11 months ago

A few students illicitly used ChatGPT in my Fall 2022 classes.

During Spring 2023, I gave several assignments (in all classes) requiring them to do prompt engineering with ChatGPT (or a similar LLM) and to assess the results. (These assignments were structured in various ways.)

No students in my Spring 2023 classes tried to cheat with ChatGPT.

Jack
Reply to  Eric Steinhart
11 months ago

That sounds like an interesting assessment model. Just wondering, though: how do you know that no students in your Spring 2023 classes tried to cheat with ChatGPT / another LLM?

Eric Steinhart
Reply to  Jack
11 months ago

You learn to recognize the difference between student and LLM writing pretty quickly. (Sure, yes, maybe the students were very clever and tricked me. But that takes effort. None of their work looked like LLM text.)

Graham Clay
Reply to  Eric Steinhart
11 months ago

The issue is that students are becoming more and more aware of the various AI tools that can paraphrase the outputs of the well-known LLMs. These paraphrasers, like Quillbot, take no effort to use, of course, and can easily change the writing style. The LLMs themselves can be directed to change their writing styles, too. There are many viral TikToks and YouTubes about this, and I assume word is spreading quickly among the students (several of mine told me that their friends were regularly talking about ChatGPT, how some of their acquaintances got caught, etc).

Eric Steinhart
Reply to  Graham Clay
11 months ago

One of the things I plan to do in the fall in an in-person Phil of Religion class is to work with my students to use LLMs to do creative philosophical work.

A fairly shallow example would be to ask the LLMs to generate new arguments for or against various religious propositions (the existence of God, etc.). Or to generate new arguments, say, against Christianity or against atheism. We do this together in class, and we learn together how to drive these machines.

A more interesting task would be to work with the students on getting LLMs to generate new religions, and then to ask whether they would be attractive for humans. We might then get the AIs to generate arguments for (or against) them. We might even use image generators to produce illustrated religious texts. We can then work together to ask all sorts of interesting questions (are the AI-generated religions really novel, are they religions, etc.).

We can also ask the LLMs to explain why humans are religious, to ask some interesting questions about whether the LLMs are religious (in any sense), etc. To really explore the philosophical views of these alien neural nets.

The options here really are endless.

Over the summer, I hope to come up with some fairly concrete and specific plans to implement in the fall. But the general plan is to make very, very heavy use of generative AIs.

Graham Clay
Reply to  Eric Steinhart
11 months ago

What were students required to do when they assessed the results? Were their assessments of the results themselves written and turned in?

Eric Steinhart
Reply to  Graham Clay
11 months ago

They would include (1) the GPT-produced text and (2) their assessment of the GPT text. So, yes, they had to write their assessments and explain them.

For one assignment, they were writing on a controversial topic, and I asked them whether they agreed with GPT’s treatment of the topic. Typically, they were highly critical of GPT (GPT is often so neutral that it will offend both sides). Also, I asked them to address questions of accuracy, etc.

Other assignments required them to do prompt engineering and to give both prompts that didn’t work and prompts that they thought did work.

An interesting assignment which I didn’t do (but my great colleague Pete Mandik has done) is to get them to use an AI image generator to make an image of a philosophical text or concept. Brilliant idea, which I hope to do too!

Graham Clay
Reply to  Eric Steinhart
11 months ago

Interesting. My view is that the first assignment you described can be easily completed with the free and publicly available LLMs: the students just need to feed (1) back in as a prompt, ask the LLM to assess it in the relevant way, etc. Our testing for the AutomatED AI-immunity assignment challenge is showing that iterated assignments like this are generally not AI-immune at all. Would you be willing to submit your assignment to the challenge?

As for the second assignment, where they need to give prompts that work and prompts that don’t work, this might be AI-immune if the prompts were not themselves represented in text (i.e., if screenshots were required). But soon multimodal AI tools will affect this sort of solution, too.

The assignment from Pete Mandik sounds really cool. The image generators are a lot of fun to play with, and it is always surprising what they produce.

Eric Steinhart
Reply to  Graham Clay
11 months ago

Technically, that first assignment is not AI-immune at all; I’m not trying to make it immune. Practically, the students like criticizing the AI! Look at me, I’m smarter, I’m a human!

The second assignment is also not AI-immune, and, again, that’s not my goal. The students likewise have a practical incentive: they want to learn how to do prompt engineering, and they’re aware that there are jobs for prompt engineers.

I’m not interested in building a prison around my students. They want to use these tools, I want to help them use them.

Graham Clay
Reply to  Eric Steinhart
11 months ago

I was led to interpret you as making claims about AI-immunity for several reasons. First, the context of this discussion (provided by Justin) is focused on that aspect. Second, you claimed that “No students in my Spring 2023 classes tried to cheat with ChatGPT”, contrasting this semester with last semester, but this would indicate that you have (good) reason to think that these assignments are AI-immune. Third, you said that “GPT is often so neutral that it will offend both sides”, which I took to mean that you thought that GPT wouldn’t be capable of assessing GPT’s outputs without being neutral (so if your students’ submissions were not neutral, that was reason to think they were not GPT-generated). Etc.

Regarding your other pedagogical points: yes, we should help students learn how to use AI tools relevant to our field; and yes, we shouldn’t build a “prison” around them, unless a “prison” makes sense pedagogically, which it often does.

Eric Steinhart
Reply to  Graham Clay
11 months ago

I didn’t read the OP as a call for how to make assignments AI immune, but I can see that it could be read that way. Perhaps I interpreted “AI immune” too narrowly, as building a barrier against using AI in any way at all. One way to make an assignment immune to AI abuse (i.e. violations of academic integrity) is to require proper AI use, which is what I want to do.

I think curiosity is an important philosophical virtue, and I really want my students to figure out ways to use these tools productively and ethically. I want to be working with, not against, my students on this.

Laura
Reply to  Eric Steinhart
11 months ago

Agree – that’s what I’ve been trying to do this term. Unclear thinking produces unclear writing, so one reason to teach writing is to teach the clarity of thought that must accompany it. AI tools can help students to clarify and refine their arguments, or find more graceful ways to express what they’d like to say. I hope they learn how to use the tool for this purpose, but it can’t be a substitute for doing their own philosophical thinking.

I don’t know yet whether they’re using AI in an academically dishonest way because I told them they could use AI in developing paper drafts, as long as they cite it carefully and show how they incorporated any AI-generated content. I’m hoping a process of working on drafts together in class, along with later aid from Turnitin, will be enough to differentiate between dishonest reliance and adoption of a helpful tool. We will find out soon if it works!

Eric Steinhart
Reply to  Laura
11 months ago

Working on drafts in class is a great idea, as is showing them how to do prompt engineering and how to get the LLM to produce novel or interesting outputs (instead of just bland Wikipedia summaries). It’s great that you’re moving philosophy forward!

T K
11 months ago

I had two students who, in my view, clearly used some chatbot program to complete an assignment. There are certain prompts to which ChatGPT seems to give either (i) exactly opposite answers or (ii) unrelated answers. For example, if I ask about Thomson’s People-Seed Case instead of her Violinist Case, ChatGPT will produce an answer about the Violinist Case no matter how much prompting I give it to do otherwise.

In general, I found it quite helpful to run all my essay question prompts through ChatGPT numerous times. It makes clear the standard issues that arise with the answers – some of which are fairly uncommon in student essays written without using ChatGPT.
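
For anyone who wants to script this kind of stress-testing, here is a minimal sketch. It assumes the official `openai` Python package (v1+) with an OPENAI_API_KEY in the environment; the sample prompts, model name, and output file are placeholders, not a description of T K's actual setup.

```python
# Minimal sketch: run each essay prompt through the chat API several times and
# save the outputs, so you know what "standard" ChatGPT answers look like before
# grading. All names below (prompts, model, output path) are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

essay_prompts = [
    "Explain Thomson's People-Seed Case and assess whether it supports her conclusion.",
    "Reconstruct Thomson's Violinist argument and raise one objection to it.",
]

RUNS_PER_PROMPT = 5  # outputs vary run to run, so sample each prompt several times

with open("chatgpt_baseline_answers.txt", "w") as f:
    for prompt in essay_prompts:
        for run in range(1, RUNS_PER_PROMPT + 1):
            resp = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            f.write(f"--- {prompt!r} (run {run}) ---\n")
            f.write(resp.choices[0].message.content + "\n\n")
```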

Graham Clay
Reply to  T K
11 months ago

One way to get ChatGPT to respond about specific content is to prompt it with quotes of that content. So, if you embed a quote of the People-Seed Case into a prompt and direct it to address some aspect of that quote, it will do it.
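
To illustrate (the prompt wording here is mine, and the Thomson passage is abridged), a quote-embedding prompt might be built like this:

```python
# Sketch of the quote-embedding trick: put the passage itself in the prompt so the
# model addresses the People-Seed Case rather than defaulting to the Violinist Case.
# The quoted passage is abridged and the prompt wording is illustrative.
passage = (
    "Again, suppose it were like this: people-seeds drift about in the air like "
    "pollen, and if you open your windows, one may drift in and take root in "
    "your carpets or upholstery..."  # Thomson, "A Defense of Abortion" (abridged)
)

prompt = (
    "The following passage presents Thomson's People-Seed Case:\n\n"
    f'"{passage}"\n\n'
    "Explain what the case quoted above is meant to show about responsibility "
    "and the right to life. Do not discuss the Violinist Case."
)

print(prompt)  # paste into ChatGPT, or send it via the API as in the sketch above
```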

T K
Reply to  Graham Clay
11 months ago

Thanks. This is helpful to know.

Despairing
11 months ago

I taught an online community college ethics class that was an absolute disaster, and I strongly suspect the underlying cause is widespread, heavy reliance on ChatGPT among my students.

My assignments are not (yet) open to ChatGPT cheating. The course focuses on developing skills in working with arguments from principle and arguments from analogy. The early phase of the course focuses on analyzing the structure of these arguments by representing them in standard form. ChatGPT doesn’t yet know what standard form is.

The assignments are not difficult, and I’ve done tons of testing of similar assignments in online classes and in-person classes. Historically, my English-language learners can do them, my frazzled single-parent students can do them, etc.

The online students this semester simply did not do them. They repeatedly turned in ChatGPT summaries of the arguments I asked them to represent in standard form. In many cases, I couldn’t get them to do anything else, through the end of the semester. They just kept turning in AI paraphrases they would try to tweak in various ways.

Out of 35 students enrolled, it looks like 11 will pass the class. (And 7 of those 11 will pass with a C.) It’s like nothing I’ve ever seen. The insistence on using paraphrasing software is new and, as near as I can tell, explains 100% of the sharp break this semester from the patterns of previous semesters.

grymes
Reply to  Despairing
11 months ago

Yeah, this is precisely the issue I ran into with “LLM-proof” assignments: they just used ChatGPT anyway, and turned in non-sequiturs. Guess it’s back to blue books for ol’ grymes!

Commiserating
Reply to  Despairing
11 months ago

This is important. And I hope that instructors at 4-year colleges and universities will take notice that their students are likely better at cheating. This fact, combined with heavy reliance on traditional essay formats in many classes, will just make it easier for instructors at those institutions to overlook all the cheating that is happening. This is probably a good reason for us all to turn more heavily towards skills-focused, practice-based instruction.

Graham Clay
11 months ago

I taught 45 students this semester in my online asynchronous course, and I answered ‘4-6’ to the poll above. I caught a few with obviously fake citations (e.g., the name cited was ‘Lorem Ipsum’); I caught a few who used AI paraphrasers like Quillbot to paraphrase the course content without any sensitivity to the prompt of the assignment; and I caught a few who I was able to get to admit to using ChatGPT. (AI-detection software is very unreliable, especially given the existence of many ways to get around it.) I reported these students to the official academic misconduct offices, even in the cases I couldn’t prove, and they were treated as plagiarizers by those offices (the students can appeal, of course).

Next semester, if I teach this course again, I will give over some of the time I previously used for grading to oral assessments; I will rely more heavily on my assessments embedded in Articulate Storyline 360; and I will focus on aspects of my assignments that cannot be done well with AI (logic representation, graphical presentation of arguments, etc.).

I will also incorporate what I am discovering as I run our AI-immunity assignment contest at AutomatED (https://automated.beehiiv.com/), a newsletter/blog on tech in higher ed that I recently started.

In fact, in the next few days, I will be posting the first piece on an assignment a professor submitted to our contest. In it, I will explain why the assignment was expected to be AI-immune, how we cracked it or why we failed to, and what the takeaways are for professors. Those who are interested should subscribe! And if you have an assignment (take-home and written) you want us to try to crack, submit it to the contest (https://automated.beehiiv.com/p/believe-assignment-aiimmune-lets-put-test).

Eric Steinhart
Reply to  Graham Clay
11 months ago

Your example of what LLMs do poorly is also a great opportunity: ask students to get an LLM to formalize a natural-language argument (e.g., they input the text of Proslogion 2 as the prompt), and ask them to criticize the output (e.g., is it a good formalization? Where does it fail?). The GPTs are pretty wild formalizers, although I’m told Claude+ does a very good job at formalizing natural-language arguments.
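
A minimal sketch of how such an exercise might be set up, again assuming the `openai` v1 client; the Proslogion excerpt is abridged, and the prompt wording and model name are illustrative assumptions, not a tested assignment:

```python
# Sketch: hand an LLM a natural-language argument and collect a formalization
# for students to criticize. Excerpt abridged; prompt and model are placeholders.
from openai import OpenAI

client = OpenAI()

def formalize(argument_text: str) -> str:
    """Ask the model for a first-order formalization of a natural-language argument."""
    prompt = (
        "Formalize the following argument in first-order logic. List each premise, "
        "the conclusion, and the inference rule used at each step:\n\n" + argument_text
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

proslogion_2 = (
    "...that than which a greater cannot be thought cannot exist in the "
    "understanding alone. For if it exists in the understanding alone, it can "
    "be thought to exist in reality as well, which is greater..."
)

print(formalize(proslogion_2))
# Students then assess the output: Is it a faithful formalization? Where does it fail?
```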

I’ll be doing this in Phil of Religion next semester.

Graham Clay
Reply to  Eric Steinhart
11 months ago

Yes, I agree. Currently, many of the LLMs are poor at tasks that are more technical, formal, logical, etc., though I expect this to change rapidly (GPT-4 is a lot better than GPT-3.5 at various inferential tasks). I haven’t experimented as much with Claude but hope to do so in the coming months. Thanks for reminding me about it.

Kenny Easwaran
Reply to  Graham Clay
11 months ago

A citation with the author “Lorem Ipsum” doesn’t sound like an LLM output; it sounds like the student had a template for “how to write a bibliography” and forgot to swap out the default names from the template!

Graham Clay
Reply to  Kenny Easwaran
11 months ago

Interestingly enough, this student later admitted to using ChatGPT for this paper and bibliography, so I guess it is a possible LLM output. I have not seen this LLM behavior myself, though, so it would be interesting to see what the student’s prompts were. Potentially the student directed it to a stock template as part of their prompt? Not sure.

Sam Elgin
11 months ago

I teach two relatively large, writing-intensive lecture classes (100-200 students). At the beginning of the term, I told students that they were not allowed to use ChatGPT to write their papers, and I demonstrated that detectors could catch generated text.

After assignments were submitted, I asked my TAs to select random paragraphs from students’ papers and enter them into ChatGPT detectors. Across both classes, maybe a dozen students’ papers were flagged. I’m not 100% confident in the detectors (though they seem pretty reliable), so I asked these students to come to my office to discuss their papers. In these discussions, some of the students admitted to using ChatGPT. I reported these students to the academic integrity office. Everyone reported was found to have been academically dishonest, and one student was suspended.
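
The sampling step is easy to script; here is a minimal sketch, assuming plain-text submissions sit in a `submissions/` folder (the folder name and the length cutoff are illustrative assumptions, not my actual setup):

```python
# Sketch: pull one random paragraph from each submitted paper so a TA can paste
# it into a detector by hand. File layout and the length cutoff are assumptions.
import random
from pathlib import Path

def sample_paragraph(paper_text: str, min_chars: int = 200) -> str:
    """Pick a random paragraph long enough to give a detector something to work with."""
    paragraphs = [p.strip() for p in paper_text.split("\n\n") if len(p.strip()) >= min_chars]
    return random.choice(paragraphs) if paragraphs else paper_text.strip()

for paper in sorted(Path("submissions").glob("*.txt")):
    print(f"=== {paper.name} ===")
    print(sample_paragraph(paper.read_text()))
    print()
```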

I’m not sure I caught all – or even most – dishonesty.

Billy
11 months ago

Many essays from students I’ve been reading this week have the feel of AI-written text (you know, that lifeless feel). But they are passing the detection tools I’ve tried. I’m wondering if my students are using https://undetectable.ai/?via=36ovg — it looks like it takes AI text and rewrites it so that it passes through the detectors.

Steve
11 months ago

I had 10 students, out of a total of 60, use it to do an assignment due last week. I was very upset at that distribution. The assignment was an argument reconstruction, which was designed to be ChatGPT-resistant. I spotted them easily because I ask students to (1) use quotes and (2) refer to specific examples used in the target article, neither of which GPT-3 is currently good at. I plugged the essays into detection software, which confirmed it, then confirmed it again by asking all of the students how they came up with their answers. The chair of my department was supportive of viewing and treating it as cheating.

Next semester, I’m going to do what some have suggested and incorporate LLMs into my assignments: eg, asking students to generate a response to some prompt, then asking them to evaluate chat GPT’s response.

Anna
Reply to  Steve
11 months ago

The same students will just ask GPT to evaluate its response in turn.

Daniel Weltman
11 months ago

Among the various irritating features of ChatGPT’s existence, perhaps my least favorite is its interaction with Perusall.

If you haven’t heard of Perusall, it’s a collaborative annotation software. Students can highlight parts of the course readings and write questions and comments, respond to each other, and so on.

I adore Perusall for many reasons, but I’m worried ChatGPT is ruining it. Students just use ChatGPT to generate their comments for Perusall. The comments are of course worthless, so they just pollute the discourse for the students who are actually doing the work; and the comments are short enough, and their overall role in the course marginal enough, that I’m not sure I want to go through the effort of accusing students of academic dishonesty over their Perusall comments.

(It is also a lot of effort just to make the accusation, because you have to export the comments from Perusall, flag each of the ones you take to be AI written, etc.)

Most ChatGPT use only harms the person using it, and maybe tangentially the professor who wastes time in various ways responding to ChatGPT and very tangentially the other students who themselves get less pedagogical energy devoted to them because some of it has been sucked away to deal with ChatGPT. But when it comes to Perusall, ChatGPT use harms all the students.

I’m rather said about this and I worry one of my favorite teaching tools might be much less useful from this point forward, if not irrevocably ruined.

Graham Clay
Reply to  Daniel Weltman
11 months ago

I have run into the same issue with hypothes.is, which is another social annotation tool I use in the same way you use Perusall. The only difference is that I have students who I believe used ChatGPT for their annotations but must have quoted the text of the reading for their prompt to ChatGPT. These annotations are of decent quality, actually, at least relative to my students’ standard contributions.

Sam
11 months ago

Did many or all of those who submitted a “0” vote (i.e., none of their students illicitly used LLMs or other AI tools) permit students to use such things to write assignments? Unfortunately, the survey doesn’t allow us to differentiate.

Ian
Reply to  Sam
11 months ago

The question asks about “illicit” use.

Sam
Reply to  Ian
11 months ago

Right, so if you permitted your students to use the tools, you could submit “0.” But if you didn’t permit them, and none used the tool, you could also submit “0.”

Noteven_French
11 months ago

Some of my (French) students obviously used GPT; I guess you all know the cues, but here are mine:

  • Lifeless style (no commitment to the material, just dry descriptions)
  • Perfectly balanced sentences (no six-line-long sentences or complex constructions)
  • No misspellings or grammar mistakes (which is especially strange)
  • No quotes, or no precise quotes (with an exact page reference)
  • No reference to any element of the course
  • Direct contradictions between parts (as if the writer did not remember what she just wrote)
  • And every part of the suspect passage is detected as “likely written by an AI”

State School Prof
11 months ago

I suspect at least five students out of 80 used ChatGPT or another program. I felt the evidence was strong enough to refer three for academic misconduct.

Turnitin’s built-in AI-detection software was pretty good at catching these cases.

I am going to radically change things the next time I teach this course: in-class exams are back in fashion!

Polaris Koi
11 months ago

Illicit ChatGPT use is not possible in my classes, because I allow ChatGPT use. I explain my rationale for that here: http://justice-everywhere.org/general/student-use-of-chatgpt-in-higher-education-focus-on-fairness/

Renee
11 months ago

6 of 32 in an intro course, at least that I detected. I used Copyleaks and a couple of other detectors when I had suspicions. All of the students confessed when I expressed concerns about their assignments.

Nothing new to add to this discussion except this:

On a final reflection assignment, I asked students to describe 2-3 things they learned that will stick with them. One student reported that the problem of free will and moral responsibility and the reading by Fischer will stick with her. We did not do this problem or read Fischer in my class.

Laugh or cry?

Franz
11 months ago

I caught 3 out of 13 essays, so that’s a significant minority. 2 of them were really bad papers: as bad as the outputs I got when I tested my essay titles on ChatGPT (the testing, of course, helped me detect its use). Recurring features: lots of repetition, lack of direct quotes, lack of page numbers in citations. The third paper I caught was so-so, but the hand of ChatGPT was evident. Of course, the students didn’t report using ChatGPT, despite a rule that they should do so, because the whole essay, or too large a part of it, was simply written by ChatGPT.

Franz
11 months ago

Don’t know if anyone is still following this, but: what do you think of the writer you can find in Microsoft Edge Discovery? I mean the writing gadget on the right side of the Edge window, not Bing Chat. That seemed to me much better than ChatGPT.