Conversation-Starter: Teaching Philosophy in an Age of Large Language Models (guest post)


Over the past few years we have seen some startling progress from Large Language Models (LLMs) like GPT-3, and some of those paying attention to these developments, such as philosopher John Symons (University of Kansas), believe that they pose an imminent threat to teaching and learning (for those who missed its inclusion in the Heap of Links earlier this summer, you can read Professor Symons’ thoughts on this here). In the following guest post, Benjamin Mitchell-Yellin (Sam Houston State University) responds to Professor Symons, offering his own view of the dangers LLMs pose as well as strategies teachers could employ to minimize them.


[“An ancient Egyptian painting depicting an argument over whose turn it is to take out the trash” made with DALL-E 2 by LapineDeLaTerre]

Conversation-Starter: Teaching Philosophy in an Age of Large Language Models
by Benjamin Mitchell-Yellin

It’s once again the time of year when those of us who teach philosophy are thinking about how to structure and deliver our courses. If you’re anything like me, your courses are writing-intensive; your course objectives include helping students improve their ability to think critically, analyze and construct arguments, and express all of this in writing; and you’re hoping that the tweaks and revisions you’re making to your syllabus will improve your own ability to make good on the promise of what a philosophy course has to offer.

I was in the thick of planning my fall courses when I got a bit of a shock. In “Conversation-Stopper,” John Symons (Kansas) argues that large language models (LLMs), like the relatively new and much-discussed GPT-3, “will change the traditional relationship between writing and thinking.” As the subtitle puts it, they’ll make us “less intelligent.” What’s more, Symons claims, “The LLM marks the end for standard writing-intensive college courses.” He expects us to see the effects clearly as early as this October. (As if we needed another reason to dread midterms!)

My heart skipped a beat. Hasn’t Silicon Valley done enough damage already?!

Then I thought carefully about what Symons was arguing and felt a lot better.

Thinking through why Symons’ pronouncement about the death of writing-intensive courses is premature has been a healthy exercise. For one thing, it has made me consider the structure of my courses and assignments in a way I haven’t done since early in my teaching career. For another, it has helped me to see that the worries he raises, and potential solutions to them, apply to a host of issues. My main contention will be that, despite appearances, it’s not really about LLMs, after all.

But let’s begin with a look at Symons’ argument. The heart of his case appears to turn on the novelty of what we’re now facing:

Students have long been tempted by services that write essays for them and plagiarism is a constant and annoying feature of undergraduate teaching, but this is different. The LLM marks the end for standard writing-intensive college courses. The use of an LLM has the potential to disconnect students from the traditional process of writing and research in ways that will inevitably reshape their thinking. At the very least, these tools will require us to reconsider the mechanics of writing-intensive courses. How should we proceed? Should we concentrate on handwritten in-class assignments? Should we design more sophisticated writing projects? Multiple drafts? 

(I’ll have something to say about why it’s unfortunate the concluding questions are merely rhetorical. But first, I want to explain what I think it is that really drives Symons’ argument. Teaser: it’s fueled by the collapsing of an important distinction. Read on, my friends.)

To his credit, Symons doesn’t shy away from considering the implications of what he’s arguing. He goes on to consider whether we should embrace the disruption and “realign our attitudes to writing,” but demurs, in part, because of his experiences, like those of many of us, teaching during the pandemic. Technological tools, such as Zoom, became the norm because they allowed for socially distant instruction—to the detriment of both instruction and society, it would seem. Symons concludes that the arrival of LLMs means that “it’s necessary for faculty to change the way we evaluate student written work in our courses and more importantly, to rethink the role of writing in education. … In the age of the LLM we will not be able to rely on written exercises to make the work of thinking happen. We will also find that writing skills which previously served as reliable signs of the virtues we associate with thinking can no longer do so.”

Let the disruption reign!

I don’t mean to be flip. I think Symons has called our attention to something important. It’s just not what he thinks it is.

I want to make it clear that I agree with much of what Symons says. For example, I agree that “writing can be an aid to thinking.” I also agree with him that it’s likely some students will pass off AI-authored essays as their own work (some likely already have). And I identify with his comment that “the advent of LLMs puts me right back in the position of being a novice teacher.” What I disagree with is Symons’ pessimism about the prospects of writing-intensive courses, and this for reasons having to do with that distinction I promised you.

There’s a difference between teaching someone to think and write and assessing what they’ve written as a measure of the quality of their thinking. It’s not always obvious how the two come apart, and they’re not entirely alien from each other. For example, effective teaching often involves assessment. But some quick examples should be enough to show that simply grading completed essays is not a reliable measure of student learning.

Suppose you have two students, Jack and Jill, and they each turn in essays that receive the same grade. Assume, as well, that there’s no cheating or bias or anything like that. They write the essays to the best of their ability; you grade them fairly. Even though they’ve turned in work of the same quality, you shouldn’t take this to demonstrate you’ve taught them anything, let alone the same things.

Consider a first scenario in which both students turn in A papers. But Jack came into your class with little to no experience writing this sort of paper, nor did he have much in the way of background knowledge about philosophy or arguments. He earned his A by paying attention and putting in a lot of work for your course. Jill, by contrast, had taken lots of philosophy courses before, received lots of writing instruction outside of your course, and basically phoned it in on this assignment. It seems safe to say that you taught Jack a lot and Jill nothing.

In a second scenario, both Jack and Jill earn Ds on their papers. Again, Jack came into your class with little to no relevant background; Jill came in with lots of prior coursework in philosophy. It’s entirely possible in this scenario that, once again, you taught Jack a great deal and Jill nada.

What these scenarios demonstrate is that when assessment is used to take a snapshot of a student’s skills and knowledge, it doesn’t tell you whether there has been growth. To do that, you need more than time-slice data.

And that brings us to October or, more precisely, midterms. It’s fairly typical in a writing-intensive course to assign more than one paper. The feedback (hopefully more than just a letter grade) on the midterm is formative in nature. Students can use it to improve on the final. And if we find ourselves in a third scenario, where Jack goes from a midterm C to a final A and Jill goes from a midterm A- to a final A, perhaps you can be reasonably confident that you’ve taught Jack something and Jill next to nothing. Sometimes assessment can help us to identify learning.

I’ll bet you’ve already sniffed out the ways that LLMs (or paper mills or cutting and pasting from the internet or whatever) could disrupt this. In a fourth scenario, in which one of our initial assumptions doesn’t hold, Jack earns a C on his midterm and then turns to an LLM to “write” his final essay, which receives a B+. An improved assessment outcome, in this context, isn’t a reliable sign of learning.

So far, we’ve seen that when writing is used as a mere means of assessment, it doesn’t help us much to track whether our students have learned anything in our courses. For this same reason, LLMs really do seem to pose a threat to writing-intensive instruction as it is sometimes practiced. This new form of cheating exploits a lack of familiarity with the thinking that a student’s writing is supposed to evince.

But LLMs shouldn’t concern those who use writing (also) as a tool for teaching the sorts of skills that are typical of philosophy course objectives, such as argument articulation and analysis. There are a host of familiar teaching strategies we can employ in our classrooms to help our students learn these skills and help ourselves become familiar with their progress in doing so.

Let’s revisit Symons’ rhetorical questions. Should you, as a conscientious teacher of a writing-intensive course, require Jack and his classmates to do all of their writing by hand in class? This would eliminate the threat of using an LLM to cheat. Should you require Jack and his classmates to write multiple drafts? As many of us who already do so are aware, it’s a red flag whenever a student turns in a paper that is radically different from any other written work you’ve seen from them. But while multiple drafts and in-class exercises may be ways to prevent students from gaming the system, many readers are likely thinking that they’re untenable solutions. Who has class time to devote to allowing students to pen entire essays under your watchful eye (or wants to “go full surveillance” on them)? Who has time to give constructive feedback on mandatory rough drafts?

The good news is that in-class writing and multiple drafts don’t need to be time-sucks. You can assign students a short in-class writing prompt, have them put their names at the tops of their papers, collect them at the end of class, and simply use them as a means of taking attendance. Of course, you can also read what they wrote, or what some of them wrote, to find out if they were grasping the material or constructing the argument well or whatever. But none of this has to take much time, and these sorts of activities can do double-duty—attendance tracker and learning prompt rolled into one.

Who says you need to be the one to read and provide feedback on that rough draft? Have students provide feedback to each other using a rubric you’ve given them (perhaps the same one you’ll use to grade the final version). This can be done during or between class meetings, and it can be done in-person or online. Again, the benefits are multiple: there are opportunities to learn both in the giving and in the receiving of feedback. They can turn in their rough draft and peer comments along with the final version of their essay.

Sure, these may be among the best practices when it comes to effective writing instruction. But how are you going to make sure they don’t have an AI “write” their rough draft for them?

Try using templates to scaffold your writing assignments. Instead of having students workshop rough drafts, have them workshop an initial outline formed by filling in sentence stubs on a template you’ve provided. You can even integrate templates more thoroughly into your course. Provide them with templates to use as guided notes to structure their understanding of the readings—or better, have them construct templates like this for each other. Filling out sentence stubs isn’t the same as writing an essay, but it can be one building block with which to erect a polished paper. And the beauty of templates is that they have a built-in structure. For a student like Jack, it can be enormously helpful to have repeated practice writing within the confines of a template. The essay then becomes a means of expanding on what he’s learned, through practice, about how to structure an argument, as opposed to an opaque request to produce a piece of writing modeled on what he’s been reading all term.

Now, I don’t pretend that teaching techniques like scaffolding, templates, and peer feedback are surefire ways to eliminate the threat LLMs appear to pose. For one thing, they won’t eliminate the possibility of someone submitting an AI-generated manuscript to a journal for review. Perhaps this is (part of) what the creators of GPT-3 had in mind when they raised the prospect of its being used for “fraudulent academic essay writing.” But I do think the wide array of available techniques for teaching writing should make all of us philosophy professors sleep a bit better at night. Perhaps we’ll need all of that rest, since we’ll have realized we can’t simply assign essays and grade them if we want to make sure our students are actually learning from us.

Symons notes that teaching occurs in “embodied and meaningful social contexts.” I’d add that effective teaching occurs in such contexts over time and involves a certain type of relationship between student and teacher. And I’d like to thank Symons for starting a conversation that helped me to think through our craft like a novice.

18 Comments
Justin Kalef
1 year ago

“…But Jack came into your class with little to no experience writing this sort of paper, nor did he have much in the way of background knowledge about philosophy or arguments… Jill, by contrast, had taken lots of philosophy courses before, received lots of writing instruction outside of your course, and basically phoned it in on this assignment. It seems safe to say that you taught Jack a lot and Jill nothing… What these scenarios demonstrate is that when assessment is used to take a snapshot of a student’s skills and knowledge, it doesn’t tell you whether there has been growth.”

It seems to me that the work of both Jack and Jill should be graded on the basis of their performance in class (provided that they didn’t cheat), not on the basis of how much they improved or even how much of an effort they had to put in to earn what they did.

If Jack and Jill submitted equally good work under these conditions, I would find Jack’s achievement more impressive, and I would offer him more encouragement and praise, and I would be more inclined to write a glowing letter about Jack. But I see earning a B as something like completing a 10k run in a certain amount of time. If Jill is a marathon runner for whom a 10k run is a relaxing way to spend a Saturday while Jack was unable to run for half a block just a few months ago, and they both complete the 10k run at the same second, Jack’s achievement is far more impressive. But they both achieved the same thing. Jack did not run a marathon (though this should encourage him to raise his hopes for trying that), and Jill did not run any slower than Jack did. The record should show that Jack and Jill earned the same result, and that should be a source of pride to Jack and a sign that Jill should have expected more from herself.

Derek Bowman
Reply to  Justin Kalef
1 year ago

What a strange non-sequitur. The example of Jack and Jill in the OP assumes your position – that grades are a measure of performance – and points out that they are not, thereby, a measure of learning. Surely you’re not so obsessed with measuring and ranking your students that you don’t ALSO see writing as a tool for learning.

Justin Kalef
Reply to  Derek Bowman
1 year ago

On the topic of ‘strange non-sequiturs’…

I certainly agree that writing is a *tool for* learning. (Just as preparing for and completing a 10K run can be a *tool for* improving physical fitness and endurance). But it doesn’t follow from the fact that something is effective as a way of *promoting learning* that it ought to be *assessed* on the basis of how much learning took place.

All sorts of obvious examples show that this is not the case. For instance:

– Suppose my department has an advanced logic course that assumes a prior mastery of the material taught in an introductory logic course, whose content our department has specified for that purpose. I’m teaching a section of the introductory course. Jill, who has already learned to think this way in her years studying computer science, puts forth little effort in my course and can handle the material well. Joe, unlike Jill, does not take easily to logical thinking at all, and fails to learn about half the material in the course. However, after putting in fifteen hours of serious study per week and regularly attending my office hours, he does at least manage to learn the rudiments of logic, which I had never thought possible. If he were to take the advanced course, I’m sure he’d make a fine effort, but his grasp of many of the things he’d need from day one would be lacking, and it’s hard to see how he could handle the advanced course even with forty hours of study per week.

Given that we must send a message with one letter, what needs to be signaled to the instructor of that advanced course? Joe’s mastery of the material? That would give him a C-. Or his effort? That would give him an A+, which is higher than I would give Jill for effort. But the fact is that Jill will be able to handle the advanced material and Joe will not.

– Two graduates of the same undergraduate program are applying to graduate school in the same department. They are both particularly interested in Chinese philosophy. Jill is a native speaker of Chinese with a prior degree in Chinese literature, and she has had a lifelong immersion in traditional Chinese culture. She is also highly intelligent. Moreover, she has a strong aptitude for philosophical thinking and excelled in all her philosophy courses with minimal effort. The upper-level undergraduate course she took on Chinese philosophy was a walk in the park for her, and she submitted a very perceptive essay that reads like a professional work in the field. However, Joe took that same course and applied himself to it like never before. Despite having a very hard time with the texts, and being fairly new to philosophy after switching to a philosophy major part-way through his degree, he showed remarkable dedication in trying to understand the material, which he managed to do to an acceptable level by the end.

When the graduate admissions committee looks at the two transcripts, would it be clearer for Joe’s grade to be lower than Jill’s, higher than Jill’s, or the same as Jill’s?

Similar considerations apply — maybe even more strongly — when an employer is choosing between candidates and looks at a transcript to see which ones have a strong understanding of some material taught in a university course.

All these courses are tools for learning. But they all require the instructor to convey to others how good the student’s relevant abilities and understanding are by the end of the course. There’s no inconsistency between these two things.

Derek Bowman
Reply to  Justin Kalef
1 year ago

“There’s no inconsistency between these two things.”

Which is what makes your contributions to this discussion so bizarre. The OP literally takes no stand whatsoever on assessment. But I’m sure the world’s employers are grateful for your stalwart and unwavering service.

Justin Kalef
Reply to  Derek Bowman
1 year ago

Lots of smugness, sarcasm, and snark there, Derek. I know that many people have loads of fun writing such things, especially when they haven’t taken the time to figure out the connections in the discussion.

In case you’re also interested in engaging in this conversation, here’s the passage I’m responding to:

“There’s a difference between teaching someone to think and write and assessing what they’ve written as a measure of the quality of their thinking. It’s not always obvious how the two come apart, and they’re not entirely alien from each other. For example, effective teaching often involves assessment. But some quick examples should be enough to show that simply grading completed essays is not a reliable measure of student learning.
“Suppose you have two students, Jack and Jill, and they each turn in essays that receive the same grade. Assume, as well, that there’s no cheating or bias or anything like that. They write the essays to the best of their ability; you grade them fairly. Even though they’ve turned in work of the same quality, you shouldn’t take this to demonstrate you’ve taught them anything, let alone the same things.

“Consider a first scenario in which both students turn in A papers. But Jack came into your class with little to no experience writing this sort of paper, nor did he have much in the way of background knowledge about philosophy or arguments. He earned his A by paying attention and putting in a lot of work for your course. Jill, by contrast, had taken lots of philosophy courses before, received lots of writing instruction outside of your course, and basically phoned it in on this assignment. It seems safe to say that you taught Jack a lot and Jill nothing.

“In a second scenario, both Jack and Jill earn Ds on their papers. Again, Jack came into your class with little to no relevant background; Jill came in with lots of prior coursework in philosophy. It’s entirely possible in this scenario that, once again, you taught Jack a great deal and Jill nada.

“What these scenarios demonstrate is that when assessment is used to take a snapshot of a student’s skills and knowledge, it doesn’t tell you whether there has been growth. To do that, you need more than time-slice data.”

I read that as implying that good assessment of student work ought to consider whether there has been growth.

If you think I’m mistaken about that, then I welcome you to show me where you think I’ve gone wrong. Please explain how you interpret that passage in the context, and why you think it doesn’t imply what I think it implies.

Derek Bowman
Reply to  Justin Kalef
1 year ago

1) The passage doesn’t make any such normative claim about assessment.
2) The passage doesn’t entail any such normative claim about assessment.
3) None of the rest of the discussion in the OP turns on whether we accept any such normative claim about assessment.
4) The OP is not about what standards ought to be used in grading.
5) Many of the proposals in the OP involve non-graded student work that is used for learning rather than for assessment.

I led with snark because I thought the point was rather obvious and not worthy of further tedious deviations from the main topic. I should have stuck with that initial impulse, and I will endeavor to do so from this point forward.

John Symons
1 year ago

Thanks, Benjamin. Not sure we disagree that much. But when I said, “How should we proceed? Should we concentrate on handwritten in-class assignments? Should we design more sophisticated writing projects? Multiple drafts?” I was being sincere. I’m not sure what the best way to proceed is.

On Twitter, people responding have made some great suggestions. Ramon Alvarado (U. Oregon), for example, has this: https://twitter.com/ramonalvaradoq/status/1547630364616601603

In my own teaching this semester (a graduate-level data ethics/ethics of AI course), I am having the students coauthor their papers with the LLM of their choice while keeping a handwritten journal documenting their reflections on the process of collaboration. I’m curious to see what they come up with. The luxury of small classes is that we really can embrace the disruption in creative new ways, as you’re suggesting. As you know well, at a large public research institution like the University of Kansas, our mission is also to educate very large numbers of people under the familiar kinds of economic and political constraints.

Traditionally we could assign essays to our students in giant 300 person courses like Introduction to Ethics, evaluate their quality and assign grades that (when things worked well) signaled (more or less) the students’ capacity to read and think critically, to construct good arguments, and to communicate competently with others.  I think we all recognize that those days are coming to an end. As you suggest, we’ll have to find new ways. 

On a personal note, one of the pitfalls of doing “public philosophy” is that one often doesn’t get to pick the titles of one’s articles. This was the case here. As a matter of fact, I’m not sure that LLMs will really make us “less intelligent” – whatever that means. But canny editors with the ability to craft good clickbait did us a service here in getting the conversation started. Thanks again for your response.

Ben M-Y
Reply to  John Symons
1 year ago

The assignment you designed for your students this term sounds great, John! I’d be really interested to hear how it goes. Thanks also for linking to Ramon Alvarado’s thread about his assignment. It’s awesome! It’s always really cool to learn about the creative, engaging ways other folks are teaching writing in their classes. I, too, was being genuine when I said I appreciated your piece prompting me to think through things like a novice.

The point you make about economic and political constraints deserves its own post and extended discussion. It can be a threat to what we do, but it also strikes me as an opportunity to get creative. One thing that troubles me, though, is that the more creative we get as educators–and the more successful we are at navigating these constraints–the less of an impact they appear to have in the eyes of the powers that be. So, we end up getting larger classes, etc. I mean, if you can make it work with 150 students, why not 200? 300? Helps to pay those faculty admin salaries.

Point well taken about editors and their subtitles. Say whatever else you will, they do know how to attract eyeballs.

Thanks for taking the time to reply here, John.

Derek Bowman
Derek Bowman
1 year ago

I agree with the value of having students write multiple drafts, but I’m skeptical of the value of peer review. In my experience, even the peer (or graduate student) tutors at the campus writing center tend to give counterproductive advice when they aren’t experienced philosophy majors.

Instead, for each major essay I have an in-class draft workshop where I give students a self-assessment guide for reviewing their own essay. For the first half of class, I ask students to work silently on their own essay. After that, I allow them to ask questions and give them the option of discussing their drafts with other students. Sometimes this has been incredibly successful, and at a minimum it has greatly reduced the number of written-at-4-am-the-night-before messes I’ve had to read. And while it does not eliminate the threat of plagiarism, it greatly mitigates it. (I do make students turn in their rough drafts in advance of the draft workshop, and I don’t explicitly tell the students that I won’t be reading them unless they ask me directly.)

My personal website is a bit out of date, but you can currently find two sample Draft Worksheets on my teaching materials page (linked at my name).

Derek Bowman
Reply to  Derek Bowman
1 year ago

Not sure why the link didn’t go through, but here it is: https://derekbowman.com/teaching/sample-course-materials/

Jonathan
1 year ago

Admittedly, I haven’t read the entire article yet, but my initial response is that the problem lies in the traditional approach to education, not in the existence of AI tools that can help students “cheat.” I don’t think the latter would even be a minor problem if we weren’t so focused on motivation through competition and on evaluation during the process of learning. I don’t think grades are effective at either intended purpose (evaluation or motivation). I have many reasons for believing this which I don’t really have time to list right now, but I will just briefly mention the existence and demonstrated utility of Montessori schools. There have also been psychological studies that support the hypothesis that extrinsic rewards can sometimes have a counterintuitive harmful effect on motivation. And as for the effectiveness of grades in evaluation, I think this article actually makes my point for me. If it’s difficult to differentiate students that are cheating and students that are being honest, then grades are unlikely to effectively distinguish between those two kinds of students. I’ve also been working as a graduate teaching assistant over the past year, and I can verify that grading is often quite subjective and unreliable. Even as a TA for some pretty concrete STEM classes, it’s quite difficult to make decisions about how much weight to give to different kinds of mistakes in assignments.

If we didn’t mix evaluation and learning together into this convoluted mess, I think many of the problems you might predict with emerging technologies would go away. After all, if an AI truly is better at a task than humans, why not use it for that task? The problem only arises when we give students tasks for evaluation which are usually simpler than real-world tasks, and they are able to use AI (or other computational tools) to trick the evaluation metrics. If the purpose of a class (and the associated tasks) is to learn the material for the students’ own direct benefit, then cheating is not a danger. There’s nothing to cheat on, because you don’t get the added benefit of a good grade (or avoid the detriment of a bad grade). If we evaluate people at the point where their skills are actually applied in the real world, then cheating becomes much harder. At that point, you are addressing real-world problems, and if an AI can help you cheat on that, then we might as well just use the AI as a tool for that task.

Note: yes, there are other issues with automation and loss of jobs, but I think instead of avoiding the use of AI, we would be better served by directing the money saved through use of AI towards social safety nets or even universal basic income. I also think that there is intrinsic value to learning something even if an AI can perform better on tasks that require that learning, but that doesn’t mean we should give up the use of AI as a tool. AI can beat the best chess and go players, but people still play those games.

Preston Stovall
1 year ago

“Symons notes that teaching occurs in ‘embodied and meaningful social contexts.’ I’d add that effective teaching occurs in such contexts over time and involves a certain type of relationship between student and teacher. And I’d like to thank Symons for starting a conversation that helped me to think through our craft like a novice.”

Hi Benjamin. I like the way this ends. Thanks for sharing it, and encouraging the conversation. Like Jonathan, I’ve become skeptical of grading schemes over the last few years. At the same time, Justin Kalef is surely right that whatever else grading is doing, it signals a degree of competency in a field. And LLMs threaten our ability to tell who has the competencies needed to deserve the titles they’re given. I’ve been reading up on alternative grading schemes in preparation for this term, and I think I’ll try a mix of ungrading (for upper-level courses) and specifications grading. Each might go some way toward addressing the problems posed by LLMs. But I’m new to this, and I’m spit-balling here.

For the ungrading, I’ll spend part of the first class explaining a grading scheme based on regular production of short discussion pieces, and conversation. I haven’t fully settled on how to handle the assigning of grades. Whatever I do, I’ll emphasize the collaborative nature of the grading system, and of the course in general.

For lower-level courses (where I have more students) I’ll use Nilson’s specifications grading, which pitches the final grade as a contract: the student knows just what the class calls for, there’s a pass/fail structure, and there are relatively high expectations for minimal competency (typically a B or higher). Students are encouraged to excel by learning some of the metatheory, or by developing the ability to tackle more sophisticated problems, or by becoming more familiar with the history of the field. The key is to hammer out the specifications in sufficient detail to let students know just what is coming. Together with clear rubrics of assessment, this lets us instructors spend our prep time focusing on what we’re supposed to be doing: crafting instruction so that students have the opportunity to learn the needed competencies. Then, on the basis of the rapport we develop with them over the term, we guide them as they work through the work.

Concerning the impact of LLMs on writing-intensive courses, I suspect ungrading has the best chance to weed out the LLMers. The course culminates in a conversation with each student about his or her work, and about what that work has earned by way of institutional recognition. With specifications grading, I’d worry that some students’ submitted work would meet the requirements without my being sure that the work was theirs. If the course is small enough, one could have more one-on-one interactions with students, and that might suffice. But in large courses, or when instructors have to teach many courses, that may not be feasible.

On a related note, at the start of the pandemic I suggested using new-media technologies to shift to pedagogies that were more like the tutorial system of Oxbridge. I wonder what the impact of LLMs will be on systems like that.

https://dailynous.com/2020/04/06/now-may-time-transition-tutorial-instruction-guest-post/

Ben M-Y
Reply to  Preston Stovall
1 year ago

Thanks for this, Preston. Your post about the tutorial system is interesting. In a world without familiar financial pressures (some of which John Symons notes in his comment here in the context of writing-intensive instruction), this seems like an interesting possibility. I wish we had approached the pandemic as an opportunity to rethink things a bit more than we did, including, but not limited to, the way we structure higher education. Alas, we seem to have raced to try and get back to normal, such as it is.

That’s really cool that you’re trying out ungrading this term. So am I! Your comments seem to me spot on. The only thing I’ll add to what you say is that my university is offering embedded tutors for the first time this term (as part of a larger grant the academic success center received), and I am lucky to have one working with me in my course. She’s an excellent student from a past course, so we know each other. And she’ll be in the classroom with us the entire term, as well as having office hours in the tutoring center. Our plan is to structure things so that the students receive three rounds of formative feedback on each written assignment: peer feedback, feedback from the tutor, and feedback from me. They will turn in portfolios at the end of the term, including all drafts and all feedback on each assignment; and they’ll write a short reflection paper in which they’ll discuss their work over the term as a whole (on written assignments, readings, in-class discussions, group work) and propose a final grade. We’ll then meet during finals week to discuss their portfolio, reflection, and proposed grade. I’m excited to see how it turns out! The literature I’ve seen suggests it should be a good experience. Though I do know a number of my students are nervous about not receiving grades on their work throughout the term. I’ve tried to quell the anxiety a bit by being as transparent as possible and also soliciting feedback as we go. I wonder if you’ve encountered anything similar and how you’re handling it. This is one danger of doing something nonstandard, I suppose.

Preston Stovall
Reply to  Ben M-Y
1 year ago

Thanks, Ben. This is the first time I’ll be trying this, and courses haven’t started in Czechia yet, so I haven’t had to handle any anxiety issues so far. But I plan to do something similar, with regular meetings and a portfolio with all of the work we’ve done over the semester. I expect to spend a fair amount of the first meeting going over the way ungrading works, and trying to address any questions or concerns the students have. It’ll be important to build a consensus that we’re all in this together, I think, and I hope I can help model and encourage that through regular course discussion and participation as the semester progresses.

Eric Steinhart
1 year ago

If AIs can excellently write undergrad papers, then I suspect that by symmetry they can also excellently grade undergrad papers. If they can, and if they are cheap enough to deploy at scale, then what?

For one, much of what professors do for assessment (as mentioned here) becomes automated. Add automated grading to pre-packaged online courses (via OPMs), and professors are obsolete.

But perhaps the AIs will really change how philosophy gets done in the classroom: rewrite your paper endlessly until the robot gives you the grade you want. That probably creates an interesting kind of learning. It neutralizes the arguments that human professors lack the time to grade multiple drafts etc.

So now what will the human professors do? What will the classroom become? If we do it right, we’ll be doing things very differently from the past models. Very differently. Welcome to the singularity.

Ben M-Y
Reply to  Eric Steinhart
1 year ago

You bring up some interesting possibilities here, Eric. Thanks!

I’m not sure what to think about your idea that AI might be excellent at grading papers if excellent at writing papers. I’m not so sure about this, but my reasons may simply display my own ignorance. For one thing, as John Symons describes, current AIs are not excellent at writing philosophy papers; they are above average at it given certain prompts. As things are, and assuming the symmetry you suggest holds, we should be worried if papers were graded by merely above-average graders. (And I completely understand that many human instructors are merely above average at grading–that is, they are not excellent at it, by reasonable standards, such as consistency, fairness, accuracy, etc.) But even supposing AIs become excellent at writing philosophy papers, it’s not clear to me that the two skills, writing and grading, are as symmetrical as your comment appears to suppose. This is especially so given how I understand the relevant AIs to work. What would GPT-3 do with original thoughts or critiques of accepted wisdom in relevant literatures? Could it recognize them? As such? And assess them appropriately–e.g., by rewarding original insights? Now, we may not find original thoughts in many student papers, but we sometimes do, or at the very least, we shouldn’t hand the reins to a grader incapable of doing so.

Your comment about the interesting kind of learning that writing multiple drafts with AI assessment may involve is intriguing. I also agree with your claim that if we’re doing things right, we’ll be doing things very differently. But as I tried to convey in the original post, I don’t actually think our reasons for doing things differently need have much to do with AI. I think the status quo–assigning papers and merely grading final submissions–is flawed for reasons having to do with the fact that mere assessment does not track learning well. AI may be bringing some issues to the fore, but it doesn’t seem to me to be presenting us with novel reasons to change our ways in this regard. If we are aiming to teach thinking through writing, we shouldn’t be simply assigning papers and grading final submissions. This applies just as much in a world with AIs that write papers as it does in a world without them.

Eric Steinhart
Reply to  Ben M-Y
1 year ago

I think we’re in broad agreement here. I don’t want to make a prediction about whether and when AI graders will exist, or be cheap enough to deploy at scale, etc. I do think that day will come tho. And that, I think, will be really, really disruptive!

I think of the analogy with chess (tho it’s not a very strong analogy). Chess players get coached by AIs, have their games analyzed by AIs, practice tricky strategies with AIs. And this has made human chess playing much, much better.

There are probably lots and lots of amazingly good ways to use these LLMs to do better philosophy, and better philosophy teaching too! (And, of course, some amazingly bad ways.) They’re going to change things in unanticipated ways.