Illicit Use of AI by Philosophers Refereeing for Journals


In 2024, a study found that “7–17% of the sentences in the reviews [of computer science manuscripts] were written by LLMs”. It was only a matter of time before this spread, and now it appears to have reached philosophy.

Last year, a philosophy PhD student in the US submitted a paper to a well-known philosophy journal.

They write:

The paper was rejected a few months ago; the first reviewer left very detailed feedback and suggested substantial R&R while seeming generally positive about the paper, the second reviewer suggested rejection. At first, I appreciated both of their feedback, and this was sort of my first go at submitting for publication anyway. 

However, I recently showed this feedback to someone else who thought the first reviewer (the more positive reviewer!) sounded like AI. [This] hadn’t occurred to me when I first received it, but it now seems very clear that it was AI-written. Although I know they aren’t the most reliable tools, I also checked with a couple of AI detectors and they come back as highly confident the text is 100% AI generated. 

I believe this is the first time someone has written to me about this happening in philosophy, which means it is almost certainly not the first time it has happened.

Has it happened to you? (This is philosophy, so start by checking the reports that seemed relatively nice.)

It’s worth discussing. We can start with why, as things stand now, if you are asked to referee a submission for a journal, it would probably be wrong for you to use AIs like ChatGPT, Claude, Gemini, etc., in doing so (except, sometimes, in a very limited capacity). Here is why:

1. Let’s start with you feeding the manuscript into an AI. One problem here is that you have no reason to believe that the author of the manuscript agreed to the manuscript becoming training data for an AI, or for whichever AI you happen to use. Given that there is well-known controversy over this, you shouldn’t assume the people you are working with would agree it’s okay.

When you agree to referee for a journal, you agree to abide by the publisher’s review policies, and various academic publishers explicitly prohibit in their reviewer guidelines uploading a manuscript under review into an AI. For example, Oxford Academic’s policy for reviewers states:

It is prohibited to upload project proposals and manuscripts, in part or in whole, into a Gen AI tool for any purpose. Doing so may violate copyright, confidentiality, privacy, and data security obligations.

2. Now let’s turn to the assessment of the submission and the writing of the report. The journal editor asked a specific person to review a manuscript: you. If you accept their invitation to review the paper, you agree to review it. If you were then to give the manuscript to one of your colleagues or graduate students to review, you would not be doing what you agreed to do. If you did this and then didn’t tell the editor of the journal for which you’re reviewing, then you’re fraudulently submitting another’s work under your own name. The fraud is still there when it’s not a colleague or student but an AI to which you’ve handed over the task.

Again, when you agree to referee for a journal, you agree to abide by the publisher’s review policies, and some publishers outright prohibit the use of AI-written referee reports. Elsevier’s policy, for example, states:

Generative AI or AI-assisted technologies should not be used by reviewers to assist in the scientific review of a paper.

An additional concern is that authors typically submit their work for peer review under the reasonable expectation that, if the work is of sufficient quality, they may receive constructive feedback on it from fellow experts. The use of AIs to write referee reports not only fails to meet this expectation, but does so at a cost. As the graduate student who wrote to me put it, “I find it very frustrating because if I wanted feedback from an LLM I would just ask it, instead of waiting months for feedback from a journal just to send me AI-written feedback.”

3. Even more limited use of AI that doesn’t involve uploading the submission into an AI or having the AI generate comments on it can be problematic. It might be acceptable for a referee who has read a manuscript and written up their report to then feed the report into an AI for rewriting (say, for tone, clarity, grammar, or translation), check the rewritten version for accuracy, and then submit that AI-rewritten version. But even the permissibility of this varies across publishers. Elsevier, for example, prohibits it:

This confidentiality requirement [restricting the uploading of the submission to an AI] extends to the peer review report, as it may contain confidential information about the manuscript and/or the authors. For this reason, reviewers should not upload their peer review report into an AI tool, even if it is just for the purpose of improving language and readability.

Taylor & Francis, meanwhile, states in their policy that it may be permissible:

Generative AI may only be utilised to assist with improving review language, but peer reviewers will at all times remain responsible for ensuring the accuracy and integrity of their reviews.

Note that individual journals may have more restrictive policies.

At some point, as the technology improves, our norms, policies, practices, and expectations may change. And some changes may be welcome, as our current academic publishing system certainly has its problems. But for now, if you’ve agreed to referee a paper, don’t try handing off that work to an AI.

Discussion is welcome. One thing to keep in mind, philosofriends, is that this is largely a matter of policy. So while we might imagine various “in principle” examples of the permissible use of AI in refereeing, these may be of limited value in figuring out which institutional rules we should adopt for imperfectly influencing the behavior of many people who differ in many ways.

46 Comments
Michel
17 hours ago

Chatbots are not my peers. End of story.

Paul Jaspers
Reply to  Michel
17 hours ago

I am completely aghast and dumbstruck. I wondered for a moment whether this was an April Fools’ joke, albeit a bit late for that. How could anyone possibly resort to such a debasement of the integrity of the peer review process by having a so-called “AI” (though I question the ‘intelligence’ part of the “I”) review a paper? This is an absolute defilement of intellectual integrity and a shameful state of affairs that we are discussing here today. I can understand an outsider to philosophy or to academia (the business type, say) looking for a ‘time saver’, as it were, but how a philosopher could look themselves in the mirror after defiling the peer review process in this way is beyond me.

Mike Gregory
Reply to  Paul Jaspers
17 hours ago

Sure, but it seems like this moralized take on peer review misses that the majority of reviewers are overworked academics with almost no incentive to write reviews. The amount of uncompensated labor that we are asked to do as academics is insane, and we should not be surprised that people are refusing to spend time doing it.

Also, this response assumes that human reviewers are putting thoughtful time into producing reviews focused entirely on truth-seeking through collective inquiry. This is obviously false. Many reviewers barely read the paper and are often motivated by their own gatekeeping biases, which trade more in advancing their conception of what philosophy papers should look like than in truth-seeking. It is unclear that the process is the sacred practice you describe it as.

We miss the point, and risk strengthening labor inequality in academia, when we pretend that peer review is some holy institution.

Jessie Ewesmont
Reply to  Mike Gregory
17 hours ago

I agree reviews are uncompensated labor. If you think it is too much work, you should just say no rather than use an AI. I also agree some reviewers do a sloppy job even without AI, and that peer review is not a sacred and unimpeachable process. Still, it strikes me that the flaws of peer review mean we ought to try to make it better in whatever ways we can – by not using AI, and by doing our best to set our biases aside – instead of treating it unseriously. As flawed as it may be, it still determines who gets published, and therefore who gets hired and who gets tenure. These are weighty stakes, so we should do our best as peer reviewers.

Michel
Reply to  Mike Gregory
16 hours ago

Peers are wrong all the time. Peers do a bad job all the time. But they are, at least, peers. A chatbot is not a peer.

I teach 8-11 courses a year. I also referee 20+ items a year. I’m not superhuman, and neither is the effort involved. Not everyone need do so much; but if someone hasn’t got the time, then a simple ‘no’ is preferable to offloading the task onto an RA, a graduate student, an undergraduate, a random stranger, or a chatbot.

Chicago prof
Reply to  Mike Gregory
13 hours ago

It should be obvious that peer review is part of the suite of professional duties that make up an academic’s professional life, a part of the system that generates our salaries. Treating it as uncompensated labor is among the delusions of those academics who think of themselves as part of the working, rather than the ruling, class. Sure, if you’re an adjunct, there’s no incentive to review. But there’s also no expectation that you will.

R.R. Urmason
17 hours ago

Here is a sound argument. There are two premises. One states an obligation; the other is an empirical claim. The conclusion is that in a frequent subset of cases where one agrees to peer review an article, one ought to employ an AI.

Premise 1 (The Obligation Principle): In every case where one agrees to peer review an article, one ought to employ the most reliable means available for identifying errors in that article.
Premise 2 (The Empirical Claim): In a frequent subset of cases, employing an AI constitutes the most reliable means available for identifying errors in an article.
Conclusion: Therefore, in a frequent subset of cases where one agrees to peer review an article, one ought to employ an AI.

R.R. Urmason
Reply to  Justin Weinberg
16 hours ago

Fair enough Justin. I have revised accordingly. Can we agree that this revised argument is now sound?

Urmason’s Argument (Revised):
Premise 1* (The Obligation Principle): In every case where one agrees to peer review an article, one ought to employ the most reliable journal-compliant means available for identifying errors in that article.
Premise 2* (The Empirical Claim): In a subset of cases, employing an AI constitutes the most reliable journal-compliant means available for identifying errors in an article.
Conclusion*: Therefore, in a subset of cases where one agrees to peer review an article, one ought to employ an AI.

Brian Weatherson
Reply to  R.R. Urmason
16 hours ago

This still feels like it doesn’t address the policy question. Compare this argument

  1. In some cases, the best outcome will come from driving as fast as you can without crashing (e.g., if there is a medical emergency and you need to get a passenger to a hospital, or if you’re a really safe driver and doing 90mph means you’ll be there for the first pitch of your kid’s game).
  2. Therefore, there should not be a policy of banning people from driving as fast as they safely can; we should have no speed limits, just the autobahn policy of “Be Safe”.

That’s clearly a bad argument. The right *policy* might tell people to do something suboptimal in some cases, because if we set the policy for unusual cases like this, too many people who should be following the sensible policy would not.

It’s at least coherent to believe

A. Some reviews would be improved by AI. (As you say below, that’s not plausible if AI use is mindless cut-and-pasting, but it is a bit more plausible if it involves using the AI as a double check.)
B. If journals allowed AI, the average review quality would go down (perhaps because too many people would slip from ‘double check’ to ‘just write it for me’.)

That’s one reason the policy question is central.

(The confidentiality question is separate, but I kind of think that’s orthogonal to the AI question. Using cloud storage to move a review file between devices raises confidentiality risks; using an on-device AI does not.)

R.R. Urmason
Reply to  Brian Weatherson
15 hours ago

Thanks Brian. So that’s a fair enough point about the distinction between individual optimal actions and overarching policy, but I think my revised argument actually already accounts for this. By specifying that the means must be ‘journal-compliant’ in Premise 1, the argument defers to the journal’s policy. So if a journal implements a blanket ban on AI (like your speed limit), then AI is no longer a ‘journal-compliant means,’ and the obligation to use it dissolves. However, if we are pivoting to debate what that journal policy should be, I think the speed limit analogy at that point breaks down. A better analogy I think is banning calculators in advanced engineering exams because some students might rely on them instead of thinking. If a reviewer is lazy enough to mindlessly copy-paste an AI output, they were likely going to write a poor, perfunctory review anyway. Editors already serve as a quality-control backstop against bad reviews. Implementing a blanket, unenforceable ban on AI doesn’t stop lazy reviewers but really just serves to prohibit conscientious reviewers from using a powerful diagnostic ‘double check’ to catch errors they might otherwise miss, and this is something that will ultimately harm the peer review process.

Jessie Ewesmont
Reply to  R.R. Urmason
17 hours ago

Both premises are wrong. Premise 2 is wrong because of the well-known propensity of AI to hallucinate, especially relative to the well-known propensity of reviewers to pick apart even subtle errors in a manuscript. (That’s relevant, since “most reliable means” is comparative.)

The first premise is also wrong. Suppose I am in a department with many other intelligent and astute senior philosophers. It might be that they’re more skilled at identifying errors than me. But it wouldn’t be permissible, let alone obligatory, to ask them to give comments on the manuscript that I simply copy and paste in my review. The editor has asked me to do the review, not them.

R.R. Urmason
Reply to  Jessie Ewesmont
16 hours ago

I think both of these critiques rest on treating AI as an autonomous substitute rather than an assistive tool. Regarding the second premise, it is a false dichotomy to pit human expertise against AI hallucinations; the “most reliable means” is not human or AI, but human plus AI. While AI can hallucinate, human reviewers are subject to fatigue and cognitive bias, so using an LLM to flag potential issues for the human to independently verify effectively mitigates the weaknesses of both. Regarding the first premise, the analogy of passing a paper to a senior colleague fails because it conflates a moral agent with a non-sentient tool. Giving a manuscript to a colleague violates confidentiality (obviously) and entirely outsources the intellectual judgment the editor explicitly asked of you. Conversely, using an AI as a diagnostic aid is functionally no different than using a calculator to check a paper’s math or a search engine to check for plagiarism; you are still applying your own final, synthesized judgment in such a way as to be fulfilling your obligation to the editor while utilizing the best available tools to do so.

Kenny Easwaran
Reply to  Jessie Ewesmont
15 hours ago

I think something much stronger than Premise 2 is actually true. In some subset of cases, a blanket policy of saying “there are no errors” may constitute the most reliable available method for identifying errors in a manuscript. These might also be cases in which a blanket policy of saying “there is a serious error on page 7 that makes this article unpublishable” may constitute an equally reliable method for identifying errors in a manuscript.

These are, of course, cases in which a person has overcommitted themself and isn’t going to be able to do anything at all reliable. These are not good cases, but we know they exist. In these cases, despite the frequency of hallucinations, an LLM is likely to do better than either of these policies.

But I think the failure of Premise 1 still means that one shouldn’t use an LLM in these cases. (At least, certainly not one run on the cloud from a for-profit company.)

Michel
Reply to  R.R. Urmason
16 hours ago

Looks valid to me. But sound? I, for one, am not sure either premise is true.

Sam Duncan
Reply to  R.R. Urmason
16 hours ago

Meanwhile, in the real world, one of the main tells for when my students use AI on one assignment is that the robot goes on and on about how important fairness and integrity are for act utilitarians. And on another it loves to bullshit about how important the virtues of patience and authenticity were for Aristotle. Now feel free to tell me how this doesn’t happen with the $500 version that I’m never going to buy, and so I’m entirely unsuited to impugn the glory and wisdom of Claude, ChatGPT, or whatever.

marketeer
Reply to  R.R. Urmason
15 hours ago

I think there are several problems with premise 1, but I want to focus on premise 2. This premise doesn’t align with my experience at all, though I don’t use AI much, so I have limited data to work from.

Just as one piece of anecdata: about six weeks ago I was curious how good LLM systems have gotten at philosophy. I fed a rough, ~5000-word first draft of a paper of mine to Claude Opus 4.6 (with “thinking” enabled). I knew I needed to add a whole section positioning my paper within a major literature I hadn’t addressed at all, and that was the main weakness of the draft. I asked Opus 4.6 to evaluate the draft. It missed that problem entirely, raised a couple of trivial worries (one of which was based on a misunderstanding), and then declared the paper already publishable pending minor revisions. It was, effectively, useless. Since then, I’ve realized it also failed to point out that an inference I drew in handling an objection didn’t follow without a questionable assumption about which there’s a sizable literature, something I learned only after sending the paper to a peer for feedback. I’ve had to majorly rethink the paper accordingly.

Suffice to say, I wasn’t impressed with Opus 4.6.

Of course, this isn’t proof that there does not exist a sizable set of cases in which Claude or some other LLM provides more reliable feedback than any other means. I’m just flagging doubt about it.

R.R. Urmason
Reply to  marketeer
15 hours ago

Right, so your experience with Claude isn’t something I’m going to dispute — but I think it actually highlights a misunderstanding of how Premise 2 works in practice. You tested the AI as an autonomous senior colleague by giving it an open-ended prompt (‘evaluate this draft’). Current LLMs are undeniably not that great yet at macro-level philosophical positioning (though Opus 4.7 is much better, as is GPT 5.5) and often default to polite, surface-level nitpicks when given a zero-shot prompt like the one you gave. However, notice that Premise 2 doesn’t claim AI is the best means for every type of error detection; it just claims it is the best means in a subset of cases. The ‘most reliable means’ of employing AI isn’t asking it to review the paper entirely, but using it as a targeted diagnostic tool. If you instead prompt an LLM to execute narrow, specific tasks (e.g., ‘extract the premises in section 3 and check for formal validity,’ or ‘read my defense against Objection X and generate the strongest possible counter-argument’), it frequently catches subtle and tricky stuff that human fatigue might miss. The point here is that the failure of an LLM to independently identify a missing literature gap doesn’t disprove its unmatched reliability when properly directed at the subset of tasks it actually excels at.

Daniel Weltman
Reply to  R.R. Urmason
6 hours ago

Do you have an example (or multiple examples) of an LLM catching subtle and tricky stuff that human fatigue might miss? Like others I have not yet found good use cases for LLMs but I am open to the possibility I am using them wrong. Actual examples would help me evaluate claims like yours.

Kenny Easwaran
Reply to  marketeer
15 hours ago

I have tried similar things with a few of my papers. So far I also haven’t gotten good results. Maybe there are better sets of instructions that would make it work better. But someone who is turning to an LLM to save time with their refereeing probably isn’t spending the time to carefully craft a usable set of instructions for an LLM to effectively review a paper.

It’s much more plausible though that someone could jot down a bunch of notes while reading a paper, and an overall verdict, and then usefully have an LLM turn that into the paragraphs that an author can usefully respond to.

P.D.
Reply to  R.R. Urmason
15 hours ago

Premise 1 is false. When I’m asked to do peer review, I’m consulted as an expert. If it’s best to use some method other than my own expertise, then I ought to decline to do peer review and recommend that the editor do the other thing instead. For example, I might decide that I really don’t know anything about this topic but I know someone who does. In that case, I’d recommend a different reviewer. It would be wrong for me to take the assignment and then farm out the work to my more qualified colleague — or, at least, wrong to do so without OKing it with the editor beforehand.

R.R. Urmason
Reply to  P.D.
15 hours ago

Interesting line. So you are absolutely right that an editor is consulting you for your specific expertise, and it would be unethical to farm that out to a colleague, even a smart one. But I think this objection conflates outsourcing your expertise with augmenting it. When you say, ‘If it’s best to use some method other than my own expertise, I ought to decline,’ you are assuming that using an AI replaces your expert judgment. But the ‘most reliable means’ advocated in Premise 1 is actually ‘expertise + AI,’ not AI alone. Consider an analogy: if a paper relies on a complex formal logic proof or statistical data, you wouldn’t decline the review simply because logic-checking software or a calculator is a ‘method’ more reliable at math than your unassisted brain — you’d use the tool to verify the mechanics, and then apply your philosophical expertise to evaluate the implications. My take here is that using an LLM as a diagnostic tool (to check for structural consistency, etc.) is functionally identical. You aren’t farming out the work to an agent; you are using the best available tools to ensure your expert evaluation is as rigorous as possible, which is exactly what the editor is hoping you will do.

Daniel Weltman
17 hours ago

I’ve received one referee report that I’m 95% sure was AI generated. Honestly I’m just happy it has only been one so far…

Jessie Ewesmont
17 hours ago

I would rather get two flat rejections from human reviewers than two AI-generated unconditional acceptances.

R.R. Urmason
Reply to  Jessie Ewesmont
16 hours ago

We’d better let the Philosophical Review know about this preference promptly.

Dr. M, an adjunct
Reply to  R.R. Urmason
12 hours ago

R R Urmason,

This rather nasty response to someone’s statement is uncalled for and, to my mind, downright rude. Have you any basis for this sarcastic implied assessment of the merits of the prior commenter’s work?

BCB
Reply to  Dr. M, an adjunct
11 hours ago

I thought the joke was that most people’s experience submitting to Phil Review consists in getting two flat rejections from human reviewers.

R.R. Urmason
Reply to  Dr. M, an adjunct
10 hours ago

Lighten up, Adjunct, it’s just a bit of banter.

send through the comments
16 hours ago

Imagine the possibility that journals don’t send through the comments when rejecting papers.

Mimo
16 hours ago

I agree completely with the conclusion that reviewers should not use AI and should not paste into chatbots the manuscripts they’ve been asked to review. A really nit-picky point regarding #1: it seems worth clarifying that you can easily opt out of having your conversations used as training data. E.g., in Claude, click on your name and then “Privacy” and then uncheck “Help Improve Claude.”

Also, the AI detectors are extremely unreliable. I treat them as providing practically no information. I fed in a manuscript of mine (which of course I did not use AI to write at all) and several detectors were extremely confident that it was AI-generated (think 80-90%). So although I believe that some reviewers in philosophy have used chatbots by now, the evidence cited (based on AI detectors) seems to me very weak.

Kenny Easwaran
Reply to  Mimo
15 hours ago

One other related nitpicky point – there are various open-source LLMs that people can run locally (Mistral, Gemma, LLaMa, Qwen, Kimi, etc.) and if you run it completely locally, everything definitely stays private. Also, many universities have set up similar internal systems, sometimes even using versions of relatively high-end models from Anthropic, OpenAI, and Google, that again avoid the privacy/data security objection.

They still aren’t going to do a great job if you give them the job as a whole, but they might be usable for turning notes for a review into an actual review, or for providing a second opinion about whether there are any relevant points you missed while reviewing.

Thinkmaxxing
Reply to  Mimo
15 hours ago

Yeah, it’s actually alarming how many colleagues and admin think these are reliable tools.

Johannes Himmelreich
Reply to  Mimo
11 hours ago

It is never a good sign, if you take it as an indicator of the expertise with which the topic is approached, that the assumption that anything processed by a model becomes its training data goes unquestioned.

Daniel Weltman
Reply to  Johannes Himmelreich
6 hours ago

If you trust these companies to do what they say they are going to do, so much so that you are willing to upload someone’s unpublished work which you are very much not supposed to share with anyone, then I think you have undue trust in these companies. Specifically, you are trusting their security procedures (even though every day it seems like people are using LLMs of all things to poke holes in companies all over the world!), their technical acumen (are they really not hanging on to the data? Or if they are, are they really hiving it off from their training?), and their honesty and commitment to stick to promises (don’t even get me started…).

Reviewer Reviewer
15 hours ago

I have never received a review that I felt was written by an LLM, but there was one time when I served as a reviewer and was sent the other reviews the paper received. One of them was unambiguously written by an LLM. The review had literally nothing to do with the paper. It was about a completely different topic, but appeared to have been generated based on just the title of the paper. It was so strange and definitely very disheartening.

Conner Schultz
14 hours ago

I hope that everyone will agree that this is wrong, but even if most everyone does, this problem is only just beginning. As many others have said before, we’re careening toward a vicious cycle: referees are already overworked, and given the competitiveness of the discipline and the imperative to publish or perish, it’s only a matter of time before AI-driven research output increases dramatically, thus overworking referees even more. I don’t know the solution, but I doubt that clear policies around AI will help unless the whole incentive structure changes.

Caligula's Goat
Reply to  Conner Schultz
14 hours ago

Consider this a “modest proposal:” reputable journals should get together and form a loose consortium and agree to share both author and reviewer data. Any author or reviewer suspected of using AI to write either their papers or reviews *in a way that violates the journal’s policy* should be given an opportunity to explain their use. If, by majority vote of a journal’s editors, the author is found to have violated their AI policy, then they should be banned from submitting any material to any of the journals in the consortium.

This would have the secondary benefit of more clearly marking predatory/scam journals.

Conner Schultz
Reply to  Caligula's Goat
11 hours ago

That would change part of the incentive structure, so it could work in that respect. But there’d be a risk of false positives with disastrous consequences for the author, and just as clever students quickly learned how to cover their tracks, so too would authors and referees.

Caligula's Goat
Reply to  Conner Schultz
8 hours ago

If an author (and we’re talking about people either in PhD programs or people with PhDs and years – if not decades – of experience) can’t convince a majority of the editors of a journal that they actually wrote something, then maybe we kill two birds with one stone here and help to reduce the overall number of bad submissions (human and AI).

On the other hand, our solutions have to be responsive to changing conditions so if we get to the point where human and AI submissions are indistinguishable then, as I’ve suggested in other posts before, maybe it’s time for us to reconsider “the paper” as the best measure of philosophical activity. I’ve suggested that “the talk” might return to prominence in such an environment and, to be honest, I think it would be better for the discipline if talks were given more weight than they are now. Maybe it’s win-win.

R.R. Urmason
Reply to  Caligula's Goat
7 hours ago

So a weakness in this proposal is that it makes it rational to never review another paper, as doing so comes with a non-negligible chance of never being able to submit a paper again; this bad result could arise, following a given reviewing task you perform, via either a true positive (if you use AI to review and they catch it) or a false positive (if you don’t but they think you did), with banishment from submission a known possible consequence of either.

Daniel Weltman
Reply to  R.R. Urmason
5 hours ago

It would be rational according to a narrow sense of ‘rational,’ according to which it’s already rational for most people never to review another paper (because you get no real benefit for reviewing, or at least no benefit that outweighs the cost of reviewing). Practically nobody in the profession is this kind of ‘rational,’ so your point is neither here nor there.

If you think that the proposal would in fact cause too many people to stop reviewing, you can say that, but it’s needlessly obfuscatory to make this out to be a point about rationality. Anything is rational in this sense, given certain preferences. The question is what preferences people in fact have.

It’s not clear to me that many people have preferences such that the proposal would change things. Given the increase in reviewers needed due to the onslaught of AI-written papers (I’ve reviewed one already, and lord knows how many the editors have caught before sending them out for review), I think there’s a good case to be made that we’d get more genuine papers reviewed if AI users were cut out of the system than if we don’t do anything. It at least seems to me to be an open question.

Chad Gupta
13 hours ago

Nothing says rigorous peer review like a referee pasting your paper into an AI and waiting for wisdom to rain down. Suddenly every report has the same tone: polite, vaguely impressed, and deeply confused about your equations. “Consider expanding the discussion” has never felt so algorithmic. I imagine a secret Slack where referees compare prompts instead of insights. Somewhere, two chatbots are debating your methodology while their humans sip coffee. Honestly, if the AI accepts the paper, I’m tempted to cite it as Reviewer 3. At least it responds faster and doesn’t hold grudges from conferences in 2014.

A proposed solution
13 hours ago

Here’s my solution. Reviewers have to be open about AI usage. Authors are then told about the AI usage and are given a chance to respond.

Then the author can just say, “Actually, you got this completely wrong.”

And the AI will be like, “What an astute observation! You are absolutely correct. Your paper should be published immediately!”

Scott Forschler
5 hours ago

While my first reaction was certain outrage, I quickly remembered that I have had a large number of presumably human “reviewers” hallucinate and attribute to my papers outrageous claims which I never made, rejecting my actual document on the basis of a straw man which they invented. And I have sometimes gotten better feedback from ChatGPT about, e.g., which authors to read to supplement a budding argument sketch. I think that there may be *some* journals which might benefit from having AI feedback, which might be somewhat more objective and neutral than some human reviewers. AIs certainly shouldn’t *replace* human reviewers, but maybe they should supplement them. Perhaps they could even assist editors, who can get that feedback first to simply assess whether a submission is a new idea or a recycled/badly written one. A human should of course *always* double-check this and see if the AI review passes an initial smell/sense test. If the AI agrees with the human editor/reviewers, there you go; if it disagrees, it’s time for a second look.

This doubtless says more about my disgust with some of the absurd things that human reviewers have said about papers which they clearly didn’t read carefully, or approached with great bias, than about my faith in AI.

Real Ref
4 hours ago

To answer the first question, yes, I believe this did happen to me, only not as author but as the other reviewer. I reviewed a piece some months ago — maybe it was that of the PhD student you’re talking about — and at some stage I was able to see my fellow referee’s report. To me, it was pretty clearly written by AI. It had generalized high praise, coupled with some unusually detailed (if easily executed) suggestions for revision, and lots of gratuitous summaries of the paper, of a style I’ve never seen in reviews before. I won’t go into all the signs, but I was pretty sure. I didn’t share my concerns with the journal (an extremely highly regarded moral philosophy journal), but I discussed them in the Cocoon.

As it happens, I’m now reviewing a paper where I believe the exact same thing happened, with a fellow ref’s report that is strikingly similar to the other one, as though a common template was used! That only increases my suspicion about both reports, obviously.

My sense is that the other reviewer(s) did not wholly hand over their assessment of the manuscript to some LLM. Rather, they read the manuscript quickly and superficially, judged it to be of publishable quality, and then let AI handle the nitty gritty of generating a review, one which they probably requested decisively favor publication, engage with the details, and recommend very precise, targeted changes that would be easy to implement.

I’m thinking of raising it with the journal editors, but I’m hesitant to smear a colleague with the insinuation that they “cheated” in the review — having only the indirect evidence of the referee report (damning as it is, in my view).

Leo
49 minutes ago

I’m a non-native speaker and I use AI to proof-read every report that I write. Perhaps some of my reports “sound like AI”, but the content is from a real philosopher.