AI & the Professions


Large language models outperform humans as junior lawyers, according to a new study.

[image created with DALL-E]

Authors Lauren Martin, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, and Rivindu Perera, of Onit Inc. in New Zealand, write in “Better Call GPT, Comparing Large Language Models Against Lawyers”:

We dissect whether LLMs can outperform humans in accuracy, speed, and cost-efficiency during contract review. Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues. In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts. Cost-wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods. These results are not just statistics—they signal a seismic shift in legal practice. LLMs stand poised to disrupt the legal industry, enhancing accessibility and efficiency of legal services. Our research asserts that the era of LLM dominance in legal contract review is upon us, challenging the status quo and calling for a reimagined future of legal workflows.

Onit is a corporation that is in the business of selling AI legal services, so maybe take this study with a grain of salt.

Still, the future is on its way. Anyone know of any studies about the effectiveness of LLMs replacing teachers yet?

(via Marginal Revolution)

14 Comments
Sam Dumcan
2 months ago

The headline strikes me as misleading. After all, contract review isn’t the only thing junior lawyers do, and I’ve always heard it’s an incredibly basic task. In fact, I’ve seen more than a few people argue that paralegals with non-degree certificates are quite competent to do the vast majority of basic contract review. However, the legal profession functions as a guild, so lawyers, who are wildly overqualified, end up doing the work, and at a higher fee than paralegals would charge. I guess the general public might start using LLMs instead of paying a few hundred bucks for lawyers, but I doubt that corporations will. After all, saying that you’ve had a lawyer read the contract gives you a level of butt-covering that “we ran it through Lawbot 9000” doesn’t.
Anyway, unless I’m wrong about how hard basic contract review is (and lawyers who read this should feel very free to correct me), this is like saying that Dr. Scantron is as good as a junior-level math professor, if not better, because it can grade remedial math tests just as well as he could, and quicker. That doesn’t mean that Dr. Scantron could, say, follow a proof or spot errors in one, much less write one itself.

Graham Clay
Reply to  Sam Dumcan
2 months ago

If this take is right, and all the basic tasks lawyers do each day take a significant amount of their time (or paralegals’ time), then it would seem that significant productivity gains would come from LLMs being able to complete these tasks satisfactorily, even if the tasks are merely basic ones. This lowers costs, increasing accessibility and efficiency, even supposing only lawyers (or paralegals) continue to complete these tasks (and not the public more broadly). It’s just like how ChatGPT (or Copilot) can do basic drafting, outlining, slide-generation, writing, etc. tasks very well, enabling skilled humans (who can easily do those tasks, of course) to move more quickly to more advanced tasks.

Another dimension is that not all lawyers, paralegals, etc. are on a par in terms of skill. Many studies show that most workers who use AI save time with it, but less skilled workers gain more from AI assistance than more skilled ones do. This means that teams of workers using AI showed more “skill equality” than teams that did not use it. Here’s a recent study on this front that focuses on law students specifically: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4626276. From the abstract:

“access to GPT-4 only slightly and inconsistently improved the quality of participants’ legal analysis but induced large and consistent increases in speed. AI assistance improved the quality of output unevenly—where it was useful at all, the lowest-skilled participants saw the largest improvements. On the other hand, AI assistance saved participants roughly the same amount of time regardless of their baseline speed.”

Caligula's Goat
2 months ago

Long ago, in one of the early posts in the ChatGPT/LLM series that Justin put up, I half-jokingly commented that we should be able to train an LLM on the feedback we give our students so that we can have ChatGTAs do all of our grading and commenting for us. I now propose that more seriously.

Grading and commenting on work is, like contract review, a tedious and repetitive job, at least at the undergraduate level. Despite this, grading and commenting on student work takes up maybe 50% of my teaching time (if not more). Many of us have either chosen, or been forced, to use online course management software to collect and assess student work, so, in essence, we have already created a large library of material to train these LLMs on. Assuming that these LLMs could give comments on undergraduate work that are as good as an average instructor’s, what added value is there in having an instructor actually provide that feedback?

In my view, so long as the instructor has to read through and approve of ChatGTA’s comments and grading, I genuinely don’t see how this is any different, morally, than an instructor having graduate student TAs do all of the grading and commenting in their classes for them. In fact, because most professors with TAs don’t actually read through the comments their TAs give on student work, this approach might even be better than the standard TA approach. The only real drawback I can see is that TAs will miss out on the opportunity to develop skills related to commenting on and grading undergraduate work, but the most elite R1s never cared to develop this capacity anyway (I’ve interviewed many job candidates from these elite programs who have never been a TA or taught a class themselves while they were graduate students).

Graduate work, and graduate commenting, might be different enough from undergraduate work that it deserves actual human-generated feedback. However, the very necessary sorts of undergraduate assignments that ask students to carefully reconstruct historical arguments or to come up with reasons for or against something? The bread-and-butter sorts of undergraduate work? I think we could probably offload those to ChatGTA. The benefits are many and obvious in terms of freeing up our time for more research, service, or a more balanced life. We would still always have professional duties to students who want more hands-on feedback, but what a joy it would be if I didn’t have to lose so many weekends of my life to grading.

Graham Clay
Reply to  Caligula's Goat
2 months ago

Have no fear: there are many, many companies with products that use LLMs to complete the grading and feedback tasks you describe. And if you don’t like them and their idiosyncrasies, you can create your own, either by using Microsoft 365 Copilot or Google Duet AI (keeping the files within your school’s ecosystem) or by using Zapier/Make to build your own custom integrations (which requires more care with respect to privacy and student data). I have done several versions of the latter for professors, of varying complexity. The only practical obstacles preventing a professor from doing this (in some form or another, whether more limited or more permissive, depending on your views on the ethical issues at play) are tech know-how and upfront time, not tech capabilities.
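
For a sense of what the roll-your-own route can look like under the hood, here is a minimal Python sketch that calls an LLM API directly rather than going through the no-code tools just mentioned; the model name, rubric, and essay file are placeholder assumptions for illustration, not any particular professor’s setup, and a real deployment would need the privacy safeguards noted above.

    # Minimal sketch: draft feedback on a student essay with an LLM.
    # Assumes the official OpenAI Python client; the rubric, model name,
    # and essay file below are placeholders for illustration only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    RUBRIC = """\
    1. Accurately reconstructs the assigned argument.
    2. Raises at least one clearly stated objection.
    3. Writing is organized and engages the course readings.
    """

    def draft_feedback(essay_text: str) -> str:
        """Return draft comments for the instructor to review and edit."""
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "You are a teaching assistant. Assess the essay "
                            "against the rubric and give brief, constructive comments."},
                {"role": "user",
                 "content": f"Rubric:\n{RUBRIC}\n\nEssay:\n{essay_text}"},
            ],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        with open("essay.txt") as f:
            print(draft_feedback(f.read()))

The instructor still reads and edits every draft before it reaches a student, which is the condition described above.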

Marc Champagne
Reply to  Caligula's Goat
2 months ago

They pretend to write and we pretend to grade them — with Silicon Valley making a profit. This is progress? Count me out.

Caligula's Goat
Reply to  Marc Champagne
2 months ago

Marc, no joke, how is it different, from the professor’s point of view, to have an LLM do your commenting and grading (but not your in-class teaching) instead of a flesh and blood TA?

Back when I was in grad school and a TA, professors might meet with us once or twice a term to make sure things were going well and maybe to hold a norming session for grading where they might read four exams. Lectures were their primary teaching work.

I’m actually trying to take us <put on sunglasses> “Back to the Future” with this approach.

Michel
Reply to  Caligula's Goat
2 months ago

Well, TAs are people with qualifications and they actually read the assignment before marking it.

Also, some of us don’t have TAs.

Caligula's Goat
Reply to  Michel
2 months ago

But that’s the beauty of it. Why should only some professors have the privilege of giving only lectures while someone else does all of the grunt work of grading and commenting? We can all have it with ChatGTA. The LLM is trained on your data: your history of commenting, both in terms of content and in terms of the sorts of places where you make those comments.

Remember, like a good CEO or professor with TAs, you are ultimately responsible for the work of your subordinates. You should make sure that your LLM is commenting and grading consistently with your standards, because you’re the one accountable. If this cuts down my grading load by even 30-50%, it would be incredibly freeing and, to my mind, would come with no real loss in quality.

Nick
Reply to  Caligula's Goat
2 months ago

“so long as the instructor has to read through and approve of ChatGTA’s comments and grading I genuinely don’t see how this is any different, morally, than an instructor having graduate student TAs do all of the grading and commenting in their classes for them.”

First, there is the obvious deontological consideration, which I’m surprised to see missed here: the students did not pay $20-$80k in tuition with the understanding that their papers would be graded by a machine. There is an obvious violation of trust here, and this practice would require full transparency and consent before it could even begin to pass moral muster.

But that aside, I want to highlight this claim because I think it (unfortunately) contains the same tunnel vision that will collectively lead us to a place where we cannot resist the total takeover of the university by AI.

We have to stop defaulting to the “can this one task be done the same way?” question. A narrow focus on the competent completion of the task itself (grading) will always miss the larger structural questions. For example: why are there graduate students at all? If many of their core tasks (research assistantships, grading, etc.) are done by AI, then we reduce the demand for them in the system, and admins have ever more reason to get rid of them.

This point generalizes: give it 10, 25, 50 years and AI will be able to efficiently perform any social/intellectual task you can dream of. If our policy is to ask, at each step, “can this task be done efficiently by AI?” then we have collectively decided to acquiesce in the annihilation of our own way of life. This cannot be the only question we ask.

Robert
2 months ago

You say the future is on its way as if it’s inevitable. But it isn’t. It’s the result of corporations and bosses divorcing humans from their mental labor. It doesn’t have to be this way.

Zac Cogley
2 months ago

Here’s a related piece that agrees that there are potential implications for some aspects of legal work but also that there are important areas that we have reason to think are pretty resilient:
https://www.aisnakeoil.com/p/will-ai-transform-law

I encourage reading the full study, and especially the appendix (p. 12 onward), which discusses the kinds of things the LLMs tended to miss (the importance of subtle bits of language, which the Senior Lawyers tended to pick up on). In addition, the paper includes the System Prompt used for the evaluations.

I have quibbles with the way the results are reported, which arguably affect estimates of a “seismic shift.”

The construction of the system prompt is non-trivial, but appears not to be factored into the time/cost estimates. In addition, each contract is also delivered to the LLM with a “document scenario” that “provides background context on both parties involved in the agreement including the size, industry, and product/service of the parties and any specific areas of concern in the contract.” And each contract is delivered to the LLM with a “checklist” that “lists the items [the client deems] important in the contract.”
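
To make that division of labor concrete, here is a rough sketch of how the three inputs might be combined into a single prompt for the model; the wording is invented for illustration and is not the study’s actual System Prompt.

    # Rough illustration of the review inputs described above. The contract is
    # what the model reviews; the scenario and checklist are human-authored.
    # The prompt wording is invented, not the study's actual System Prompt.
    def build_review_prompt(contract: str, scenario: str, checklist: list[str]) -> str:
        """Combine the three inputs into one prompt for an LLM reviewer."""
        checklist_block = "\n".join(f"- {item}" for item in checklist)
        return (
            "You are reviewing a contract on behalf of a client.\n\n"
            f"Background scenario (drafted by the client's lawyers):\n{scenario}\n\n"
            f"Checklist of items the client deems important:\n{checklist_block}\n\n"
            f"Contract text:\n{contract}\n\n"
            "Flag any clauses that raise issues against the checklist."
        )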

The document scenario and checklist would presumably still be produced by human lawyers. I think what this suggests is that contract review may change significantly and that the role(s) that lawyers play may be quite different in the future. But this study, at least, doesn’t mean that firms can just stop hiring junior lawyers.

Derek Bowman
2 months ago

“Onit is a corporation that is in the business of selling AI legal services, so maybe take this study with a grain of salt.
Still, the future is on its way.”

This just seems like a recipe for being taken in by, and amplifying, industry hype. “A presumptively unreliable source says the product can do X, so surely it will soon be able to do X+n.”

I suppose this is a slight improvement over the usual inference from Hollywood movie scripts, but it would be nice to see philosophers modeling the kind of critical thinking we claim to teach our students.

naive skeptic
2 months ago

To answer the question: if you search for Khanmigo on Google Scholar, there are a handful of articles on LLMs in education. None that I saw are robust, generalizable empirical studies; I don’t think there has been enough time for those to be conducted and published yet.

For me, the greater challenge has long been: how do we get people to meaningfully and thoughtfully engage with the possibility that the future may be radically different than the present?

We of course don’t know what the future will be like, but at this point it is at least possible that the future may be radically different than the present. Some people, some otherwise very thoughtful and respectable people, are completely unwilling to even engage in that thought. What should we do about this? What can we do about this?

M Dentith
2 months ago

The thing that often seems to be missing in these conversations about LLMs doing task x at increased speed y (compared to a puny human) is the power/environmental cost. Yes, LLMs can do some of these tasks quickly and efficiently, but they are wildly expensive in terms of power usage. We’re already seeing demand to return to more environmentally destructive power generation to run the servers the current generation of LLMs runs on. As Silicon Valley insists that ever more tasks like these can be done by LLMs, there will be ever more demand for additional power.

I realise this is a slippery slope argument, but not all such arguments are fallacious, and whilst people like Sam Altman say we should expect newer generations of LLMs to be more power efficient, it’s not clear a) that these efficiencies will materialise, and b) whether they will mean much if LLMs come to be even more widely employed.