Class Action Lawsuit against AI Firm for Copyright Infringement


If versions of any of your books are on LibGen or similar online collections of pirated material, there is a chance it was used as training data for AI, and you may be able to join a lawsuit about it.

Earlier this month, a group of authors sued Anthropic, the firm behind the Claude family of large language models, for copyright infringement.

The lawsuit states:

Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books. Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them. Authors spend years conceiving, writing, and pursuing publication of their copyrighted material. The United States Constitution recognizes the fundamental principle that creators deserve compensation for their work. Yet Anthropic ignored copyright protections. An essential component of Anthropic’s business model—and its flagship “Claude” family of large language models (or “LLMs”)—is the largescale theft of copyrighted works…

Anthropic has not even attempted to compensate Plaintiffs for the use of their material. In fact, Anthropic has taken multiple steps to hide the full extent of its copyright theft. Copyright law prohibits what Anthropic has done here: downloading and copying hundreds of thousands of copyrighted books taken from pirated and illegal websites.

The group of authors includes writers of fiction and non-fiction. They want the court to require Anthropic to pay damages and to prevent the company from using pirated material in the future.

The attorneys for the plaintiffs are from the firm Lieff, Cabraser, Heimann & Bernstein. If you believe your work was among the pirated material scraped by Anthropic, you can provide the firm with your contact information here.

The full text of the lawsuit is below (and also here).

(via Jennifer Mensch)

10 Comments
cecil burrow
8 months ago

Surely the benefits of libgen still outweigh the negatives.

Jennifer Mensch
Reply to  cecil burrow
8 months ago

I think there is a difference between people who use the sites to support their teaching and research and companies like Anthropic that use the sites to create products which, in our case, are oftentimes being used to undermine teaching and learning. Anyway, the suit is focused on the harvesting of the data I think, not the sites from which the data was taken.

Jennifer Mensch
8 months ago

You can search for your name here; the list includes articles in addition to books:

https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/

Michel
8 months ago

On the other hand, if your book went through a publisher owned by T&F (e.g. Routledge), you’re out of luck, since they agreed to sell your work to OpenAI (despite paying you nothing for it).

V. Alan White
8 months ago

Almost everything I’ve published is in there, a dozen pieces including a book, and I’m a nobody. People ought to look out for this for sure.

Alexander Bird
8 months ago

I’m curious about the legal issues here. Clearly (to my mind) Anthropic did wrong by using pirated versions of copyrighted material. Presumably if they had properly bought and paid for every book they used (like you or I would), then that claim could not be made. Is there anything else? There is mention of doing this ‘without permission’. But what exactly is that about? If I buy a book then I am permitted to read it. And I’m permitted to lend it to friends to read and so on. What about giving it to my AI training model to read? Is there something different about that model not being human? There is a sense in which we are all learning models who learn to write and indeed to think in certain ways by reading others’ works. (Mutatis mutandis for music, and so forth.)

Amy
Reply to  Alexander Bird
8 months ago

I believe there are differences, or so it was alleged in the NYTimes suit from a few years back.
1. Not sure, but I don’t think you’re allowed to make copies of the book or newspaper under copyright law. I think the way these models are trained involves making copies.
2. The industrial scale and profitability allegedly remove it from a ‘fair use’ category that would protect things like lending your books and telling people about them.
3. LLMs sometimes literally reproduce large bits of verbatim text from the source. This is a clear copyright infringement, I guess. I had it explained a while back in a nice way: even if you rote memorise a book, it’s illegal for you to stand on a street corner and recite it, or make a YouTube video of you reciting it, or write it out longhand for paying customers, etc.

Alexander Bird
Reply to  Amy
7 months ago

Thank you, Amy, that’s very helpful. Maybe the models could be designed to avoid 3. And perhaps 1 even. Less so 2. (I can imagine that a lawyer for the AI companies might argue that the use of any single text does fall within ‘fair use/dealing’ provisions and the fact that there are many such texts doesn’t make a difference, i.e. a lot of fair use doesn’t make for unfair use. But I can see that it’s at best a grey area.) Thanks again.

John Companiotte
5 months ago

How do I submit a claim for the settlement? One of my books is in libgen, but the book was properly registered for copyright by McGraw-Hill.