Courts Rule Artificial Intelligence Training on Books Is Fair Use

Kadrey v. Meta: A complete victory for Meta built on evidentiary gaps

Thirteen prominent authors, including Sarah Silverman, Ta‑Nehisi Coates, Rachel Louise Snyder, Junot Díaz, and Pulitzer winner Andrew Sean Greer, sued Meta for copying their books from “shadow libraries” to train its Llama models. After discovery limited to the named plaintiffs’ works, the authors moved for partial summary judgment. Meta cross‑moved.

The court granted Meta’s motion and dismissed the copyright claim. Applying the § 107 fair use factors, the court reasoned that:

  • Factor 1: Purpose and character. Using books to teach an LLM how to model linguistic relationships is “highly transformative,” because the model is deployed to draft emails, translate text, write code, and perform myriad tasks far removed from reading a book for entertainment or study. The commercial motive was acknowledged but did not override the degree of transformation.
  • Factor 2: Nature of the work. The books are quintessentially creative, yet this factor “rarely controls” and carried little weight.
  • Factor 3: Amount copied. Copying entire books was “reasonably necessary” to achieve the transformative purpose. Large, coherent blocks of high‑quality prose make LLMs better at handling long‑context prompts.
  • Factor 4: Market effect. This factor was decisive. The plaintiffs offered no admissible evidence that Llama outputs substitute for, or dilute sales of, their books. Tests showed Llama could not reproduce more than 50 tokens from any of the plaintiffs’ texts, even under adversarial prompting. Their second theory – that Meta’s unpaid use destroys a hypothetical “training‑data licensing” market – was rejected as circular because copyright law does not guarantee the right to monetize every conceivable downstream use.

The order binds only the 13 plaintiffs and leaves intact a separate count alleging distribution liability for Meta’s torrenting of the shadow‑library files. The court explicitly cautioned that “this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”

Bartz v. Anthropic: A split decision that isolates piracy from transformative training

Five authors accused Anthropic of copying millions of books, some purchased and scanned and others pirated, to train its Claude models. With the court’s permission, Anthropic moved early for summary judgment on fair use. The court divided the challenged conduct into three distinct uses:

  • Training copies (fair use). Feeding tokenized or compressed versions of books into model training is “spectacularly transformative.” Because the plaintiffs alleged no infringing outputs, the purpose, amount, and absence of market substitution compelled judgment for Anthropic.
  • Print‑to‑digital conversions (fair use). Destroying lawfully purchased hardbacks and retaining searchable PDFs inside an internal research library is a classic, non‑substitutive format shift. The court found no market harm and gave little weight to the creative nature of the works.
  • Pirated “forever” library (not fair use). In contrast, downloading roughly seven million e‑books from LibGen, Pirate Library Mirror, and other sites to build a permanent, general‑purpose corpus was held not to be fair use. The court stressed that the acquisition displaced ordinary sales and that “were the conduct to be condoned as a fair use . . . [a]s Anthropic itself suggested, ‘[t]hat would destroy the [entire] publishing market.’” Anthropic now faces a jury trial limited to damages for its pirated library copies.

Comparative implications

Taken together, the opinions support three conclusions:

  • LLM training is a transformative use. Both courts held that training an LLM on books is highly transformative; the Bartz court called it “spectacularly transformative.”
  • Lawful sourcing matters. Bartz shows that pirating source material can override a fair use defense even when the end use is transformative.
  • The fourth fair use factor is the new battleground. Meta prevailed because the authors offered no data tying Llama outputs to lost book sales or licensing value. But plaintiffs who marshal surveys, sales data, or dilution studies could tip the scales next time.

What comes next?

Appeals are virtually certain. The US Court of Appeals for the Ninth Circuit will be asked to decide whether every invisible back‑end copy must clear fair use independently (Bartz) and whether dilution without direct substitution can defeat fair use (Kadrey). Until clearer appellate law arrives, companies should assume that training on text obtained from dubious sources remains a high‑risk activity and that a lack of market‑harm evidence may be only a temporary shield.

Christopher Cyrus also contributed to this article. 


