AI Research
Courts Rule Artificial Intelligence Training on Books Is Fair Use
Kadrey v. Meta: A complete victory for Meta built on evidentiary gaps
Thirteen prominent authors, including Sarah Silverman, Ta‑Nehisi Coates, Rachel Louise Snyder, Junot Díaz, and Pulitzer winner Andrew Sean Greer, sued Meta for copying their books from “shadow libraries” to train its Llama models. After discovery limited to the named plaintiffs’ works, the authors moved for partial summary judgment. Meta cross‑moved.
The court granted Meta’s motion and dismissed the copyright claim. Applying the § 107 fair use factors, the court reasoned that:
- Factor 1: Purpose and character. Using books to teach an LLM how to model linguistic relationships is “highly transformative,” because the model is deployed to draft emails, translate text, write code, and perform myriad tasks far removed from reading a book for entertainment or study. The commercial motive was acknowledged but did not override the degree of transformation.
- Factor 2: Nature of the work. The books are quintessentially creative, yet this factor “rarely controls” and carried little weight.
- Factor 3: Amount copied. Copying entire books was “reasonably necessary” to achieve the transformative purpose. Large, coherent blocks of high‑quality prose make LLMs better at handling long‑context prompts.
- Factor 4: Market effect. This factor was decisive. The plaintiffs offered no admissible evidence that Llama outputs substitute for, or dilute sales of, their books. Tests showed Llama could not reproduce more than 50 tokens from any of the plaintiffs’ texts, even under adversarial prompting. Their second theory, that Meta’s unpaid use destroys a hypothetical “training‑data licensing” market, was rejected as circular because copyright law does not guarantee the right to monetize every conceivable downstream use.
The order binds only the 13 plaintiffs and leaves intact a separate count alleging distribution liability for Meta’s torrenting of the shadow‑library files. The court explicitly cautioned that “this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”
Bartz v. Anthropic: A split decision that isolates piracy from transformative training
Five authors accused Anthropic of copying millions of books, some purchased and scanned and others pirated, to train its Claude models. With the court’s permission, Anthropic moved early for summary judgment on fair use. The court divided the challenged conduct into three distinct uses:
- Training copies (fair use). Feeding tokenized or compressed versions of books into model training is “spectacularly transformative.” Because the plaintiffs alleged no infringing outputs, the purpose, amount, and absence of market substitution compelled judgment for Anthropic.
- Print‑to‑digital conversions (fair use). Destroying lawfully purchased hardbacks and retaining searchable PDFs inside an internal research library is a classic, non‑substitutive format shift. The court found no market harm and treated the creative nature of the works as having little relevance.
- Pirated “forever” library (not fair use). In contrast, downloading roughly seven million e‑books from LibGen, Pirate Library Mirror, and other sites to build a permanent, general‑purpose corpus was held not to be fair use. The court stressed that the acquisition displaced ordinary sales and that “were the conduct to be condoned as a fair use . . . [a]s Anthropic itself suggested, ‘[t]hat would destroy the [entire] publishing market.’” Anthropic now faces a jury trial limited to damages for its pirated library copies.
Comparative implications
Taken together, the opinions point to three conclusions:
- LLM training is a transformative use. Both courts held that training an LLM on books is highly transformative.
- Lawful sourcing matters. Bartz shows that pirating source material can override a fair use defense even when the end use is transformative.
- The fourth fair use factor is the new battleground. Meta prevailed because the authors offered no data tying Llama outputs to lost book sales or licensing value. But plaintiffs who marshal surveys, sales data, or dilution studies could tip the scales next time.
What comes next?
Appeals are virtually certain. The US Court of Appeals for the Ninth Circuit will be asked to decide whether every invisible back‑end copy must clear fair use independently (Bartz) and whether dilution without direct substitution can defeat fair use (Kadrey). Until clearer appellate law arrives, companies should assume that training on text obtained from dubious sources remains a high‑risk activity and that a lack of market‑harm evidence may be only a temporary shield.
Christopher Cyrus also contributed to this article.