AI Insights

Artificial Intelligence Training on Books Is Fair Use—Piracy Is Not


The US District Court for the Northern District of California granted summary judgment in favor of an artificial intelligence (AI) company, finding that its use of lawfully acquired copyrighted materials for training and its digitization of acquired print works fell within the bounds of fair use. However, the district court explicitly rejected the AI company’s attempt to invoke fair use as a defense to rely on pirated copies of copyrighted works as lawful training data. Andrea Bartz, et al. v. Anthropic PBC, Case No. 24-CV-05417-WHA (N.D. Cal. June 23, 2025) (Alsup, J.)

Anthropic, an AI company, acquired more than seven million copyrighted books without authorization by downloading them from pirate websites. It also lawfully purchased print books, removed their bindings, scanned each page, and stored them in digitized, searchable files. The goal was twofold:

  • To create a central digital library intended, in Anthropic’s words, to contain “all the books in the world” and to be preserved indefinitely.
  • To use this library to train the large language models (LLMs) that power Anthropic’s AI assistant, Claude.

Each work selected for training the LLM was copied through four main stages:

  • Each selected book was copied from the library to create a working copy for training.
  • Each book was “cleaned” by removing low-value or repetitive content (e.g., footers).
  • Cleaned books were converted into “tokenized” versions by being simplified and split into short character sequences, then translated into numerical tokens using Anthropic’s custom dictionary. These tokens were repeatedly used in training, allowing the model to discover statistical relationships across massive text data.
  • Each fully trained LLM itself retained “compressed” copies of the books.
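
The tokenization stage described above can be sketched in miniature. The whole-word splitting and tiny vocabulary below are hypothetical stand-ins for illustration only; a production pipeline such as Anthropic's would use a far larger custom dictionary and subword encoding rather than whole-word splitting:

```python
# Minimal sketch of dictionary-based tokenization: text is split into
# pieces, each piece is assigned a numerical ID from a custom vocabulary,
# and the resulting token sequences are what the model trains on.

def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign a numerical ID to every distinct whitespace-delimited piece."""
    vocab: dict[str, int] = {}
    for text in corpus:
        for piece in text.split():
            if piece not in vocab:
                vocab[piece] = len(vocab)
    return vocab

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Translate text into the numerical tokens used during training."""
    return [vocab[piece] for piece in text.split() if piece in vocab]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(tokenize("the cat sat", vocab))  # -> [0, 1, 2]
```

Training then consists of repeatedly feeding such token sequences to the model so it can learn statistical relationships between them, which is why the court treated the tokenized copies as intermediate working copies rather than as distributable reproductions.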

Once the LLM was trained, it did not output any of the books through Claude to the public. The company placed particular value on books with well-curated facts, structured analyses, and compelling narratives (i.e., works that reflected well-written creative expressions) because Claude’s users expected clear, accurate, and well-written responses to their questions.

Andrea Bartz, along with two other authors whose books were copied from pirated and purchased sources and used to train Claude, sued Anthropic for copyright infringement. In response, Anthropic filed an early motion for summary judgment limited to the fair use defense under Section 107 of the Copyright Act.

To assess the applicability of the fair use defense, the court separated and analyzed Anthropic’s actions across three distinct categories of use.

Transformative training (fair use)

The authors challenged only the inputs used to train the LLMs, not their outputs. The district court found that Anthropic’s use of copyrighted books to train its LLMs was a transformative use, comparable to how humans read and learn from texts and produce new, original writing. While the authors claimed that the LLMs memorized their creative expression, there was no evidence that Claude released infringing material to the public. The court concluded that using the works as training inputs – not for direct replication, but to enable the generation of new content – favored a finding of fair use.

Format-shifting copies (fair use)

The authors challenged Anthropic’s conversion of the copyrighted works from print to digital format, although they did not allege that Anthropic distributed any of the digital copies outside the company. The district court found that Anthropic had lawfully purchased the print editions and acquired the right to retain and use them for all ordinary purposes. Each print copy was digitized to save space and enable search functionality, and the original was destroyed after conversion. The court concluded that the print-to-digital format change was transformative under fair use.

Liability for piracy (not fair use)

The district court agreed with the authors that Anthropic’s downloading and retention of more than seven million pirated books – without payment – was not a fair use, regardless of whether the books were ultimately used to train its AI models. Even after Anthropic decided not to train its LLMs on those pirated copies, it kept them as part of a central research library, a use the court found inherently infringing and non-transformative. The court rejected Anthropic’s argument that its long-term goal of a transformative use (training LLMs) could retroactively justify the initial infringement, emphasizing that each act of copying must be judged by its own objective use. The court explained that “such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

Anthropic now faces a jury trial limited to damages for its pirated copies.

Practice note: This is the first federal court decision analyzing the defense of fair use of copyrighted material to train generative AI. Two days after this decision issued, another Northern District of California judge ruled in Kadrey et al. v. Meta Platforms Inc. et al., Case No. 3:23-cv-03417, and concluded that the AI technology at issue in his case was transformative. However, the basis for his ruling in favor of Meta on the question of fair use was not transformation, but the plaintiffs’ failure “to present meaningful evidence that Meta’s use of their works to create [a generative AI engine] impacted the market” for the books.





Apple's top executive in charge of artificial intelligence models, Ruoming Pang, is leaving for Meta – Bloomberg News – MarketScreener



Intro robotics students build AI-powered robot dogs from scratch


Equipped with a starter robot hardware kit and cutting-edge lessons in artificial intelligence, students in CS 123: A Hands-On Introduction to Building AI-Enabled Robots are mastering the full spectrum of robotics – from motor control to machine learning. Now in its third year, the course has students build and enhance an adorable quadruped robot, Pupper, programming it to walk, navigate, respond to human commands, and perform a specialized task that they showcase in their final presentations.

The course, which evolved from an independent study project led by Stanford’s robotics club, is now taught by Karen Liu, professor of computer science in the School of Engineering, in addition to Jie Tan from Google DeepMind and Stuart Bowers from Apple and Hands-On Robotics. Throughout the 10-week course, students delve into core robotics concepts, such as movement and motor control, while connecting them to advanced AI topics.

“We believe that the best way to help and inspire students to become robotics experts is to have them build a robot from scratch,” Liu said. “That’s why we use this specific quadruped design. It’s the perfect introductory platform for beginners to dive into robotics, yet powerful enough to support the development of cutting-edge AI algorithms.”

What makes the course especially approachable is its low barrier to entry – students need only basic programming skills to get started. From there, the students build up the knowledge and confidence to tackle complex robotics and AI challenges.

Robot creation goes mainstream

Pupper evolved from Doggo, built by the Stanford Student Robotics club to offer people a way to create and design a four-legged robot on a budget. When the team saw the cute quadruped’s potential to make robotics both approachable and fun, they pitched the idea to Bowers, hoping to turn their passion project into a hands-on course for future roboticists.

“We wanted students who were still early enough in their education to explore and experience what we felt like the future of AI robotics was going to be,” Bowers said.

This current version of Pupper is more powerful and refined than its predecessors. It’s also irresistibly adorable and easier than ever for students to build and interact with.

“We’ve come a long way in making the hardware better and more capable,” said Ankush Kundan Dhawan, one of the first students to take the Pupper course in the fall of 2021 before becoming its head teaching assistant. “What really stuck with me was the passion that instructors had to help students get hands-on with real robots. That kind of dedication is very powerful.”

Code comes to life

Building a Pupper from a starter hardware kit blends different types of engineering, including electrical work, hardware construction, coding, and machine learning. Some students even produced custom parts for their final Pupper projects. The course pairs weekly lectures with hands-on labs. Lab titles like Wiggle Your Big Toe and Do What I Say keep things playful while building real skills.

CS 123 students ready to show off their Pupper’s tricks. | Harry Gregory

Over the initial five weeks, students are taught the basics of robotics, including how motors work and how robots can move. In the next phase of the course, students add a layer of sophistication with AI. Using neural networks to improve how the robot walks, sees, and responds to the environment, they get a glimpse of state-of-the-art robotics in action. Many students also use AI in other ways for their final projects.

“We want them to actually train a neural network and control it,” Bowers said. “We want to see this code come to life.”

By the end of the quarter this spring, students were ready for their capstone project, called the “Dog and Pony Show,” where guests from NVIDIA and Google were present. Six teams had Pupper perform creative tasks – including navigating a maze and fighting a (pretend) fire with a water pick – surrounded by the best minds in the industry.

“At this point, students know all the essential foundations – locomotion, computer vision, language – and they can start combining them and developing state-of-the-art physical intelligence on Pupper,” Liu said.

“This course gives them an overview of all the key pieces,” said Tan. “By the end of the quarter, the Pupper that each student team builds and programs from scratch mirrors the technology used by cutting-edge research labs and industry teams today.”

All ready for the robotics boom

The instructors believe the field of AI robotics is still gaining momentum, and they’ve made sure the course stays current by integrating new lessons and technology advances nearly every quarter.


This Pupper was mounted with a small water jet to put out a pretend fire. | Harry Gregory

Students have responded to the course with resounding enthusiasm and the instructors expect interest in robotics – at Stanford and in general – will continue to grow. They hope to be able to expand the course, and that the community they’ve fostered through CS 123 can contribute to this engaging and important discipline.

“The hope is that many CS 123 students will be inspired to become future innovators and leaders in this exciting, ever-changing field,” said Tan.

“We strongly believe that now is the time to make the integration of AI and robotics accessible to more students,” Bowers said. “And that effort starts here at Stanford and we hope to see it grow beyond campus, too.”





Why Infuse Asset Management’s Q2 2025 Letter Signals a Shift to Artificial Intelligence and Cybersecurity Plays


The rapid evolution of artificial intelligence (AI) and the escalating complexity of cybersecurity threats have positioned these sectors as the next frontier of investment opportunity. Infuse Asset Management’s Q2 2025 letter underscores this shift, emphasizing AI’s transformative potential and the urgent need for robust cybersecurity infrastructure to mitigate risks. Below, we dissect the macroeconomic forces, sector-specific tailwinds, and portfolio reallocation strategies investors should consider in this new paradigm.

The AI Uprising: Macro Drivers of a Paradigm Shift

The AI revolution is accelerating at a pace that dwarfs historical technological booms. Take ChatGPT, which reached 800 million weekly active users by April 2025—a milestone achieved in just two years. This breakneck adoption is straining existing cybersecurity frameworks, creating a critical gap between innovation and defense.

Meanwhile, the U.S.-China AI rivalry is fueling a global arms race. China’s industrial robot installations surged from 50,000 in 2014 to 290,000 in 2023, outpacing U.S. adoption. This competition isn’t just about economic dominance—it’s a geopolitical chess match where data sovereignty, espionage, and AI-driven cyberattacks now loom large. The concept of “Mutually Assured AI Malfunction (MAIM)” highlights how even a single vulnerability could destabilize critical systems, much like nuclear deterrence but with far less predictability.

Cybersecurity: The New Infrastructure for an AI World

As AI systems expand into physical domains—think autonomous taxis or industrial robots—so do their vulnerabilities. In San Francisco, autonomous taxi providers now command 27% market share, yet their software is a prime target for cyberattacks. The decline in AI inference costs (outpacing historical declines in electricity and memory) has made it cheaper to deploy AI, but it also lowers the barrier for malicious actors to weaponize it.


Tech giants are pouring capital into AI infrastructure—NVIDIA and Microsoft alone increased CapEx from $33 billion to $212 billion between 2014 and 2024. This influx creates a vast, interconnected attack surface. Investors should prioritize cybersecurity firms that specialize in quantum-resistant encryption, AI-driven threat detection, and real-time infrastructure protection.

The Human Element: Skills Gaps and Strategic Shifts

The demand for AI expertise is soaring, but the workforce is struggling to keep pace. U.S. AI-related IT job postings have surged 448% since 2018, while non-AI IT roles have declined by 9%. This bifurcation signals two realities:
1. Cybersecurity skills are now mission-critical for safeguarding AI systems.
2. Ethical AI development and governance are emerging as compliance priorities, particularly in regulated industries.

This divergence will likely widen, reinforcing the need for investors to back training platforms and cybersecurity firms that bridge the skills gap.

Portfolio Reallocation: Where to Deploy Capital

Infuse’s insights suggest three actionable strategies:

  1. Core Holdings in Cybersecurity Leaders:
    Target firms like CrowdStrike (CRWD) and Palo Alto Networks (PANW), which excel in AI-powered threat detection and endpoint security.

  2. Geopolitical Plays:
    Invest in companies addressing data sovereignty and cross-border compliance, such as Palantir (PLTR) or Cloudflare (NET), which offer hybrid cloud solutions.

  3. Emerging Sectors:
    Look to quantum computing security (e.g., Rigetti Computing (RGTI)) and AI governance platforms such as DataRobot, which help enterprises audit and validate AI models.

The Bottom Line: AI’s Growth Requires a Security Foundation

The “productivity paradox” of AI—where speculative valuations outstrip tangible ROI—is real. Yet, cybersecurity is one area where returns are measurable: breaches cost companies millions, and defenses reduce risk. Investors should treat cybersecurity as the bedrock of their AI investments.

As Infuse’s letter implies, the next decade will belong to those who balance AI’s promise with ironclad security. Position portfolios accordingly.

JR Research



