AI Insights

Artificial Intelligence Training on Books Is Fair Use—Pira

Published

2 months ago

July 2, 2025

The US District Court for the Northern District of California granted summary judgment in favor of an artificial intelligence (AI) company, finding that its use of lawfully acquired copyrighted materials for training and its digitization of acquired print works fell within the bounds of fair use. However, the district court explicitly rejected the AI company’s attempt to invoke fair use as a defense to rely on pirated copies of copyrighted works as lawful training data. Andrea Bartz, et al. v. Anthropic PBC, Case No. 24-CV-05417-WHA (N.D. Cal. June 23, 2025) (Alsup, J.)

Anthropic, an AI company, acquired more than seven million copyrighted books without authorization by downloading them from pirate websites. It also lawfully purchased print books, removed their bindings, scanned each page, and stored them in digitized, searchable files. The goal was twofold:

To create a central digital library intended, in Anthropic’s words, to contain “all the books in the world” and to be preserved indefinitely.
To use this library to train the large language models (LLMs) that power Anthropic’s AI assistant, Claude.

Each work selected for training the LLM was copied through four main stages:

Each selected book was copied from the library to create a working copy for training.
Each book was “cleaned” by removing low-value or repetitive content (e.g., footers).
Cleaned books were converted into “tokenized” versions by being simplified and split into short character sequences, then translated into numerical tokens using Anthropic’s custom dictionary. These tokens were repeatedly used in training, allowing the model to discover statistical relationships across massive text data.
Each fully trained LLM itself retained “compressed” copies of the books.

Once the LLM was trained, it did not output any of the books through Claude to the public. The company placed particular value on books with well-curated facts, structured analyses, and compelling narratives (i.e., works that reflected well-written creative expressions) because Claude’s users expected clear, accurate, and well-written responses to their questions.

Andrea Bartz, along with two other authors whose books were copied from pirated and purchased sources and used to train Claude, sued Anthropic for copyright infringement. In response, Anthropic filed an early motion for summary judgment on fair use only under Section 107 of the Copyright Act.

To assess the applicability of the fair use defense, the court separated and analyzed Anthropic’s actions across three distinct categories of use.

Transformative training (fair use)

The authors challenged only the inputs used to train the LLMs, not their outputs. The district court found that Anthropic’s use of copyrighted books to train its LLMs was a transformative use, comparable to how humans read and learn from texts and produce new, original writing. While the authors claimed that the LLMs memorized their creative expression, there was no evidence that Claude released infringing material to the public. The court concluded that using the works as training inputs – not for direct replication, but to enable the generation of new content – favored a finding of fair use.

Format-shifting copies (fair use)

The authors challenged Anthropic’s conversion of the copyrighted works from print to digital format, although they did not allege that Anthropic distributed any of the digital copies outside the company. The district court found that Anthropic had lawfully purchased the print editions and acquired the right to retain and use them for all ordinary purposes. Each print copy was digitized to save space and enable search functionality, and the original was destroyed after conversion. The court concluded that the print-to-digital format change was transformative under fair use.

Liability for piracy (not fair use)

The district court agreed with the authors that Anthropic’s downloading and retention of more than seven million pirated books – without payment – was not a fair use, regardless of whether the books were ultimately used to train its AI models. Even after Anthropic decided not to train its LLMs on those pirated copies, it kept them as part of a central research library, a use the court found inherently infringing and non-transformative. The court rejected Anthropic’s argument that its long-term goal of a transformative use (training LLMs) could retroactively justify the initial infringement, emphasizing that each act of copying must be judged by its own objective use. The court explained that “such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

Anthropic now faces a jury trial limited to damages for its pirated copies.

Practice note: This is the first federal court decision analyzing the defense of fair use of copyrighted material to train generative AI. Two days after this decision issued, another Northern District of California judge ruled in Kadrey et al. v. Meta Platforms Inc. et al., Case No. 3:23-cv-03417, and concluded that the AI technology at issue in his case was transformative. However, the basis for his ruling in favor of Meta on the question of fair use was not transformation, but the plaintiffs’ failure “to present meaningful evidence that Meta’s use of their works to create [a generative AI engine] impacted the market” for the books.

Source link

AI Insights

Oracle’s AI strategy to take on Epic – statnews.com

Published

21 minutes ago

September 10, 2025

The Editors

Oracle’s AI strategy to take on Epic statnews.com

Source link

AI Insights

Building industrial AI from the inside out for a stronger digital core

Published

1 hour ago

September 10, 2025

The Editors

A manufacturer was running an AI training workload on a cobbled together system of GPUs, storage, and switching infrastructure, believing it had all the necessary tech to achieve its goals. But the company had put little thought into how the components actually worked together.

Problems surfaced quickly. Training cycles dragged on for days instead of hours. Expensive hardware sat idle. And engineering teams began to wonder whether their AI investment would ever pay off.

This experience isn’t unique. As AI becomes a critical element of industrial operations worldwide, many organizations are discovering a counterintuitive truth: the biggest breakthroughs come not from piling on more GPUs or larger models, but from carefully engineering the entire infrastructure to work as a single, integrated system.

Engineering for outcomes

What became of that cobbled-together system? When it was properly engineered to balance compute, networking, and storage, the improvement was quick and dramatic, explains Jason Hardy, CTO of AI for Hitachi Vantara: a 20x boost in output and a matching reduction in “wall clock time,” the actual time it takes to complete AI training cycles.

“The infrastructure must be engineered so you understand exactly what each component delivers,” Hardy explains. “You want to know how the GPU drives specific outcomes, how that impacts the data requirements, and demands on throughput and bandwidth.”

Getting systems to run that smoothly means confronting a challenge most organizations would rather avoid: aging infrastructure.

Hardy points to a semiconductor manufacturer whose systems performed fine—until AI entered the picture. “As soon as they threw AI on top of it, just reading the data out of those systems brought everything to a halt,” he says.

This scenario reflects a widespread industrial reality. Manufacturing environments often rely on systems that have been running reliably for years, even decades. “The only places I can think of where Windows 95 still exists and is used daily are in manufacturing,” Hardy says. “These lines have been operational for decades.”

That longevity now collides with new demands: industrial AI requires exponentially more data throughput than traditional enterprise applications, and legacy systems simply can’t keep up. The challenge creates a fundamental mismatch between aspirations and capabilities.

“We have this transformational outcome we want to pursue,” Hardy explains. “We have these laggard technologies that were good enough before, but now we need a little bit more from them.”

From real-time requirements to sovereign AI

In industrial AI, performance demands often make enterprise workloads look leisurely. Hardy describes a visual inspection system for a manufacturer in Asia that relied entirely on real-time image analysis for quality and cost control. “They wanted AI for quality control and to improve yield, while also controlling costs,” he says.

The AI had to process high-resolution images at production speed—no delays, no cloud roundtrips. The system doesn’t just flag defects but traces them to the upstream machine causing the problem, enabling immediate repairs. It can also salvage partially damaged products by dynamically rerouting them for alternate uses, reducing waste while maintaining yield.

All of this happens in real-time while collecting telemetry to continuously retrain the models, turning what had been a waste problem into an optimization advantage that improves over time.

Using the cloud exclusively introduces delays that make near-real-time processing impossible, Hardy says. The latency from sending data to remote servers and waiting for results back can’t meet manufacturing’s millisecond requirements.

Hardy advocates a hybrid approach: design infrastructure with an on-premises mindset for mission-critical, real-time tasks, and leverage the cloud for burst capacity, development, and non-latency-sensitive cloud-friendly workloads. The approach also serves the rising need for sovereign AI solutions. Sovereign AI ensures that mission-critical AI systems and data remain within national borders for regulatory and cultural compliance. As Hardy says, countries like Saudi Arabia are investing heavily in bringing AI assets in-country to maintain sovereignty, while India is building language- and culture-specific models to accurately reflect its thousands of spoken languages and microcultures.

AI infrastructure is more than muscle

Such high-level performance requires more than just fast hardware. It calls for an engineering mindset that starts with the desired outcome and data sources. As Hardy puts it, “You should step back and not just say, ‘You need a million dollars’ worth of GPUs.’” He notes that sometimes, “85% readiness is sufficient,” emphasizing practicality over perfection.

From there, the emphasis shifts to disciplined, cost-conscious design. “Think about it this way,” Hardy says. “If an AI project were coming out of your own budget, how much would you be willing to spend to solve the problem? Then engineer based on that realistic assessment.”

This mindset forces discipline and optimization. The approach works because it considers both the industrial side (operational requirements) and the IT side (technical optimization)—a combination he says is rare.

Hardy’s observations align with recent academic research on hybrid computing architectures in industrial settings. A 2024 study in the Journal of Technology, Informatics and Engineering¹ found that engineered CPU/GPU systems achieved 88.3% accuracy while using less energy than GPU-only setups, confirming the benefits of an engineering approach.

The financial impact of getting infrastructure wrong can be substantial. Hardy notes that organizations have traditionally overspend on GPU resources that sit idle much of the time, while missing the performance gains that come from proper system engineering. “The traditional approach of buying a pool of GPU resources brings a lot of waste,” Hardy says. “The infrastructure-first approach eliminates this inefficiency while delivering superior results.”

Avoiding mission-critical mistakes

In industrial AI, mistakes can be catastrophic—faulty rail switches, conveyors without emergency shutoffs, or failing equipment can injure people or stop production. “We have an ethical bias to ensure everything we do in the industrial complex is 100% accurate—every decision has critical stakes,” Hardy says.

This commitment shapes Hitachi’s approach: redundant systems, fail-safes, and cautious rollouts ensure reliability takes precedence over speed. “It does not move at the speed of light for a reason,” Hardy explains.

The stakes help explain why Hardy takes a pragmatic view of AI project success rates. “Though 80-90% of AI projects never go to production, the ones that do can justify the entire effort,” he says. “Not doing anything is not an option. We have to move forward and innovate.”

For more on engineering systems for balanced and optimum AI performance, see AI Analytics Platform | Hitachi IQ

Jason Hardy is CTO of AI for Hitachi Vantara, a company specializing in data-driven AI solutions. The company’s Hitachi iQ platform, a scalable and high-performance turn-key solution, plays a critical role in enabling infrastructure that balances compute, networking, and storage to meet the demanding needs of enterprise and industrial AI.

¹Optimizing AI Performance in Industry: A Hybrid Computing Architecture Approach Based on Big Data | Journal of Technology Informatics and Engineering

Source link

Continue Reading

AI Insights

A Platform Leader’s Path to Sustained Dominance

Published
2 hours ago
on
September 10, 2025

By
The Editors

This article first appeared on GuruFocus.

Salesforce (NYSE: CRM) offers a compelling long-term opportunity due to its leadership in the customer relationship management (NYSE:CRM) market, expanding AI integration, and growing addressable market. because of its continuous leading position in the customer relationship management (NYSE:CRM) market, the fast-growing adoption of artificial intelligence, and rapidly increasing the total addressable market. The business model that the company provides allows subscriptions and grants profits that are predictable, moreover, the platform-based approach increases the switching costs and provides opportunities to expand within the existing client bases. Thus, theThe business model enables high visibility into recurring revenue and long-term client retention.

Standing at the forefront of the global CRM market, Salesforce has captured nearly a quarter (23%) of the market share, leaving behind formidable adversaries like Microsoft, Oracle, and SAP. The position of being number one provides Salesforce with a multilayer competitive fortifications; thus, it builds a strong economic moat.

Network Effects and Ecosystem Dominance: There are over 4,000 applications on the AppExchange belonging to the Salesforce ecosystem, reinforcing its self-expanding cycle where developers attract customers, and vice versa. This network effect, which grows stronger with the addition of new applications thus the ecosystem expands, creates a barrier to entry that competitors find it almost impossible to duplicate. Independent software vendors (ISVs) devote a large amount of time and energy in creating applications that work with Salesforce, which in turn, makes the customers think that if they switch to another platform they would be missing out on the benefits of the whole partner network.

Data Network Effects: The greater the number of customers availing of Salesforce services, the larger is the amount of the data which is being accumulated by the platform on customer interactions across different sectors and applications. This data facilitates the enhancement of AI models, predictive analytics, and benchmarking capabilities; thus, the platform continues to generate value that compounds over the years and is not easily reproduced by new entrants.

Salesforce’s artificial intelligence strategy is a key component not just because it adds features, Salealos because sforce’s artificial intelligence strategy enhances its platform differentiation and supports operational leverage.. The company introduces the next generation of AI technology and improves the algorithmic base inspiring the organizations to work different with the which they produce and hold about their own customer data.

Einstein Platform Foundation: By means of the Salesforce Einstein accouting, the platform processes 200 billion predictions a day, thus, it is leveraging the collective data of hundreds of thousands of customers for the purpose of the constant improvement of AI models. This magnitude of data processing and machine learning approach creates predictive features that do not have any equivalent among smaller competitors who do not have a similar scale of data or equal access to it.

Salesforce Empowers Generative AI Integration: By launching Einstein GPT and merging it with large language models Salesforce enters a new arena of generative AI market and value retrieval. Unlike isolated AI appliances, Salesforce’s AI is integrated with the knowledge of customer data, ongoing work processes, and the history of past interactions which makes it more precise and applicable for straight-tied AI-generated analytical views and suggestions.

Industry-Specific AI Models: In order to increase the vertical-specific AI functionality, Salesforce is working on certain technologies for the fields of healthcare, financial services, and retail. These individual models not only take into account the industry standard rules and jargon but also best practices, thereby creating additional switching costs and competitive differentiations that generic AI platforms cannot compete with easily.

Recurring Revenue Model: Salesforce’s subscription-based model provides an excellent forecast of the company’s financial outcome and adds up to the long-term investors’ value in a compounding way. The company with more than 90% of the revenue recurrence and contracts being mainly over one year gives the insiders a rare opportunity to see the metric trends moving forward. The remaining performance obligation (RPO) backlog of the company which is over $25 billion is the contracted future income and it brings both risk mitigating and growth visibility effects.

Salesforce evolved from single-app CRM to a multi-solutions customer management platform that now has several advantages in running the business and growing.

Multi-Cloud Synergies: The connected system of Sales Cloud, Service Cloud, Marketing Cloud, Commerce Cloud, and Analytics Cloud is a source of valuable cross-selling potentials and a customer switching cost increase. Enterprises using multiple clouds of Salesforce enjoy unified customer data, a consistent user experience, and an integrated workflow that becomes hard to build in the same way if the applications were used across different vendors.

Industry Verticalization: Salesforce has produced vertical-specific solutions for healthcare, financial services, retail, manufacturing, and other segments. These industry clouds merge the core platform with the prebuilt processes, compliance features, and the specific data models of the industry. This verticalization strategy will sidestep a generic CRM program by offering more suitable tools, which leads to further competitive moats.

Salesforce’s major acquisitions: MuleSoft (2018), Tableau (2019), and Slack (2021), have all been integrated into its broader platform strategy, not as standalone tools but as functional extensions of the core CRM architecture. MuleSoft’s API capabilities enable connectivity across enterprise systems, making Salesforce more interoperable within legacy environments. Tableau enhances data visibility across Salesforce products, giving users embedded analytics tailored to operational workflows. Slack has become central to Salesforce’s vision of asynchronous collaboration, now embedded directly into Sales and Service Cloud interfaces.

Each acquisition has followed a clear integration path: building native connectors, embedding dashboards or features directly into Salesforce interfaces, and enabling data-sharing across clouds. This approach has allowed Salesforce to expand the surface area of customer engagement without disrupting platform cohesion.

EBITDA Productivity Trends

Between FY2022 and FY2024, Salesforce significantly enhanced its operational efficiency. According to its 10-K filings, EBITDA per employee grew from approximately $63,000 in FY2022 to over $145,000 by FY2024 , more than doubling in just two years.

This improvement was driven by a two-fold strategy:

First, Salesforce undertook a substantial restructuring in 2023, reducing its global workforce by about 10%, or over 7,000 employees.

Second, the company implemented tighter cost controls, improved operating discipline, and began integrating AI-driven productivity tools internally, helping expand operating margins from ~3% in FY2022 to ~17% in FY2024.

The net result was a leaner, more focused organization generating more value per employee a trend that aligns with broader tech-sector shifts toward profitability over pure growth.

Salesforce: A Platform Leader’s Path to Sustained Dominance

Salesforce’s Change of Operations Dramatically

Salesforce’s expansion from about $60K EBITDA per employee in 2022 to $149K in 2025 was a huge leap with a 150% increase in operational efficiency. The report suggests that the company has undergone a fundamental restructuring of its cost base and has gained important operating leverage, likely through the use of AI for automation, process optimization, and adopting more disciplined hiring practices after the 2022 tech downturn.

Salesforce’s trajectory of improvement implies that the management’s insistence on profitability is yielding positive results, which lends credence to the investment thesis. On the other hand, Oracle’s consistent efficiency combined with the lower valuation multiple entails a strong value proposition. SAP’s lambing statistics underscore the execution risks involved in large-scale business model transitions which render it the riskiest despite its market position.

AI Integration and Competitive Positioning

Salesforce is experiencing other execution risks in its AI strategy that could generally change its competitive position and attractiveness to investors.

Technical Integration Complexity

Efficiently integrating AI across Salesforce’s really wide platform ecosystem takes the complete integration with existing workflows, data models, and user interfaces to be totally free of any bugs. An imperfect implementation of AI could site planning disruption since customers would need to operationally integrate core CRM components which they cannot omit, and this could further lead to system instability or user resistance. The shortfall of technical supports lies in maintaining platform reliability while allowing multiple clouds to introduce advanced AI capabilities is a difficult execution task.

Competitive Vulnerability

Microsoft’s superb AI know-how via the OpenAI partnership and the Azure infrastructure presents a strong competitive menace to Salesforce’s AI hopes. When Salesforce’s AI capabilities are slower than Microsoft’s Copilot integration, or else when they do not introduce any substantial improvement in productivity, enterprise clients may change to the Microsoft ecosystem for cohesive AI-revealed productivity implements. This risk is further increased due to the fact that Microsoft has already established Office 365 customer relationships and its holistic approach to the market.

ATOMVEST has a comparatively huge 48.91% portfolio concentration in a likely single position of $23.4 million. This is a very acute concentration risk that contradicts essential portfolio management guidelines. The concentration has actually increased from 40.55% to 48.91% of the portfolio, which implies either poor rebalancing discipline or severe underperformance in other investments.

VALUEACT on the other hand, follows far better diversification strategies with a 16.98% allocation, down from 22.08% but poses different issues regarding position management. The primary sizeable holding of 2.9 million shares worth $778 million denotes deep belief and their 0.30% ownership stake would make them the notable influence as an activist investor. Nevertheless, the company hasn’t made any position changes while the stock has seemingly gone down considerably in value. The change from 22.08% to 16.98% seems to be the result of price drop rather than active selling. This is odd for an activist investor who would be expected to influence outcomes.

Salesforce is a high-quality growth stock suitable for both investors who are into technology and energy and those who seek to gain from digital transformation. The firm’s dominance in the market, its stable recurring income, and the AI-based innovative developments all act to create a strong foundation for value creation in the long term. The call for investment stays valid for people who believe in the ongoing digitization of commercial processes and Salesforce’s capability to run its multi-cloud platform with efficiency thus keeping its leadership role in the evolving CRM space.

Source link

Continue Reading

Trending

Business2 weeks ago

The Guardian view on Trump and the Fed: independence is no substitute for accountability | Editorial

Tools & Platforms4 weeks ago

Building Trust in Military AI Starts with Opening the Black Box – War on the Rocks

Ethics & Policy1 month ago

SDAIA Supports Saudi Arabia’s Leadership in Shaping Global AI Ethics, Policy, and Research – وكالة الأنباء السعودية

Events & Conferences4 months ago

Journey to 1000 models: Scaling Instagram’s recommendation system

Jobs & Careers2 months ago

Mumbai-based Perplexity Alternative Has 60k+ Users Without Funding

Education2 months ago

VEX Robotics launches AI-powered classroom robotics system

Podcasts & Talks2 months ago

Happy 4th of July! 🎆 Made with Veo 3 in Gemini

Education2 months ago

Macron says UK and France have duty to tackle illegal migration ‘with humanity, solidarity and firmness’ – UK politics live | Politics

Funding & Business2 months ago

Kayak and Expedia race to build AI travel agents that turn social posts into itineraries

Podcasts & Talks2 months ago

OpenAI 🤝 @teamganassi

aistoriz.com

Artificial Intelligence Training on Books Is Fair Use—Pira

AI Insights

Artificial Intelligence Training on Books Is Fair Use—Pira

Transformative training (fair use)

Format-shifting copies (fair use)

Liability for piracy (not fair use)

Leave a Reply
Cancel reply

Leave a Reply

AI Insights

Oracle’s AI strategy to take on Epic – statnews.com

AI Insights

Building industrial AI from the inside out for a stronger digital core

Engineering for outcomes

From real-time requirements to sovereign AI

AI infrastructure is more than muscle

Avoiding mission-critical mistakes

AI Insights

A Platform Leader’s Path to Sustained Dominance

Trending

aistoriz.com

Artificial Intelligence Training on Books Is Fair Use—Pira

Transformative training (fair use)

Format-shifting copies (fair use)

Liability for piracy (not fair use)

You may like

Leave a Reply Cancel reply

Leave a Reply

AI Insights

Oracle’s AI strategy to take on Epic – statnews.com

AI Insights

Building industrial AI from the inside out for a stronger digital core

Engineering for outcomes

From real-time requirements to sovereign AI

AI infrastructure is more than muscle

Avoiding mission-critical mistakes

AI Insights

A Platform Leader’s Path to Sustained Dominance

Trending

Leave a Reply
Cancel reply