Copyright, Fair Use, and the Fight Over AI Training Data

Ron Green, Co-founder, Chief Technology Officer

One of the most important questions in AI right now isn’t about compute or alignment. It’s about copyright.

Should training AI models on copyrighted content be considered fair use?

Courts are starting to weigh in. And while the rulings so far lean toward fair use, they’re also exposing how outdated the legal frameworks really are for this new era of machine learning.

In two recent cases, federal judges ruled that scanning legally acquired books to train large models was “transformative” enough to count as fair use. But they drew the line at pirated content. And they left the door open for future challenges, especially around market harm.

That last piece is key.

Fair use hinges partly on whether the use harms the market for the original work. But that’s hard to prove. Would a person have paid to read a book if the AI hadn’t been trained on it? Did the training actually reduce demand?

Think of it like this: training an AI on copyrighted content is less like copying a painting and more like learning to paint by visiting a museum. The model doesn’t reproduce the originals; it absorbs patterns, styles, and techniques to generate something new. Trying to trace its output back to a single source is like asking which painting at the Louvre taught someone to paint a tree.

In a world where models are trained on billions of tokens scraped from the public internet, this analogy isn’t perfect, but it captures the scale of the dilemma. Training data is fuel. The question is who owns the gas station. Billions of dollars and entire creative industries hang in the balance.

These early rulings suggest that courts may favor innovation, but the fight isn’t over. As models become more powerful and outputs blur the line between inspiration and imitation, the legal gray area is only going to get messier.

One thing is clear: the legal battles over training data are just getting started. And how they’re resolved will shape not just the future of generative AI, but the creative economy itself.

If you’re building in this space, it’s time to pay attention.

We unpack this topic in the latest episode of Hidden Layers—tune in if you're curious about where this fight is headed.
