Several authors have filed suit against Meta (Zuckerberg), Bloomberg, and other tech corporations for infringing the copyrights on their books. Alex Reisner has written three articles in The Atlantic about how the developers of Artificial Intelligence (AI) have used 183,000 books to train AI systems to write.

Two of those 183,000 books are mine: The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education and Reign of Error: The Hoax of the Privatization Movement and the Danger to America’s Public Schools. As an author, I am outraged that huge tech corporations used my books as training fodder for their profiteering.

Reisner writes:

This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. I wrote in The Atlantic about how the data set, known as “Books3,” was based on a collection of pirated ebooks, most of them published in the past 20 years. Since then, I’ve done a deep analysis of what’s actually in the data set, which is now at the center of several lawsuits brought against Meta by writers such as Sarah Silverman, Michael Chabon, and Paul Tremblay, who claim that its use in training generative AI amounts to copyright infringement.

Since my article appeared, I’ve heard from several authors wanting to know if their work is in Books3. In almost all cases, the answer has been yes. These authors spent years thinking, researching, imagining, and writing, and had no idea that their books were being used to train machines that could one day replace them. Meanwhile, the people building and training these machines stand to profit enormously.

Reached for comment, a spokesperson for Meta did not directly answer questions about the use of pirated books to train LLaMA, the company’s generative-AI product. Instead, she pointed me to a court filing from last week related to the Silverman lawsuit, in which lawyers for Meta argue that the case should be dismissed in part because neither the LLaMA model nor its outputs are “substantially similar” to the authors’ books.

It may be beyond the scope of copyright law to address the harms being done to authors by generative AI, but the point remains that AI-training practices are secretive and fundamentally nonconsensual. Very few people understand exactly how these programs are developed, even as such initiatives threaten to upend the world as we know it.

Books are stored in Books3 as large, unlabeled blocks of text. To identify their authors and titles, I extracted ISBNs from these blocks of text and looked them up in a book database. Of the 191,000 titles I identified, 183,000 have associated author information. You can use the search tool below to look up authors in this subset and see which of their titles are included.
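Reisner does not publish the code behind that step, but the process he describes (pulling ISBNs out of raw, unlabeled text and matching them against a book-metadata database) can be sketched roughly as follows. The regular expression, the placeholder ISBN, and the tiny stand-in database below are illustrative assumptions, not his actual tools or data.

```python
import re

# Match ISBN-10 or ISBN-13 numbers, with or without hyphens/spaces.
# This pattern is an illustrative assumption, not Reisner's actual method.
ISBN_PATTERN = re.compile(r"\b(?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx]\b")

# Stand-in for a real book-metadata database; the ISBN and record here
# are placeholders, not actual catalog data.
BOOK_DATABASE = {
    "9780000000002": {"title": "An Example Book", "author": "A. N. Author"},
}

def normalize(isbn: str) -> str:
    """Strip hyphens and spaces so ISBNs can be used as lookup keys."""
    return re.sub(r"[-\s]", "", isbn).upper()

def identify_books(raw_text: str) -> list[dict]:
    """Scan a large, unlabeled block of text for ISBNs and look up metadata."""
    found = []
    for candidate in ISBN_PATTERN.findall(raw_text):
        record = BOOK_DATABASE.get(normalize(candidate))
        if record:
            found.append({"isbn": normalize(candidate), **record})
    return found

if __name__ == "__main__":
    sample = "Copyright 2010. ISBN 978-0-000-00000-2. All rights reserved."
    print(identify_books(sample))
```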

The article contains a search tool that anyone can use to see whether their copyrighted work was fed into the AI training process.

As an author whose works were used, I feel aggrieved. I think that all of us whose works were utilized without our knowledge or consent should be compensated.

AI is the latest iteration of Big Tech’s efforts to make human beings irrelevant. AI may “learn” how to write well, but AI can never “learn” the wisdom, experiences, memories, fears, hopes, and emotions that lie behind every book.