Several authors have filed suit against Meta (Zuckerberg), Bloomberg, and other tech corporations for violating the copyright on their books. Alex Reisner has written three articles in The Atlantic about how the developers of Artificial Intelligence (AI) have used 183,000 books to train AI how to write.
Two of those 183,000 books are mine: The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education and Reign of Error: The Hoax of the Privatization Movement and the Danger to America’s Public Schools. As an author, I am outraged that huge tech corporations used my books as training fodder for their profiteering.
Reisner writes:
This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. I wrote in The Atlantic about how the data set, known as “Books3,” was based on a collection of pirated ebooks, most of them published in the past 20 years. Since then, I’ve done a deep analysis of what’s actually in the data set, which is now at the center of several lawsuits brought against Meta by writers such as Sarah Silverman, Michael Chabon, and Paul Tremblay, who claim that its use in training generative AI amounts to copyright infringement.
Since my article appeared, I’ve heard from several authors wanting to know if their work is in Books3. In almost all cases, the answer has been yes. These authors spent years thinking, researching, imagining, and writing, and had no idea that their books were being used to train machines that could one day replace them. Meanwhile, the people building and training these machines stand to profit enormously.
Reached for comment, a spokesperson for Meta did not directly answer questions about the use of pirated books to train LLaMA, the company’s generative-AI product. Instead, she pointed me to a court filing from last week related to the Silverman lawsuit, in which lawyers for Meta argue that the case should be dismissed in part because neither the LLaMA model nor its outputs are “substantially similar” to the authors’ books.
It may be beyond the scope of copyright law to address the harms being done to authors by generative AI, and the point remains that AI-training practices are secretive and fundamentally nonconsensual. Very few people understand exactly how these programs are developed, even as such initiatives threaten to upend the world as we know it. Books are stored in Books3 as large, unlabeled blocks of text. To identify their authors and titles, I extracted ISBNs from these blocks of text and looked them up in a book database. Of the 191,000 titles I identified, 183,000 have associated author information. You can use the search tool below to look up authors in this subset and see which of their titles are included.
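To make the ISBN step concrete, here is a minimal Python sketch of how one might pull ISBNs out of an unlabeled block of text. It is an illustration only, not Reisner's actual method, which the article does not describe in detail. The pattern and the function names are assumptions made up for this example, and the database lookup is left out entirely.

```python
import re

# Hypothetical pattern: a 13-digit ISBN starting with 978 or 979,
# with optional hyphens or spaces between digits.
ISBN13_PATTERN = re.compile(r"\b97[89](?:[- ]?\d){10}\b")


def is_valid_isbn13(digits: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def extract_isbn13s(text: str) -> list[str]:
    """Return the distinct valid ISBN-13s found in text, in order of appearance."""
    found = []
    for match in ISBN13_PATTERN.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())
        if is_valid_isbn13(digits) and digits not in found:
            found.append(digits)
    return found
```

Each number found this way would then be looked up in a book database to recover the author and title, which is the step that yields the 183,000 titles with author information.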
The article contains a search tool that anyone can use to see whether their copyrighted work was fed into the AI training process.
As an author whose works were used, I feel aggrieved. I think that all of us whose works were utilized without our knowledge or consent should be compensated.
AI is the latest iteration of big-tech’s efforts to make human beings irrelevant. AI may “learn” how to write well, but AI can never “learn” the wisdom, experiences, memories, fears, hopes, and emotions that lie behind every book.

It is already the case that much popular entertainment, such as films and pop songs, is entirely written to formula. The formula for a typical Hollywood film is so precise that it predicts what is going on at any point in a film. To learn about the film formula, see this truly outstanding treatment:
Let me try to be clearer about that: Hollywood (and copycat) films are structured in so predictable a way that an experienced professional can tell you just what is happening 5 minutes in, 10 minutes in, 20 minutes in, and so on. They almost all function in the same way, following the same set structural formula.
Brilliant piece of journalism. All the writers who have been plagiarized should sue Meta. Sue them, Diane.
So, we already have almost no originality of structure in popular film and song. But we also have almost no originality of content:
So, we are already almost there. Soon, no human will be writing any of this stuff. Films, pop songs, novels–all will be AI-generated?
Are you good with that?
I’m not.
Not good with that at all. Art isn’t Art when it doesn’t break new ground.
So, can you think of a time when people invented some technology and then said, “Oh, this is just so awful, so destructive, that we should just stop it by law?”
I can’t.
But perhaps it’s time we figured out that we need to do that. Before it is too late for the amusingly named H. sapiens.
Well, Bob, historically, the standardization of creative products provokes rebellion and absolutely new-under-the-sun creative forms. Going back to antiquity. Basing that on my fave new read. We’re doing it in my mini-zoom class [2 American students plus Argentinian prof studying LatAm/ Span lit weekly, for 3-1/2 yrs now! (product of pandemic)]. “El Infinito en un Junco,” Irene Vallejo (de España). English translation is called “Papyrus: The Invention of Books in the Ancient World.” Highly recommend it.
Well, the decline of magazines and of short stories has indeed been accompanied by a creative revolution.
If you call TikToks about farting in cars creative.
But I would like to believe that you are right, Ginny! And that book sounds like fun!
Wait, I’m sure I must have read a few Mad Magazines featuring farting in cars… 😃
For most of my life, I have made my living as a writer. But now, . . .
I’m about to go the way of typewriters and slide projectors.
But, again, this was already happening. In the mid-twentieth century, most fiction writers supported themselves, between novels, by churning out short stories, and a LOT of magazines published stories–I dare say most. So, someone like Kurt Vonnegut supported himself, between novels, by publishing short stories in magazines.
Now, very, very few magazines publish short stories. Sure, there are little literary magazines that do, but they do not pay writers. It’s supposed to be an honor just to appear in them. So, how, exactly, is a writer to make a living?
This affects me directly because the short story is my preferred medium for creative writing. It’s the one I do most of my work in. I LOVE the medium. And I love reading short stories. But I am in a minority now. That saddens me a lot.
I can’t even get my book club to consider short stories anymore! The core group has been together for over 30 yrs. We used to do a short-story collection occasionally, but about 10 yrs ago consensus was, too hard to focus discussion around multiple stories [🤬]. Since then, I’ve been able to talk them into two that had interlocking stories [same characters/ themes], but the only one that passed muster was a novel comprising 3 stories of 3 families in the same small Israeli apt bldg. [“Three Floors Up,” Eshkol Nevo].
This is getting me hot under the collar: the masters of short stories we are missing! Every Jan I run a poetry session for them (by popular demand): handout with annotated selections, suggested questions for discussion. Maybe I could offer to do something similar with short stories…
❤
The corporate machine hates nothing more than having to pay for human labor.
In other words, what else is new?
Interestingly, it wants not to pay laborers but to have laborers buy its products.
If a person writes a story and no one reads it, does it make a sound?
Yes, it makes the sound of the long, slow exhalation of the writer’s exasperated and soon to be extinguished breath.
Go read a short story today.
Or watch the latest TikTok about farting in cars. Yeah. That’s what America really wants.
Bob,
Do you like Steven Millhauser? My husband loves him (his writing), and we see him in the grocery store on and off. We always joke that everyone in the store shopping has no idea that there’s a Pulitzer Prize winning author buying citrus in the produce aisle. 🙂 🙂 🙂
He taught at my alma mater for a long time. Unfortunately after I graduated. 😦
I am a total sucker for what I call “idea fiction,” as opposed to “character fiction,”–short stories that explore a single odd and profound idea, often from different vantage points. This is the kind of fiction that I write, as well. One encounters or is gifted an unusual idea, and this leads to a “what if” that has to be explored in a story! As in a lot of Millhauser, such storytelling tends to be sci-fi or sci-fi adjacent, though Millhauser does more fantasy than I do. Ted Chiang and Ken Liu also write such short stories, as do Calvino and Borges.
Lucky you to encounter, occasionally, such a rare creature in the wild!!! Millhauser is a very great writer. He is superb in both the long and short of stories–in the inventiveness and profundity of his overall ideas and in the exquisite craftsmanship of individual sentences.
By design, contemporary software labeled as “Artificial Intelligence” (“AI”) is neither intelligent nor writing, as our society has come to understand these terms. It produces text output, not writing. A more accurate description would be “simulated”: the results resemble written output created by human minds, but the process used to achieve these results is nothing like the process we use to develop written work.
artificial:
• made by people, often as a copy of something natural
– Cambridge Dictionary https://dictionary.cambridge.org/us/dictionary/english/artificial
Computer-generated output that models the physical world has come to be referred to as “virtual”:
virtual:
• very close to being something without actually being it
– Britannica Dictionary https://www.britannica.com/dictionary/virtual
• the quality of affecting something without actually being that something
– Encyclopedia of Networked and Virtual Organizations https://www.igi-global.com/book/encyclopedia-networked-virtual-organizations/369
By definition, artificial intelligence is copying something. Some artificial products can be beneficial, like an artificial heart, where the only concern is that the result simulate the original. In the case of artificial intelligence however, the process in deriving the result is a major consideration. While computers can now imitate some of the results, duplicating the processes of the human brain is way beyond the current state of computer science.
Nothing like. It’s all about, given the prompt and anything already said, what is the most likely next word?
However, it doesn’t matter because people are too freaking dumb to tell the difference between the AI-generated crap and actual writing. Could the likes of ChatGPT write skits for Saturday Night Live? The next Billie Eilish song?
Uh, yeah. Unfortunately.
I generally admire your intellect & wit, Bob. You always give me something to think about. Don’t understand why you say, “Nothing like,” & then follow with, “…given the prompt and anything already said, what is the most likely next word?” which sounds to me like a description of the copying/simulation process I referred to. It sounds like we agree about what’s currently referred to as AI.
Sorry to see you’re so cynical about the human race. On the other hand, I haven’t had a career in public education or any related field, & had to deal with the difficulties you’ve faced. I have gotten some perspective from my wife, a retired NYC school social worker. Forums like Diane’s, including your valuable contributions, are what give me hope for the world.
…or was “nothing like” agreeing with my closing statement that the processes of current “AI” & human intelligence are very different? …in which case, oops!😬
Lenny, I was agreeing with you!
Sorry that I was not clear about that! I very much appreciated your comment, Lenny!
In technical terms, ChatGPT et al work as Markov processes, whereas actual language is planned, then stated and highly recursive. The ChatGPT stuff only looks that way.
I realized you were actually agreeing about 5 seconds after I posted, taking another look. If WordPress had a comment edit feature, I would have fixed it. I sometimes misinterpret things online that I wouldn’t in regular conversation (remember regular conversation?). That’s the problem! It’s due to the limitations of online communication & WordPress! Like the old saying goes, “It’s a smart workman who blames his tools.”
…wait… is that right?… ummmm… Yeah. Yeah, that’s it! See? AI opens up to us a whole new vista for evading responsibility!
We all do that, Lenny! No problema! And yeah, one really misses facial expression, gestures, tone of voice, all those paralinguistic cues.
And, ROFL!
Lenny—OTOH, we here have gotten rather good at interpreting each other’s written communications. As you demonstrate with yr 5-sec realization of what Bob probably meant. We have come to know each other, gradually, through written communication. As did people of yore who communicated regularly, sometimes for most of their lives, only by snail mail.
In my quest to understand people very different from myself, I have been frequenting a news platform with comment threads to every article. I started there because periodically articles would appear from WSJ, the Atlantic, et al paywall media I could read for free. I stayed because over half the commenters are opposite my political leanings, including many libertarian-leaners and outright MAGA types. And the platform facilitates actual conversations. If someone replies to you, it shows up promptly & one can navigate quickly back to that article & respond.
I’m starting to understand the way they read/ interpret (articles, and esp other comments). What I’m really trying to get to is how they think— often not reflected in how they write. But a neutral tone & questions can often unearth that, & conversation becomes more productive.
I often read the rightwing press for the same reasons.
Hi Bethree — Yeah, I guess we are generally OK understanding each other here. It did take me a minute from when I started writing to catch on that Bob said exactly the opposite of what I’d thought he said. Sometimes I’m a little slow. Does this blog have a remedial class?
My compliments on your progress with understanding right-oriented points of view. I’m sorry to say that my skill in that area, or maybe my assessment of my skill, or maybe both, has regressed over the past couple of years. I used to think I could understand and communicate with them fairly well, even if I disagreed with the views. I now think I don’t get it at all. On one hand, it’s distressing; on the other hand, I no longer have any idea of how to approach it.
Historically, political conversations have frequently been very heated. In the past (say, maybe 25+ years ago), disagreements concerned how to address an issue, not the nature of the issue, or whether the issue even existed at all. In the 60s, Vietnam was a hotly disputed topic. But the issue was whether, & to what extent, US military should be involved in that country’s affairs. Nobody said, “US military is not in Vietnam, & there’s not even a country called Vietnam. The whole thing is a hoax. Those images on TV were produced in a movie studio.” That’s the level of difference we have today with political positions — people actually inhabit 2 totally different realities. I find it impossible to have meaningful communication when there’s no agreement on the nature of reality, & don’t know where to start in re-establishing constructive dialogue.
In order to justify moving apparently so far off-topic: AI in current widespread use exacerbates the already serious crisis of people perceiving different realities. The publicly-available AI platforms covered in the news are known to have presented, in authoritative language, fabricated statements as fact, & even cited non-existent sources (if they cite sources at all), or attributed to real individuals things they never said or wrote. As with any new tech, there’s a lag in regulation — legally enforced or voluntary — & common sense catching up. There will be some chaos during that period, but hopefully it’ll resolve as it has in the past.
Nobody said, “US military is not in Vietnam, & there’s not even a country called Vietnam.”
YES! This is the stuff that Tsar Vladimir Putin the Demented, the ex-KGB guy, introduced via his asset, Oh Donnie Boy.
Alternative fact universes
Disinformation
Lenny, that’s a brilliant insight. It brings to mind the innovative concept of “alternative facts.”
Thank you, Diane!
In order for American society to move forward, something has to break this standoff on what reality we live in. There have always been fringe conspiracy theories, but until relatively recently, they were distributed through the US mail on hand-photocopied paper, with a distribution of maybe 30-200 readers. The Internet has given those who circulate these views the ability to reach millions with less effort than it takes to stuff a dozen envelopes. AI can either increase that effect or, with responsible management, reverse it.
Though Mr. T didn’t start it, he poured very potent fuel on the fire. A lot of it today is related to his influence, to the point that now those beliefs are almost synonymous with his supporters. The good news is that as it grew with his popularity, it may also follow his decline. I’m hoping that as the trials progress & more information about his activities comes out, at least some of the independents who’ve supported him will re-evaluate.
Lenny– Hm, you’ve pinpointed a very common problem. I pretty much begin engagement with correction of misinformation (without adding “, you moron” as so many do 😁). And only a few actually come back with, “well then how come…” etc.
I won’t bore you with TMI family anecdotes, but suffice it to say I got plenty of practice growing up.
AI is a form of cyber plagiarism in which profiteers are using algorithms to imitate style and copyrighted content of authors without permission. Unfortunately, the laws have not kept pace with technology.
yup
Perfect term: cyber plagiarism
Oh, and hey boys and girls, Master of the Universe Billy Boy Gates has ANOTHER BRILLIANT IDEA FOR REVOLUTIONIZING EDUCATION!!!! That’s right, he still has more damage to do!!!!
Replace reading teachers with Clippy the Paperclip! Uh, ChatGPT:
https://www.cnbc.com/2023/04/22/bill-gates-ai-chatbots-will-teach-kids-how-to-read-within-18-months.html
We’re doomed.
Crap! I thought I’d be dead & gone before this type of Brave New World set in. Not that I’m anxious for my early demise, but living in this type of chaos was not what I expected in my retirement years. All we can do is play whack-a-mole in dealing with problems like climate change, AI, and other emerging crises.
D, sue them. They used your work without permission and it’s copyrighted.
Amen
Here’s an interesting piece I read early this morning.
Three Ways Writers Use AI (Without Having it Write for them)
https://www.writtenwordmedia.com/three-ways-writers-can-use-ai-without-having-it-write-for-them/
Outrageous!!!
Bobbi Eisenberg
Wholesale robbery on a grand scale. “Pirated” is not a big enough word: the usual context is random individuals robbing author/ publisher of some sales. Here we have a nascent, slated-to-be highly lucrative industry, built directly on data banks containing the words of hundreds of thousands of pirated titles.
Shades of the music industry. Here I don’t know what I’m talking about, just conjecturing that music services for which millions pay nominal monthly fees are built on [pirated?] compositions/ productions, returning little if anything to creators. I’ll have to research that, but there’s no question that the current business model has turned the music industry on its ear, & I’m betting 99.9999…% of profit is going to the streaming services.
What is the difference between using a book to train AI machines and using it in a college or high school course to illustrate something? Besides telling the students “We are now going to read from The old man and the sea by Hemingway”, do teachers need to get permission from the author and describe how the book is going to be used in the course?
I am just not clear on where copyright infringement begins.
My math papers certainly can be used for any purpose, as long as it is acknowledged as a source. I’d like to be able to control whether they are used for, say, military purposes, but I believe I can’t.
besides telling the students “We are now going to read from The old man and the sea by Hemingway”
This is a BIG difference because knowing the source of the shared material, the student might then go out and buy and read a copy, which makes a huge difference to living writers. Book sales are often driven by word of mouth. And sometimes a sale might come years later. Oh, I remember this guy. Professor Wierdl read a passage from him.
What if I buy a copy of a book and then use it in my research? So if this AI company bought these 200K books then all would have been well?
They didn’t just use them in their research, they incorporated characteristics of them into their algorithm. So, for example, as I understand it, they asked, given the portion of a huge corpus of written material that consists of writing about AI writing bots, what is the probability that, given the words “characteristics of their,” the next word will be algorithm? And the whole of a piece of generated writing comes from following these weights.
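A toy Python sketch of that "most likely next word" idea, for anyone who wants to see it in miniature. It only counts which word follows which in a tiny made-up corpus; real systems like ChatGPT use large neural networks over far more context, so treat this as an illustration of the probability question and nothing more. The function names and the sample sentence are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_counts(corpus: str) -> dict:
    """Count, for every word in the corpus, how often each word follows it."""
    words = corpus.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts: dict, prev: str) -> dict:
    """Turn the follower counts for one word into probabilities."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}


# Made-up sample text, purely for illustration.
counts = train_bigram_counts("the cat sat on the mat the cat ran")
probabilities = next_word_probabilities(counts, "the")
```

In this sample, "the" is followed by "cat" twice and "mat" once, so the table gives "cat" two chances in three. Generating text then amounts to repeatedly picking a next word according to weights like these.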
If I buy a copy of Old Man and the Sea, I can’t then publish online my novella about an old guy battling an enormous fish that is eaten by sharks and call it “An Elderly Guy and the Ocean.”
But I don’t really care. I would love to see anything that can kill these damned algorithms before they render writers obsolete. Lord knows that it is difficult enough for writers to make a living now. A few celebrity writers of lowbrow pulp fiction get fabulously wealthy, and the rest find a day job.
Teachers have wide leave under copyright law to excerpt materials for instructional purposes, but even this is limited. One cannot simply reproduce and distribute, in course packets, for example, whole works still under copyright. Sharing a passage from The Old Man and the Sea to illustrate a point, fine. Photocopying The Old Man and the Sea, nope.
“I am just not clear on where copyright infringement begins.”
Yeah, that’s a problem–one that keeps lawyers busy, busy–because the law is vague. Here, a brief overview of Fair Use:
https://support.google.com/legal/answer/4558992?hl=en
So, under the “amount and substantiality” part of that test, quoting a paragraph from Old Man and the Sea is OK for anyone, even outside instruction, criticism, or parody. However, the courts have ruled again and again that ANY PART of a song or a poem that is in copyright is a substantial part of it, and use of it without permission lessens the value or steals the value of the copyright. So, I can get away with quoting a Dylan song in a parody or a work of cultural or political critique, but if I just quote it because I like it, that’s a violation of copyright. As I often say of Reichwingers and their ideas about sex and gender, “There’s something happening here, and you don’t know what it is, do you, Mr. Jones?”
Mate, have you seen the film Traveling Salesman, P = NP? Wondering what you think of it.
“Very few people understand exactly how these programs are developed, even as such initiatives threaten to upend the world as we know it. ”
This of course is a problem. The Free Software Foundation has a copyright (they call it copyleft) scheme which says that their code and software can be used for any purpose , and for that it can be changed, expanded, etc, as long as the resulting product’s code and documentation are also freely available.
https://en.wikipedia.org/wiki/Travelling_Salesman_(2012_film)
Yeah, so that’s the issue. The next AI research will again use 200K books but this time they will buy the books, so nobody can sue them. The basic problem remains: the uncontrolled development of AI. As a minimum, any AI product needs to undergo years of security testing before releasing it in the wild.
Of course, the Jurassic Park and World films (and probably many other films) are also about the need to control what is researched.
And what about genetic algorithms that write themselves? Here, my short story about that:
Yeah, there comes the difficult general question of controlling scientific research— certainly its products but also, perhaps, what is researched, that is, controlling scientific inquiry. There is no doubt in my mind that AI research shouldn’t be done by private companies unless their work is completely transparent.
Totally with you, Mate.
Ever since Mary Shelley invented the science fiction genre, with her Frankenstein (at the age of 18!!!!), this hubris has been its most common theme.
The brass balls to think they can use anyone’s copyrighted material for their personal profit, without first consulting the author.
Wondering if any of my musical pieces have been fed to the monster.
My father in law/good friend passed away recently. He was a published and well respected poet.
My brother in law checked out a memorial, using ChatGPT. It was pretty good.
Then we checked out a ChatGPT poem in Barry’s style. It was pathetically childlike.
One of the creators (Altman, I believe) said in an interview that we shouldn’t be worried about AI taking jobs. “Just the service jobs”.
Well, Sam: we happen to now be a service economy. So that ain’t a “just…” to a lotta people.
And I’ve got enough of a brain to see exactly where AI can branch into.
Yes: it will grow as will any form of intelligence…but why? Because it’s “there”? So is the hydrogen bomb.
Who was your father-in-law?
https://www.skidmore.edu/retirees/memoriam/2023/barry-goldensohn.php
(👍🏻Robert’s Rule 🥂)
how to be human, by Barry Goldensohn:
https://www.poetryfoundation.org/poetrymagazine/browse?contentId=31737
Good stuff!!!
Yeah…those are some goodies.
He was a wonderful writer and many other things. Renaissance man. One of my best friends. I miss him.
Can’t seem to find this one called, “Screwing in the Back Seats of Cars”. Very timely. Working on a GM assembly line early on.
!!!!
https://www.poetryfoundation.org/poetrymagazine/browse?contentId=35908
Here’s to another Skidmore College professor! 🙂 🙂 🙂
“One of the creators (Altman, I believe) said in an interview that we shouldn’t be worried about AI taking jobs. “Just the service jobs”. ”
Except AI researchers explicitly aim for replacing any human activity by AI. They want to see how far they can go, and some are convinced that they can go all the way.