Sarah Silverman and other authors sue OpenAI and Meta for copyright infringement

Since AI-generated content’s rapid rise in popularity, many artists, creators, and observers have lambasted it as derivative, morally ambiguous, and potentially harmful. Given that text-generating large language models (LLMs) in particular are trained on existing material, it was only a matter of time before the pushback entered its next phase.

Three class-action lawsuits were recently filed in California within days of each other, this time on behalf of writers including comedian Sarah Silverman. The lawsuits (Silverman, Golden, and Kadrey v. Meta; Silverman, Golden, and Kadrey v. OpenAI; and Tremblay and Awad v. OpenAI) accuse OpenAI and Meta of copyright infringement via their LLM systems ChatGPT and LLaMA, respectively.

As reported over the weekend by The Verge and others, attorneys at Joseph Saveri Law Firm claim that both ChatGPT’s and LLaMA’s underlying technologies generate content that “remix[es] the copyrighted works of thousands of book authors—and many others—without consent, compensation, or credit.”

In a US District Court filing against OpenAI, the plaintiffs’ lawyers offer multiple examples pulled from GPT-3.5 and GPT-4 training datasets highlighting copyrighted texts culled from “flagrantly illegal” online repositories such as Library Genesis and Z-Library. Commonly referred to as “shadow libraries,” these websites offer millions of books, scholarly articles, and other texts as eBook files, often without the consent of authors or publishers. In Saveri Law Firm’s filing against Meta, a paper trail traces some of LLaMA’s datasets to a similar shadow library called Bibliotik.

“Since the release of OpenAI’s ChatGPT system in March 2023, we’ve been hearing from writers, authors, and publishers who are concerned about its uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books,” argue the plaintiffs’ attorneys in their litigation announcement. “‘Generative artificial intelligence’ is just human intelligence, repackaged and divorced from its creators.”

Companies such as OpenAI and Meta are facing mounting legal challenges over both the source material used to train their headline-grabbing AI systems and their products’ propensity to produce inaccurate and potentially dangerous results. Last month, a radio host sued OpenAI after ChatGPT incorrectly claimed he had previously been accused of embezzlement and fraud.

Although OpenAI was started as a nonprofit by Elon Musk and Sam Altman in 2015, it opened a for-profit subsidiary in 2019, shortly after Musk’s departure from the company. Earlier this year, Microsoft announced a multibillion-dollar investment in OpenAI ahead of its release of a ChatGPT-integrated Bing search engine.

Each lawsuit includes six counts of “various types of copyright violations, negligence, unjust enrichment, and unfair competition,” notes The Verge. Additional plaintiffs in the lawsuits include the bestselling authors Paul Tremblay (The Cabin at the End of the World, A Head Full of Ghosts), Mona Awad (Bunny, All’s Well), Christopher Golden (Ararat), and Richard Kadrey (Sandman Slim). The plaintiffs ask for restitution of profits and statutory damages, among other penalties.