AI has been eating the world this year, with the launch of GPT-4, DALL·E 3, Bing Chat, Gemini, and dozens of other AI models and tools capable of generating text and images from a simple written prompt. To train these models, AI developers have relied on millions of texts and images created by real people—and some of them aren’t very happy that their work has been used without their permission. With the launches came the lawsuits. And next year, the first of them will likely go to trial.
Almost all the pending lawsuits involve copyright to one degree or another, so the tech companies behind each AI model are relying on fair use arguments for their defense, among others. In most cases, they can’t really argue that their AIs weren’t trained on the copyrighted works. Instead, many argue that scraping content from the internet to create generative content is transformative because the outputs are “new” works. While text-based plagiarism may be easier to pin down than image generators mimicking the visual styles of specific artists, the sheer scope of generative AI tools has created massive legal messes that will be playing out in 2024 and beyond.
In January, Getty Images filed a lawsuit against Stability AI (the makers of Stable Diffusion) seeking unspecified damages, alleging that the generative image model was unlawfully trained using millions of copyrighted images from the stock photo giant’s catalog. Although Getty has also filed a similar suit in Delaware, this week a judge ruled that the lawsuit can go to trial in the UK. A date has not been set. (For what it’s worth, the examples Getty uses showing Stable Diffusion adding a weird, blurry, Getty-like watermark to some of its outputs are hilariously damning.)
A group of visual artists is currently suing Stability AI, Midjourney, DeviantArt, and Runway AI for copyright infringement over the use of their works to train the companies’ AI models. According to the lawsuit filed in San Francisco, the models can create images that match the artists’ distinct styles when their names are entered as part of a prompt. A judge largely dismissed an earlier version of the suit because two of the artists involved had not registered their copyrights with the US Copyright Office, but gave the plaintiffs permission to refile—which they did in November. We will likely learn next year whether the amended suit can continue.
Writers’ trade group the Authors Guild has sued OpenAI (the makers of ChatGPT, GPT-4, and DALL·E 3) on behalf of John Grisham, George R. R. Martin, George Saunders, and 14 other writers, for unlawfully using their work to train its large language models (LLMs). The plaintiffs argue that because ChatGPT can accurately summarize their works, the copyrighted full texts must be somewhere in the training database. The proposed class-action lawsuit filed in New York in September also argues that some of the training data may have come from pirate websites—although a similar lawsuit brought by Sarah Silverman against Meta was largely dismissed in November. The plaintiffs are seeking damages and an injunction preventing their works from being used again without a license. As yet, no judge has ruled on the case, but we should know more in the coming months.
And it’s not just artists and authors. Three music publishers—Universal Music, Concord, and ABKCO—are suing Anthropic (makers of Claude) for illegally scraping their musicians’ song lyrics to train its models. According to the lawsuit filed in Tennessee, Claude can both quote the copyrighted lyrics when asked for them and incorporate them verbatim into compositions it claims to be its own. The suit was only filed in October, so don’t expect a court date before the end of the year—though Anthropic will likely try to get the case dismissed.
In perhaps the most eclectic case, a proposed class-action lawsuit is being brought against Google for misuse of personal information and copyright infringement by eight anonymous plaintiffs, including two minors. According to the lawsuit filed in San Francisco in July, the content the plaintiffs allege Google misused includes books, photos from dating websites, Spotify playlists, and TikTok videos. Unsurprisingly, Google is fighting it hard and has moved to dismiss the case. Since it filed that motion back in October, we may know before the end of the year whether the case will continue.
[ Related: “Google stole data from millions of people to train AI, lawsuit says” ]
Next year, it looks like we could finally see some of these lawsuits go to trial and get some kind of ruling on the legality (or illegality) of using copyrighted materials scraped from the internet to train AI models. Most of the plaintiffs are seeking damages for their works being used without license, although some—like the Authors Guild—are also seeking an injunction that would prevent AI makers from continuing to use models trained on the copyrighted works. If such an injunction were granted, any AI trained on the relevant data would have to cease operating and be retrained on a new dataset without it.
Of course, the lawsuits could all settle, they could run longer, and they could even be dismissed out of hand. And whatever any judge does rule, we can presumably expect to see various appeal attempts. While all these lawsuits are pending, generative AI models are being used by more and more people, and are continuing to be developed and released. Even if a judge declares generative AI makers’ behavior a gross breach of copyright law and fines them millions of dollars, given how hesitant US courts have been to ban tech products for copyright or patent infringement, it seems unlikely that they are going to cram this genie back in the bottle.