NY Times Sues OpenAI for Copyright Infringement

With a fresh lawsuit, the Times joins the fight against OpenAI’s LLMs using publishers’ work without permission and publicly featuring it to users.

A lawsuit filed by the New York Times against OpenAI and Microsoft states that millions of the publisher’s articles were “used to train chatbots that now compete with it.” Chatbots like ChatGPT rely on large language models (LLMs), which are trained on heaps of data. Because websites like the New York Times are openly accessible on the internet, its articles were reportedly used as part of the training dataset to make the chatbot smarter. That chatbot is now competing with the New York Times itself.

The lawsuit alleges that ChatGPT will sometimes give “verbatim excerpts” from New York Times articles when asked about current events. In other words, ChatGPT users can read parts of New York Times articles without visiting the platform, which means no advertising revenue, traffic, or subscription value for the publisher.

The lawsuit names Microsoft as well, not just ChatGPT parent OpenAI. Microsoft’s Bing search engine is powered by the same LLM behind ChatGPT and, in one example, returns results from a New York Times-owned website without citing the source or including a referral link.

The New York Times, in its filing, also explains that it approached the companies to seek an amicable resolution to the copyright issue but was unsuccessful.

Notably, Game of Thrones author George RR Martin, along with other US authors, brought a similar copyright infringement lawsuit against OpenAI. There have been other instances of similar cases as well.

To this, OpenAI commented:

We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models. Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.
Lindsey Held, OpenAI

The AI duo of Microsoft and OpenAI claims that its use of the Times’s work is fair use because it serves a “transformative purpose.” Their defense is that copyright protection does not fully apply when a work is used as inspiration or transformed into something else.

To this, the Times said:

There is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use.
New York Times complaint

CNN, in its report on the lawsuit, mentions how they added code to the website blocking the web crawler of OpenAI called GPTBot. The article in the New York Times itself mentioned that this lawsuit “could test the emerging legal contours of generative A.I. technologies — so called for the text, images and other content they can create after learning from large data sets — and could carry major implications for the news industry.” BBC, in its coverage of the Times lawsuit, notes that though there have been many similar lawsuits, “None of these lawsuits have yet been resolved.”
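
For readers curious what blocking GPTBot can look like in practice, one common mechanism is a robots.txt rule that names the crawler's user agent. The reports above do not say exactly what code was added, so the rule below is an assumption, and the helper name is_allowed and the example.com URL are hypothetical. The sketch uses Python's standard urllib.robotparser module to check which crawlers such a rule would turn away.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: disallow GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/2023/12/27/some-article.html"  # hypothetical URL
    print("GPTBot allowed:", is_allowed("GPTBot", article))
    print("Other crawler allowed:", is_allowed("SomeOtherBot", article))
```

A rule like this is only a request: it works when the crawler chooses to honor robots.txt, and it does nothing about articles that were collected before the rule was added.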

By Abhimanyu