OpenAI says it’s impossible to train its models on public domain work alone, and that the company will be doomed if it can’t use copyrighted material.
OpenAI is tangled in a web of lawsuits, and the fight over copyright has now reached the British Parliament as well. There, the company pleads that it’s impossible to train its models using only content in the public domain (that is, out of copyright). Training on such material alone might make for an interesting experiment, but OpenAI says in a filing to the House of Lords subcommittee that it’s not practical to train its best models without using copyrighted material.
The filing says, “Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials.”
According to OpenAI, the ban on using news and books to train the AI models behind chatbots like ChatGPT, which the British Parliament is seeking, would doom the development of artificial intelligence. The company argues that models trained only on public domain material would not deliver the tools users need or the capabilities people expect from modern AI. The Telegraph first reported the filing (paywalled).
In the UK, copyright expires and a work (whether it’s a novel or a piece of art) enters the public domain 70 years after the death of its creator or author.
The doom OpenAI warns of would be quite the tragedy, say the artists whose work is being used to train LLMs without their permission or any choice in the matter.