Japanese companies can train generative AI models on any material, including for commercial use.
In a first, Japan has asserted that it will not enforce copyright claims over training data used by generative AI programs. Liberal Democratic Party minister Keiko Nagaoka stated in a committee meeting that all kinds of data are fair game and that AI companies can take content from any source for “information analysis.” This means generative AI programs are free to use whatever data they please, including copyrighted material such as anime, manga, literature, art, code, and photographs.
Generative AI tools such as LLMs and image generators train on a sizeable chunk of the public internet, everything from books and articles to professional photography and works by prominent artists. This no-enforcement policy is seen as a major blow to copyright holders, who receive no compensation even though their work adds value to these tools.
The question of consent and copyright in training AI models is hotly debated in the West, including in the EU, where policymakers are exploring ways to make corporations pay. Perhaps the most notable lawsuit at the moment is Getty Images v. Stability AI. Japan’s stance throws all of that out the window.
Is Japan going all-in by allowing these models to be trained on heaps of copyrighted material for commercial gain? Keiko Nagaoka is Japan’s Minister of Education, Culture, Sports, Science, and Technology. Her position is speculated to stem from the obstacle that copyright law poses to the country’s progress in AI technology. This is notable because Japan’s copyright laws exist largely to protect its soft power and creative industries, mainly anime and manga.
It’s noteworthy that Japan is a part of the G7, a political forum that recently considered governance and regulation of generative AI through the “Hiroshima AI Process.”
On one hand, this infuriates creators of all kinds, from writers and painters to digital artists and designers. On the other, it lets companies build generative AI tools without limits on their training data. Proponents on this side of the fence popularly argue that training is a lossy aggregation process in which the source material is not directly stored or retained; the model itself is a derivative work and therefore, they claim, fair use.
But such a model can still be used to produce work that severely infringes copyright. Using it to essentially clone someone’s code, or to generate images in a photographer’s or artist’s signature style, is a problem this line of reasoning fails to address.
There are two extremes in this debate:
- Disallowing models to be trained on copyrighted material.
- Making it all fair game and only prosecuting the intentional sale or distribution of copyright-violating work.
The regulation that eventually takes hold will likely land somewhere in the middle, depending on the particular use case. For now, Japan has taken an extreme stance, and only time will tell what it achieves.