How are overseas fan communities pushing back against AI trends such as 'using derivative works as AI training datasets without permission' and 'AI-generated derivative works'?

With the advent of generative AI, anyone can easily create text and illustrations, but many of the datasets used to train AI are collected from the Internet without permission, which has led to major debate over copyright. The Verge, an IT news site, has summarized the opinions of those involved in the uproar that has erupted in overseas fan communities over the scraping of derivative works published on the Internet.
Fanfiction writers battle AI, one scrape at a time | The Verge

In April 2025, a user named 'nyuuzyou' on the open source AI platform Hugging Face scraped approximately 12.6 million works from the fan fiction posting site 'Archive of Our Own (AO3)' and uploaded the dataset to Hugging Face.
The scraping by nyuuzyou was quickly spotted on Reddit's r/AO3 community, where many users expressed outrage, and the comments section of the Hugging Face dataset became abuzz, sparking a heated debate between fan fiction writers and AI advocates.

Users who defended nyuuzyou's scraping argued that 'the crawler bots of major tech companies have already scraped AO3 many times.' Opponents countered that 'those who scrape are exploiting the labor and creativity of the authors of derivative works by taking advantage of their silence.'
In 2023, an AI writing support tool called 'Sudowrite', built on OpenAI's ChatGPT, was released. Then, from around the latter half of 2023, the fan fiction term 'Omegaverse' began to appear in the text output by Sudowrite, which sparked a debate over whether derivative works were being used to train ChatGPT, the basis of Sudowrite. In response, many creators of derivative works have launched campaigns against the generation of derivative works by AI.

Nikki, who works under the name 'infinitegalaxies', writes Star Wars fan fiction, particularly in the 'Reylo' genre, which refers to the pairing of Rey and Kylo Ren, who appear in Episodes 7 to 9. When she searched to see whether her work had been scraped, she found that more than 70 of her works had been collected without her knowledge. These included an essay she co-wrote with 11 other writers about the threat of AI to fandom.
Nikki was so shocked that she was moved to tears when she saw a video showing how Sudowrite can generate a novel simply by entering character settings and plot information. Nikki, who works for a software company, had already seen the trend of AI being introduced in the workplace, but she never imagined it would have an impact on her hobby.
'Fan communities are essentially gift economies. We do it for the fun of it all, and out of goodwill. We give each other stuff, we create together as a community. We put our time, effort, heart and soul into it, and then we share it with the community. And then you just spit it out on the screen for a few seconds... who asked for that? It's disgusting,' she said.

'This is inherently theft. There is no ethical use for something built on stolen labor,' Nikki continued. 'Not only is it fundamentally based on data collected without consent, it also goes against the culture of 'gift and sharing' in fan communities.'
Nikki and others worked together online to file a request for removal under the Digital Millennium Copyright Act (DMCA) against nyuuzyou's scraping. The non-profit organization that runs AO3, the Organization for Transformative Works (OTW), also filed a request for removal. As a result, Hugging Face disabled the dataset in question on April 9th.
Meanwhile, nyuuzyou filed a counter-notice against the data removal on Hugging Face and re-uploaded the same dataset to hosting sites in Russia and China, which are less likely to comply with DMCA removal requests. When The Verge contacted nyuuzyou, he identified himself as a student and IT engineer living in Russia, and said, 'I'm not interested in derivative works, I just uploaded the data purely for research purposes.' nyuuzyou also said he was surprised that OTW took such a strong stance against his dataset, saying, 'I was hoping to have a dialogue about the consistency of the research dataset with the purpose of preservation.'
However, Alex Hanna, research director at the Distributed Artificial Intelligence Research Institute (DAIR), said in response to nyuuzyou's claims, 'This is extremely disingenuous. Why would you put all that unstructured data on the web if you're not going to use it to train language models?'
Nikki said that hearing nyuuzyou's explanation from The Verge made her even angrier, saying, 'I am determined to continue to stand up against AI that invades my fan community. I will never start a fight. But if someone starts a fight with me, I will.'
in Software, Web Service, Posted by log1i_yk