Meta claims that downloading copyrighted data is not a copyright violation

In a 2023 lawsuit filed by several American authors against OpenAI and Meta for copyright infringement, Meta admitted to having downloaded pirated book datasets via torrent for the purpose of training its AI. However, Meta argued in court that there was no evidence that it had shared the files after downloading them.
Meta Says it Made Sure Not to Seed Any Pirated Books - TorrentFreak
Meta claims torrenting pirated books isn't illegal without proof of seeding - Ars Technica
https://arstechnica.com/tech-policy/2025/02/meta-defends-its-vast-book-torrenting-were-just-a-leech-no-proof-of-seeding/
In 2023, three authors, including comedian and author Sarah Silverman, sued OpenAI and Meta, alleging that ChatGPT and Llama were trained on datasets containing works that had been illegally distributed on the Internet.
OpenAI and Meta sued by three authors for copyright infringement - GIGAZINE

Evidence presented in the case indicated that Meta trained its AI using approximately 82TB of data stored in pirate e-book libraries such as Z-Library and Anna's Archive.
Meta CEO Mark Zuckerberg is being pursued in a lawsuit for allowing the AI 'Llama' development team to use copyrighted works without permission - GIGAZINE

The training dataset was allegedly downloaded internally by Meta using BitTorrent, and the plaintiffs allege that downloading the data via BitTorrent is itself a violation of the California Comprehensive Computer Data Access and Fraud Act (CDAFA).
Meta countered that BitTorrent is a widely used protocol for downloading large files and is not an illegal technology. It said that the BitTorrent download in this case was simply a means of accessing data from a 'famous online repository' that was publicly available via BitTorrent, and that the act of downloading is not itself illegal.
Meta also maintains that although it did download the data, there is no evidence that it 'seeded' (uploaded to other BitTorrent users) the data it downloaded, and it denies having done so.

Ars Technica, an IT news site, points out that Meta has deliberately avoided using the word 'pirate,' and instead has tried to redefine its actions as being within the legal realm by using terms such as 'publicly accessible datasets,' 'publicly readable text from books,' and 'publicly accessible websites not operated or owned by the plaintiffs.'
However, Meta's project manager Michael Clarke acknowledged that the company had changed its settings to minimize seeding as much as possible. This did not prevent seeding altogether, so it is possible that some seeding still occurred. The plaintiffs argued that 'by using BitTorrent, Meta made pirated data available to users around the world.'
Moreover, Meta's internal messages reveal that the company downloaded the dataset without using Facebook's servers, which the plaintiffs argue was a deliberate attempt to avoid the risk of the activity being traced back to Meta.
Ultimately, the plaintiffs allege that Meta not only deprived them of licensing revenue by using BitTorrent to 'obtain works from pirate databases,' but also cost them additional revenue by 'making pirated versions of the works available for download by Internet users around the world in the course of obtaining data for its AI training.'
in Note, Posted by log1i_yk