Reddit blocks Internet Archive to stop AI companies from exploiting the Wayback Machine archive



The Internet Archive operates the Wayback Machine, which works to archive content from across the internet, including content from the social message board Reddit. However, Reddit, which prohibits scraping of its content, has found that AI companies are using the Wayback Machine's archives to obtain that content for AI training. In response, Reddit has begun blocking the Wayback Machine from archiving its content.

Reddit blocks Internet Archive to end sneaky AI scraping - Ars Technica
https://arstechnica.com/tech-policy/2025/08/reddit-blocks-internet-archive-to-end-sneaky-ai-scraping/



Reddit will block the Internet Archive | The Verge
https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit

As part of its mission to archive all content on the internet, the Wayback Machine has archived Reddit pages, profiles, and comments, but now it will only archive screenshots from Reddit, Ars Technica reports.

Reddit has not named the AI companies that were scraping data from the Wayback Machine, but company spokesperson Tim Rathschmidt told Ars Technica, 'Reddit is aware of instances in which AI companies are violating platform policies (including Reddit's own policies) by scraping data from the Wayback Machine.'

Rathschmidt suggested that the Internet Archive needs to strengthen its defenses against AI scraping, saying, 'Until the Internet Archive can protect its site and comply with the platform's policies (such as respecting user privacy and removing deleted content), we will partially restrict the Internet Archive's access to Reddit data to protect Reddit users.'

Some Reddit users are using the Wayback Machine to look up deleted posts and comments, Ars Technica points out, adding that there are countless other tools available for viewing deleted posts and comments, and that the Wayback Machine is not the right platform for such purposes.



The Internet Archive has not commented on whether it is considering changes that would get Reddit to lift the block. Ars Technica asked Reddit how this change would affect the archive's usefulness as an open web resource, but had not received a response at the time of writing.

Meanwhile, Mark Graham, director of the Wayback Machine, told Ars Technica, 'The Internet Archive has a long-standing relationship with Reddit and is in ongoing discussions about this matter.'

Ars Technica reports, 'Reddit's move to restrict AI companies' use of the Wayback Machine archive is likely financially motivated. It likely seeks to encourage more favorable licensing deals, like those Reddit has signed with OpenAI and Google. While the terms of the OpenAI deal haven't been made public, the Google deal is reportedly worth $60 million. Over the next three years, Reddit is expected to make more than $200 million from these licensing deals.'

Reddit has also sued Anthropic, alleging that the company used Reddit data to train its AI without a licensing agreement.

Reddit sues Anthropic, alleging it used its data to train AI models without a license agreement - GIGAZINE



in Web Service, Posted by logu_ii