Bluesky, a social network, has published a proposal on GitHub outlining new options for users to control the scraping of their data for use in generative AI training and public archiving. The proposal aims to establish a “new standard” that would govern data scraping across platforms.
CEO Jay Graber explained that companies are already scraping publicly available data from across the web, including Bluesky’s content. She says her company is trying to create a voluntary standard for responsible data sharing, similar to the robots.txt file websites use to signal their crawling preferences to web crawlers.
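For comparison, robots.txt is a plain-text file served at a site’s root that names crawlers and the paths they may or may not fetch; compliance is entirely voluntary, which is the same weakness critics raise about Bluesky’s proposal. A minimal example (the crawler name shown is illustrative):

```txt
# Served at https://example.com/robots.txt
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
```

Here the site asks one specific crawler to fetch nothing while permitting all others, but nothing technically prevents a crawler from ignoring the file.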
Under the proposal, users can choose to allow or disallow the use of their Bluesky data for specific purposes, such as generative AI training, protocol bridging, bulk datasets, and web archiving. If a user indicates they do not want their data used for a given purpose, companies are expected to respect that intent.
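The proposal is still a draft, so its final shape is not settled. Purely as an illustration, a per-account preference record covering the four categories above might look like the following sketch (the field names and values here are hypothetical, not taken from the proposal):

```json
{
  "generativeAI": "disallow",
  "protocolBridging": "allow",
  "bulkDatasets": "disallow",
  "webArchiving": "allow"
}
```

Like robots.txt, such a record would express intent only; it carries no technical enforcement, which is why the proposal asks companies to honor it voluntarily.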
Molly White, a writer who covers Web3 and social media issues, described the proposal as “good” but noted that it relies on companies to respect users’ preferences. She pointed out that some companies have ignored existing guidelines, such as robots.txt, and expressed concerns about the effectiveness of this approach.
Source: https://techcrunch.com/2025/03/15/bluesky-users-debate-plans-around-user-data-and-ai-training