AI Model Training Raises Concerns Over Data Value and Internet Sustainability

The recent death of Suchir Balaji, a former OpenAI researcher, has drawn attention to an under-discussed debate about the impact of AI models on data value and internet sustainability. Balaji laid out his concerns in an essay that criticized how AI models use data for training, arguing that the practice leads to a decline in traffic to the original sources.

AI models are trained on information from the internet and then answer user questions directly, which reduces visits to the websites that created and verified the original data. This drains resources from content creators, potentially leading to a less accurate and less rich internet. The phenomenon has been described as “Death by LLM” (Large Language Model).

A study analyzing Stack Overflow found that traffic declined by 12% after the release of ChatGPT. The average account age also increased, suggesting that fewer people signed up or that more users left the online community. The findings indicate that AI models could undermine incentives for creating high-quality online data.

Tech reviewer Marques Brownlee expressed similar concerns when reviewing OpenAI’s Sora video model, in whose output he found a plant that resembled one from his own videos posted on YouTube. He wanted to know whether he could opt out and prevent his content from being used to train AI models.

Balaji argued that AI chatbots like ChatGPT are stripping away the commercial value of people’s work and services. “This is not a sustainable model for the internet ecosystem,” he said in an interview. OpenAI countered that its data collection methods are protected by fair use principles in copyright law and supported by legal precedents.

The debate highlights the need to discuss the value of training data and its impact on the internet ecosystem. Tech companies often avoid discussing this topic, instead focusing on their algorithms. As Balaji’s case shows, the consequences of not addressing these concerns can be severe.

Source: https://www.businessinsider.com/suchir-balaji-marques-brownlee-openai-ai-training-data-death-llm-2024-12