Artificial Intelligence (AI) was once hailed as the “next transformational technology to rule them all,” but its current state is far from triumphant. Instead, AI is struggling to live up to its promise due to a lack of high-quality human-generated data.
To feed data-hungry models, researchers and organizations have increasingly turned to synthetic data, and that shortcut has degraded the models themselves. The phenomenon, known as “model collapse,” occurs when AI systems are trained recursively on content they or other models generated, leading to a progressive decline in output quality.
Model collapse manifests as loss of nuance, reduced diversity, amplification of biases, and nonsensical outputs. A study published in Nature documented how quickly language models degenerate when trained recursively on AI-generated text, with models producing entirely irrelevant and nonsensical content after just nine generations.
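The degeneration described above can be illustrated with a toy simulation (an assumption for illustration, not the Nature study's setup): each “generation” fits a simple Gaussian model to data sampled from the previous generation's model, so estimation noise compounds and the distribution's tails tend to erode over successive rounds of self-training.

```python
import random
import statistics

def train_generation(samples):
    """Fit a Gaussian to the samples, then draw a new dataset from the fit.

    A toy stand-in for "training on your own outputs": each generation
    only ever sees data produced by the previous generation's model.
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in samples]

random.seed(0)
# Generation zero: real, "human-sourced" data.
data = [random.gauss(0.0, 1.0) for _ in range(200)]

spreads = []
for generation in range(10):
    spreads.append(statistics.stdev(data))
    data = train_generation(data)

# Each round re-estimates the distribution from the previous round's
# samples, so the measured spread drifts with compounding sampling noise;
# smaller datasets lose their tails far faster.
print([round(s, 3) for s in spreads])
```

Real language models are vastly more complex, but the mechanism is the same: information lost in one generation of synthetic data can never be recovered by the next.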
To mitigate this issue, enterprises can take several steps to ensure their AI systems are accurate and trustworthy. These include investing in data provenance tools, deploying AI-powered filters to detect synthetic content, partnering with trusted data providers, promoting digital literacy and awareness, and prioritizing real, human-sourced data over shortcuts.
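One of those steps, prioritizing human-sourced data via provenance metadata, can be sketched as a simple pre-training filter. The `Record` type and `filter_training_set` helper below are hypothetical names, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    provenance: str  # e.g. "human", "synthetic", "unknown"

def filter_training_set(records, allowed=("human",)):
    """Keep only records whose provenance tag is explicitly allowed.

    Treats "unknown" the same as "synthetic": anything that cannot be
    traced to a human source is excluded from the training corpus.
    """
    return [r for r in records if r.provenance in allowed]

corpus = [
    Record("Hand-written product review.", "human"),
    Record("LLM-generated summary.", "synthetic"),
    Record("Scraped page, origin unclear.", "unknown"),
]
clean = filter_training_set(corpus)
print(len(clean))  # only the human-sourced record survives
```

Defaulting to exclusion for untagged data is the conservative choice here: a provenance gap is treated as a risk, which matches the article's advice to favor verified human data over convenient shortcuts.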
By taking these actions, organizations can set AI on a safer, smarter path and build a future where AI is both powerful and beneficial to society.
Source: https://venturebeat.com/ai/synthetic-data-has-its-limits-why-human-sourced-data-can-help-prevent-ai-model-collapse