Researchers at Apple have released a paper questioning the "reasoning" capabilities of large language models, including those from OpenAI and Google. The team suggests that the industry is overstating these models' abilities, with some companies even claiming their systems can "think" like humans.
The study found that despite improved performance on benchmarks, the fundamental capabilities of these models remain insufficiently understood. Apple's researchers argue that existing benchmarks often suffer from data contamination and offer little insight into the structure and quality of the models' reasoning traces.
Through experimentation, the team probed the models' ability to "think" using "controllable puzzle environments," tasks whose difficulty can be dialed up precisely. They found that frontier large reasoning models suffer a complete accuracy collapse once complexity passes a certain threshold, a failure the researchers link in part to an "overthinking" phenomenon, in which models reason past correct answers rather than stopping at them.
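The article does not detail the test harness, but a controllable puzzle environment can be pictured as a generator-plus-verifier pair whose difficulty is set by a single parameter. Below is a minimal sketch assuming a Tower of Hanoi task with the disk count as the complexity knob; the puzzle choice, function names, and interface are illustrative assumptions, not the paper's actual code.

```python
# Sketch of a controllable puzzle environment (illustrative, not the
# paper's harness): Tower of Hanoi, with n_disks as the complexity knob.

def solved(pegs, n_disks):
    """The puzzle is solved when all disks sit on the last peg."""
    return pegs[-1] == list(range(n_disks, 0, -1))

def check_solution(n_disks, moves):
    """Replay a model's proposed move list against the rules.

    Each move is a (src, dst) pair of peg indices. Returns True only if
    every move is legal and the final state is solved, so accuracy can
    be measured exactly at any chosen complexity level.
    """
    pegs = [list(range(n_disks, 0, -1)), [], []]  # largest disk at bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                  # illegal: moving from an empty peg
        disk = pegs[src].pop()
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                  # illegal: larger disk onto smaller
        pegs[dst].append(disk)
    return solved(pegs, n_disks)

# Example: a correct 3-move solution for 2 disks passes the verifier.
print(check_solution(2, [(0, 1), (0, 2), (1, 2)]))  # True

# Sweeping the knob upward shows why difficulty scales so sharply:
# the shortest solution grows exponentially with disk count.
for n in range(1, 6):
    print(f"{n} disks -> optimal solution has {2**n - 1} moves")
```

Because the verifier is exact and instances can be generated at any size, this style of evaluation sidesteps the data-contamination problem the researchers raise: accuracy can be tracked as the same task family is made progressively harder, which is how a collapse point becomes visible.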
This finding echoes a broader trend in benchmarking results, which show that newer reasoning models are, if anything, more prone to hallucination than their predecessors. The researchers argue that their findings raise crucial questions about the true reasoning capabilities of current AI models.
The study's conclusions suggest that the AI industry may have hit a plateau: billions of dollars are being poured into developing increasingly power-hungry models without a corresponding breakthrough in capability.
Source: https://futurism.com/apple-damning-paper-ai-reasoning