AI Reasoning Models May Not Be “Thinking” as Well as Thought

A recent study by Apple researchers has cast doubt on the idea that artificial intelligence (AI) reasoning models can truly think like humans. The study found that large reasoning models, such as OpenAI's o1 and o3, DeepSeek R1, Claude 3.7 Sonnet Thinking, and Google Gemini Flash Thinking, collapse when faced with increasingly complex problems.

The researchers tested the reasoning skills of these models on classic logic puzzles: the Tower of Hanoi, a checker-jumping puzzle in which pieces hop into empty spaces, and a block-stacking task with a target configuration. The results showed that accuracy declines steadily as problem complexity increases, then collapses completely once a model-specific complexity threshold is crossed.
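
To give a concrete sense of how difficulty can be dialed up in such puzzles, here is a minimal sketch (an illustration, not code from the study): the Tower of Hanoi has a short, well-specified recursive solution, and the optimal move count grows as 2^n - 1 with the number of disks n, so each added disk roughly doubles the length of the move sequence a model must produce.

    def hanoi(n, src="A", aux="B", dst="C", moves=None):
        """Return the optimal move list for n disks (2**n - 1 moves)."""
        if moves is None:
            moves = []
        if n == 0:
            return moves
        hanoi(n - 1, src, dst, aux, moves)   # move the top n-1 disks out of the way
        moves.append((src, dst))             # move the largest disk to the target peg
        hanoi(n - 1, aux, src, dst, moves)   # move the n-1 disks back on top of it
        return moves

    # Solution length doubles (plus one) with each extra disk, which is
    # how puzzle "complexity" can be increased without changing the rules.
    for n in (3, 7, 10, 15):
        print(n, "disks ->", len(hanoi(n)), "moves")   # 7, 127, 1023, 32767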

While these models perform well on math and coding benchmarks, they struggle as the puzzles grow more complex; notably, even when given the solution algorithm, their accuracy does not improve. It's essential to note that this research doesn't mean AI reasoning models don't reason at all; rather, it highlights their limitations compared to human abilities.

The study's findings are significant: they suggest that large language models break down on problems beyond a certain complexity and may not be a substitute for well-specified conventional algorithms. The researchers emphasize the need to keep these limitations in mind when evaluating AI advancements.
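
Part of what makes these puzzles attractive benchmarks, and part of why conventional code retains the advantage the study describes, is that any proposed solution can be checked mechanically against the rules. The sketch below is a minimal checker for Tower of Hanoi move lists, offered as an illustration rather than as the study's actual evaluation harness.

    def valid_hanoi_solution(n, moves, pegs=("A", "B", "C")):
        """Verify a proposed Tower of Hanoi move list for n disks: one disk
        moves at a time, a larger disk never sits on a smaller one, and
        every disk must end on the last peg."""
        state = {p: [] for p in pegs}
        state[pegs[0]] = list(range(n, 0, -1))       # disks bottom-to-top on the start peg
        for src, dst in moves:
            if not state[src]:
                return False                          # illegal: source peg is empty
            disk = state[src][-1]
            if state[dst] and state[dst][-1] < disk:
                return False                          # illegal: larger disk onto smaller
            state[dst].append(state[src].pop())
        return state[pegs[-1]] == list(range(n, 0, -1))

    # A correct 2-disk solution (3 moves) passes; a truncated one fails.
    print(valid_hanoi_solution(2, [("A", "B"), ("A", "C"), ("B", "C")]))  # True
    print(valid_hanoi_solution(2, [("A", "C")]))                          # False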

This research is a useful reminder to examine AI claims critically and to understand both the capabilities and the limitations of these models.

Source: https://mashable.com/article/apple-research-ai-reasoning-models-collapse-logic-puzzles