OpenAI has launched two new reasoning models, o3 and o4-mini, which are state-of-the-art in many respects. However, both models still hallucinate, inventing information with no basis in fact, and they do so more often than several of the company's earlier models, reversing the gradual improvement seen across previous OpenAI releases.
In OpenAI's internal tests, o3 and o4-mini hallucinated more often than the company's earlier reasoning models, such as o1 and o3-mini, as well as non-reasoning models like GPT-4o. The accompanying technical report notes that the new models “make more claims overall,” which produces more accurate claims but also more inaccurate, hallucinated ones.
A recent third-party evaluation by Transluce, a nonprofit AI research lab, found that o3 tends to fabricate actions it supposedly took while arriving at answers, such as claiming to have run code on a MacBook Pro. Transluce researchers suggest the reinforcement learning used for the o-series models may amplify issues that standard post-training pipelines usually mitigate but do not fully eliminate.
Experts warn that hallucinations make the models less useful than they could be in applications where accuracy is paramount. One promising way to improve accuracy is giving models web search capabilities: GPT-4o with web search reaches 90% accuracy on SimpleQA, one of OpenAI's accuracy benchmarks. But if scaling up reasoning models continues to worsen hallucinations, the hunt for a solution will only grow more urgent.
Source: https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more