A new global benchmark, “Humanity’s Last Exam” (HLE), has been created to test the limits of today’s advanced artificial intelligence systems. The test consists of 2,500 rigorously reviewed questions spanning a wide range of disciplines, with an emphasis on precise, closed-ended answers. Despite high scores on conventional benchmarks, AI models struggled with HLE, answering fewer than 10% of the questions correctly when it was released in 2025. Top models have since improved markedly, however, now scoring just below 40%. The benchmark aims to identify remaining limitations and emerging generalist research capabilities in AI systems.
Source: https://www.manchester.ac.uk/about/news/mathematicians-contribute-to-ai-benchmark