New AI Benchmark Sets Bar for Expert-Level Intelligence
A new global benchmark, “Humanity’s Last Exam” (HLE), has been created to test the limits of today’s advanced artificial intelligence systems. The test consists of 2,500 rigorously reviewed questions in various disciplines, with a focus on precision and closed-ended answers. Despite high scores on conventional benchmarks, AI models struggled with HLE, passing fewer than 10% …