AI Benchmarking Gets Creative With Games and Minecraft

AI benchmarking has long been a way to test the capabilities of artificial intelligence systems, but many traditional methods have limitations. Some enthusiasts are turning to games like Pictionary and Minecraft as new ways to challenge AIs.

Paul Calcraft, a freelance AI developer, created an app in which two AI models play a Pictionary-like game with each other: one model draws, and the other tries to guess what the drawing represents. The game tests the models’ ability to understand concepts such as shapes and colors. Meanwhile, Adonis Singh, a 16-year-old, has developed a tool called mc-bench that gives a model control over a Minecraft character and tests its ability to design structures.

Using games as benchmarks is not new, but the practice is becoming more popular with the rise of large language models (LLMs), which are being hooked up to games to probe their logic and problem-solving skills. AI researcher Matthew Guzdial believes games offer a visual way to compare how models perform and behave, with each game providing a different simplification of reality.

However, not everyone is convinced that games like Minecraft are a suitable benchmark for AI. Mike Cook, a research fellow at Queen Mary University, thinks that while Minecraft has some unique qualities, it’s not particularly special as an AI testbed. He argues that other video games, such as Fortnite or World of Warcraft, can provide similar challenges.

Despite the debate, watching LLMs build castles in Minecraft remains fascinating. As Calcraft notes, spatial understanding and multimodality are critical to advancing AI, which makes these creative benchmarks promising early steps in that direction.

Source: https://techcrunch.com/2024/11/05/people-are-using-games-like-pictionary-to-benchmark-ai-now