Google has introduced two new artificial intelligence (AI) models, Gemini Robotics and Gemini Robotics-ER, designed to bring advanced AI capabilities to humanoid robots. These models aim to enable robots to perform a wide range of real-world tasks, making them more helpful and interactive.
Gemini Robotics is an advanced vision-language-action (VLA) model built on Gemini 2.0 that adds physical actions as a new output modality, allowing it to control robots directly. The model can be applied in a range of settings, from the home to industrial environments, and its generalization capabilities allow robots to adapt to different situations and solve tasks they have never seen before.
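The announcement describes Gemini Robotics only at a high level and does not publish a control interface, but the basic shape of a vision-language-action loop can be sketched. In the minimal Python sketch below, `VLAPolicy` is a placeholder standing in for the model, and the `Observation` fields, the 7-DoF command format, and the loop wiring are all illustrative assumptions rather than anything Google has released.

```python
# Illustrative sketch of a vision-language-action (VLA) control loop.
# Gemini Robotics' actual interface was not published in the announcement,
# so VLAPolicy and the 7-DoF command format here are assumptions.
from dataclasses import dataclass


@dataclass
class Observation:
    image: bytes      # latest camera frame from the robot
    instruction: str  # natural-language task, e.g. "fold this origami crane"


class VLAPolicy:
    """Stand-in for a VLA model: maps (vision, language) -> actions."""

    def act(self, obs: Observation) -> list[float]:
        # A real model would condition on the image and the instruction;
        # this placeholder just returns a zero command for a 7-DoF arm.
        return [0.0] * 7


def control_loop(policy: VLAPolicy, get_observation, send_command,
                 steps: int = 100) -> None:
    """Closed loop: observe, let the model pick an action, execute it."""
    for _ in range(steps):
        obs = get_observation()
        command = policy.act(obs)  # the VLA model chooses the next action
        send_command(command)      # the robot executes it


if __name__ == "__main__":
    # Dummy wiring so the sketch runs end to end without hardware.
    dummy_obs = Observation(image=b"", instruction="pick up the banana")
    control_loop(VLAPolicy(), lambda: dummy_obs, print, steps=3)
```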
Gemini Robotics-ER is a Gemini model with advanced spatial understanding that lets roboticists run their own programs using its embodied reasoning (ER) abilities. It improves on Gemini 2.0's existing spatial reasoning, such as pointing and 3D object detection, and can instantiate new capabilities on the fly.
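Since these ER abilities build on Gemini 2.0's spatial understanding, one plausible way a roboticist might use them is through the standard Gemini API. The sketch below assumes exactly that: the model ID `gemini-robotics-er` is hypothetical, and the JSON point schema (coordinates normalized to 0-1000, a convention Gemini 2.0 uses for spatial outputs) is an assumption about how such a query could be phrased, not a documented interface.

```python
# Hypothetical sketch: querying an embodied-reasoning model for object
# locations via the Gemini API. The model ID and output schema below are
# assumptions, not confirmed by the announcement.
import json

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

# "gemini-robotics-er" is a hypothetical model ID used for illustration.
model = genai.GenerativeModel("gemini-robotics-er")

frame = Image.open("workbench.jpg")  # e.g. a camera frame from the robot
prompt = (
    "Point to every graspable object in this image. "
    'Respond as JSON: [{"label": str, "point": [y, x]}] '
    "with coordinates normalized to 0-1000."
)

response = model.generate_content([frame, prompt])
points = json.loads(response.text)  # assumes the model returns clean JSON
for obj in points:
    print(obj["label"], obj["point"])
```

A downstream program could then map these normalized points back to camera pixels and feed them to its own grasp planner, which is the kind of "run your own programs" use the announcement describes.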
The introduction of these models aligns with Google's goal of advancing AI and robotics responsibly. The company has taken a layered approach to safety in its research, from low-level motor control to high-level semantic understanding, and to further robotics safety research it is releasing a new dataset for evaluating and improving semantic safety in embodied AI and robotics.
The Gemini Robotics models are designed to work collaboratively with people and can understand and respond to commands phrased in everyday, conversational language. These advances aim to make robots more helpful, interactive, and safe around humans.
With the introduction of these models, Google is pushing the boundaries of AI capabilities for humanoid robots. As the company continues to explore the potential of embodied AI, it’s essential to consider the societal implications and develop AI applications responsibly.
Source: https://deepmind.google/discover/blog/gemini-robotics-brings-ai-into-the-physical-world