Google Unveils New Gemini Robotics Models

Google DeepMind has announced a new family of robotics models, built on Gemini 2.0, designed to let robots perform complex tasks across varied environments. The models, developed by head of robotics Carolina Parada and her team, focus on “embodied AI”: systems that allow robots to understand natural language, reason about their surroundings, interact with people, and take physical action.

The new models build on Gemini 2.0, fine-tuned with robot-specific data, and add physical action as a new output modality alongside multimodal outputs such as text, images, and audio. The Gemini Robotics models are designed to be highly dexterous, interactive, and general, letting robots adapt to new objects, environments, and instructions without further training.
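
To make the idea of “action as an output modality” concrete, here is a minimal sketch of a vision-language-action (VLA) control loop in Python. The class names, method signatures, and action format are illustrative assumptions, not the actual Gemini Robotics interface.

```python
from dataclasses import dataclass


@dataclass
class Action:
    joint_deltas: list[float]  # per-joint position deltas for one control step
    gripper: float             # 0.0 = fully open, 1.0 = fully closed


class VLAPolicy:
    """Wraps a multimodal model as a robot control policy.

    Assumes `model.predict` accepts an image plus an instruction and
    returns a dict with "joints" and "gripper" keys (hypothetical API).
    """

    def __init__(self, model):
        self.model = model

    def step(self, image: bytes, instruction: str) -> Action:
        # Physical action is treated as one more output modality,
        # predicted from the same multimodal context as text or audio.
        out = self.model.predict(image=image, text=instruction)
        return Action(joint_deltas=out["joints"], gripper=out["gripper"])
```

Called in a loop with fresh camera frames, such a policy would re-plan at every control step, which is what lets an instruction like “pack the lunch” adapt as the scene changes.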

The most advanced model, Gemini Robotics-ER, excels at embodied reasoning, including detecting objects in both 2D and 3D, pointing at object parts, and finding corresponding points across images. Robots powered by the models have demonstrated tasks such as preparing salads, packing kids’ lunches, playing games like Tic-Tac-Toe, and even folding an origami fox.
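
As an illustration of how such embodied-reasoning capabilities might be queried, the sketch below asks a model to point at a named object part and parses a structured answer. The client object, its `generate` method, and the JSON response schema are hypothetical assumptions for this example.

```python
import json


def locate_part(model_client, image_bytes: bytes, part: str) -> tuple[float, float]:
    """Ask an embodied-reasoning model to point at a named object part.

    `model_client.generate` and the response schema are assumptions made
    for illustration. Returns normalized (x, y) coordinates in [0, 1].
    """
    prompt = (
        f"Point at the {part} in the image. "
        'Answer with JSON only: {"point": [y, x]}, '
        "with coordinates normalized to the range 0-1000."
    )
    response = model_client.generate(image=image_bytes, text=prompt)  # assumed call
    y, x = json.loads(response.text)["point"]
    return x / 1000.0, y / 1000.0
```

A grounded point like this is the kind of intermediate result a downstream grasp planner could consume, which is why pointing and correspondence matter for manipulation.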

The goal of the Gemini Robotics project is to build embodied AI that powers robots to help with everyday tasks in the real world. The new models are expected to be especially useful in industries where precision and adaptability are crucial, as well as in human-centric spaces like homes. With these advances, Google DeepMind moves a step closer to a future where robots interact seamlessly with humans and take on a variety of roles.

Google CEO Sundar Pichai praised the milestone, saying it lays the foundation for the next generation of robotics that can be helpful across a range of applications. For now, the models are available only to trusted testers and partners, a step toward broader adoption in industry and in the home.

Source: https://blog.google/products/gemini/how-we-built-gemini-robotics