RT-2 learns from web and robotics data to translate knowledge into generalized instructions for robotic control, paving the way for robots that can “think” and “learn” about tasks or concepts without having been explicitly trained.
Until now, robots have had to be trained to perform specific tasks, which limits what they can do. In a breakthrough, Google has teased RT-2, the second version of its Robotics Transformer (RT) model, which is markedly better than its predecessor, RT-1.
Breaking away from conventional robots, RT-2 is built on a model much like those behind generative AI. A general-purpose generative AI model such as GPT-4 does not need to be trained for each specific problem. Because it is trained on a large amount of data from the internet, it can answer questions it has never been asked before.
In a similar way, the model underlying Google RT-2 can “learn” to do novel things. In the official announcement, Google says:
If you wanted previous systems to be able to throw away a piece of trash, you would have to explicitly train them to be able to identify trash, as well as pick it up and throw it away. Because RT-2 is able to transfer knowledge from a large corpus of web data, it already has an idea of what trash is and can identify it without explicit training. It even has an idea of how to throw away the trash, even though it’s never been trained to take that action.
Vincent Vanhoucke, Head of Robotics, Google DeepMind
RT-2 is powered by a single, novel vision-language-action (VLA) model that handles the complex reasoning, translates it into doable actions, and controls the low-level hardware to perform the tasks. On “seen” tasks that the model had been trained on, it performed as well as RT-1. In novel “unseen” scenarios, it performed about twice as well.
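To make the idea of a language-style model producing robot actions more concrete, here is a minimal sketch of one common approach: split each continuous action value into a fixed number of bins and write the bin indices as plain text, so that a model that outputs text can also output actions. This is an illustration only, not Google's implementation. The function names, the bin count of 256, and the [-1, 1] value range are all assumptions chosen for the example.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumed number of bins per action dimension (illustrative)


def encode_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Map each continuous value in [low, high] to a bin index and join as text."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep value inside the range
        fraction = (clipped - low) / (high - low)   # rescale to [0, 1]
        bin_index = min(int(fraction * NUM_BINS), NUM_BINS - 1)
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(text: str, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Turn a string of bin indices back into bin-centre values."""
    width = (high - low) / NUM_BINS
    return [low + (int(token) + 0.5) * width for token in text.split()]


if __name__ == "__main__":
    # Hypothetical 3-value action, e.g. a small arm movement.
    text = encode_action([-1.0, 0.0, 1.0])
    print(text)
    print(decode_action(text))
```

Decoding returns the centre of each bin, so the round trip loses at most half a bin width per value. That small loss is the price of letting a text-generating model drive hardware.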
This is the next step toward AI-enabled robots that can, in a sense, think on their own. Though the immediate use case might be a general-purpose robot that accomplishes tasks around a house or office, this breakthrough paves the way for more advanced models and robots that can learn new concepts, transfer that knowledge, and apply it dependably to new situations.
The team tested RT-2 in more than 6,000 robotic trials.
The full coverage can be found on the DeepMind blog.