Nvidia’s Eureka Algorithms Train Robots to Learn Complex Tasks

Nvidia’s new AI agent, Eureka, can train robots to perform physical tasks on its own, without any task-specific prompt or template.

A new AI agent from Nvidia Research called Eureka can teach robots complex skills, such as performing a rapid pen-spinning trick as well as a human can. Eureka autonomously writes the reward algorithms used to train the robots. In the official announcement post, the company showed a video of 30 such tasks that Eureka allowed robots to learn.

As per the announcement, “Eureka has also taught robots to open drawers and cabinets, toss and catch balls, and manipulate scissors, among other tasks.”

Eureka runs its simulations in NVIDIA Isaac Gym, a physics simulation application for reinforcement learning research. Isaac Gym is built on NVIDIA Omniverse, a development platform for building 3D tools and applications. The AI agent is powered by OpenAI’s GPT-4 LLM.

The accompanying research paper, uploaded on GitHub, explains how Eureka teaches complex skills to robots through reinforcement learning without any task-specific prompting or template. The reward functions Eureka generates can often outperform human-written rewards.

This is a remarkable step forward for reinforcement learning in robotics, and could change the way the robots of the future are trained.

Eureka utilizes GPU-accelerated simulation to “quickly evaluate the quality of a large batch of reward candidates,” in what the company dubs Rapid Reward Evaluation via Massively Parallel RL.

Even without any task-specific prompt, taking cues purely from the environment, Eureka can be a generalist reward designer that can produce various functions or algorithms. It also iterates between evolutionary reward search, evaluation, and reflection to improve the output over time.
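The loop described above (propose a batch of reward candidates, evaluate them, reflect, repeat) can be illustrated with a toy sketch. This is not Nvidia's implementation. Every name here is hypothetical: a reward candidate is reduced to a pair of weights rather than GPT-4-written reward code, `evaluate` is a stand-in for training a policy in simulation, `propose` swaps random mutation in for the LLM, and "reflection" is approximated by keeping the best candidate and narrowing the search.

```python
import random

# Assumed "ideal" reward weights, used only so the toy has something to score against.
TARGET = (1.0, 0.2)


def evaluate(candidate):
    """Stand-in for training an RL policy with this reward and measuring success.

    Returns a fitness score; higher is better, and 0.0 is the maximum.
    """
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def propose(parent, rng, batch_size, step):
    """Stand-in for the LLM: sample a batch of variations on the current best."""
    return [
        tuple(w + rng.gauss(0.0, step) for w in parent)
        for _ in range(batch_size)
    ]


def reward_search(iterations=5, batch_size=16, seed=0):
    """Toy evolutionary search over reward candidates."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_score = evaluate(best)
    history = [best_score]  # best score seen so far, one entry per round
    step = 0.5
    for _ in range(iterations):
        # Search: generate a batch of candidates around the current best.
        candidates = propose(best, rng, batch_size, step)
        # Evaluation: score the whole batch (Eureka does this in parallel on GPUs).
        scored = [(evaluate(c), c) for c in candidates]
        top_score, top = max(scored)
        # Reflection stand-in: keep the winner only if it improved, then narrow the search.
        if top_score > best_score:
            best, best_score = top, top_score
        step *= 0.7
        history.append(best_score)
    return best, best_score, history
```

Calling `best, score, history = reward_search()` returns the best weight pair found, its score, and the best score after each round. Because a candidate replaces the current best only when it scores higher, the values in `history` never decrease.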

Across 29 tasks, Eureka rewards outperform expert human-written ones on 83% of them with an average normalized improvement of 52%. In particular, Eureka realizes much greater gains on high-dimensional dexterity environments.
Nvidia, UPenn, Caltech, UT Austin, “Eureka: Human-Level Reward Design via Coding Large Language Models”

By Abhimanyu