Real-Robot Reinforcement Learning with World Models and Vision-Language-Action Models

Project Description

Training reinforcement learning (RL) policies directly on physical robots is often slow, costly, and risky. World models learn environment dynamics from collected data, so policies can be trained safely inside the simulated environments they generate.
This project will develop a vision-language-action (VLA) model that is trained inside a world-model-generated environment with a modern RL algorithm such as Group Relative Policy Optimization (GRPO). The trained policy will then be deployed on a physical robot for direct evaluation.
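
As a rough illustration of the GRPO idea mentioned above, the sketch below computes group-relative advantages: several rollouts are sampled for the same task, and each rollout's reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. The function name, the reward values, and the use of the population standard deviation are assumptions made for this example; they are not taken from the project and implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar return per rollout, all sampled for the
    same task. `eps` (an assumed value) guards against a zero standard
    deviation when every rollout receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical returns of four rollouts generated for one instruction
# inside the world model (1.0 = task succeeded, 0.0 = task failed).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Successful rollouts receive a positive advantage and failed ones a negative advantage, which is the signal a policy-gradient update would then use to reweight the actions taken in each rollout.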

Supervisor

GUO, Song

Quota

Course type

UROP1000

UROP1100

UROP2100

Applicant's Roles

Build a world model, train a vision-language-action model with GRPO inside it, and deploy the trained policy on a real robot for evaluation.

Required Knowledge: Python, PyTorch, Large Language Models

Applicant's Learning Objectives

 Learn to train world models.
 Learn to perform reinforcement learning on vision-language-action models within a world model.
 Learn to deploy trained policies on a physical robot.
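
To make the first two objectives above concrete, here is a deliberately tiny sketch of the world-model workflow: fit a dynamics model from logged transitions, then roll a policy forward inside the learned model instead of on the robot. Everything in it is an assumption chosen for illustration (a one-dimensional state, linear dynamics, a hand-written proportional policy, and the function names); a real world model for a VLA policy would be a learned neural network over images and actions.

```python
def fit_linear_dynamics(transitions):
    """Least-squares fit of s_next ~ a*s + b*u from (s, u, s_next) tuples."""
    sss = sum(s * s for s, u, n in transitions)
    ssu = sum(s * u for s, u, n in transitions)
    suu = sum(u * u for s, u, n in transitions)
    ssn = sum(s * n for s, u, n in transitions)
    sun = sum(u * n for s, u, n in transitions)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sss * suu - ssu * ssu
    a = (ssn * suu - ssu * sun) / det
    b = (sss * sun - ssu * ssn) / det
    return a, b


def imagine(model, policy, s0, steps):
    """Roll the policy forward inside the learned model (no real robot involved)."""
    a, b = model
    trajectory = [s0]
    for _ in range(steps):
        s = trajectory[-1]
        trajectory.append(a * s + b * policy(s))
    return trajectory


# Hypothetical logged data generated by the "true" system s_next = 0.9*s + 0.5*u.
logged = [(s, u, 0.9 * s + 0.5 * u)
          for s, u in [(1.0, 0.0), (0.0, 1.0), (2.0, -1.0), (-1.0, 0.5)]]

model = fit_linear_dynamics(logged)
trajectory = imagine(model, policy=lambda s: -s, s0=1.0, steps=3)
print(model, trajectory)
```

The imagined trajectory can be scored with a reward function and fed to an RL update, which is the role the world-model-generated environment plays in this project.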

Complexity of the project

Moderate
