Real-Robot Reinforcement Learning with World Models and Vision-Language-Action Models
Project Description
Training reinforcement learning (RL) policies directly on physical robots is often slow, costly, and risky. A world model learns the environment's dynamics from collected data, so a policy can be trained safely on rollouts generated by the model instead of on the hardware.
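As a rough illustration of this idea, the toy sketch below fits a one-parameter dynamics model from logged transitions and then rolls a policy forward inside the learned model only. Everything in it is an assumption made for illustration: the 1-D dynamics in `real_step`, the function names, and the least-squares fit. A world model for this project would be a learned neural model over camera observations, not a single scalar gain.

```python
import random

def real_step(state, action):
    # Hypothetical stand-in for the physical robot: simple 1-D dynamics.
    return state + 0.1 * action

def collect_transitions(n, seed=0):
    # Log (state, action, next_state) tuples from the "real" system.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((s, a, real_step(s, a)))
    return data

def fit_world_model(data):
    # Least-squares estimate of the gain k in the assumed model s' = s + k * a.
    num = sum(a * (s_next - s) for s, a, s_next in data)
    den = sum(a * a for _, a, _ in data)
    return num / den

def imagined_rollout(k, policy, state, horizon):
    # Roll the policy forward using only the learned model, never the robot.
    states = [state]
    for _ in range(horizon):
        state = state + k * policy(state)
        states.append(state)
    return states

k_hat = fit_world_model(collect_transitions(200))
trajectory = imagined_rollout(k_hat, lambda s: -s, 1.0, 20)
```

Here the policy `lambda s: -s` drives the imagined state toward zero, and no call to `real_step` is made after data collection, which is the property that makes training inside the model safe.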
This project will develop a vision-language-action (VLA) model that learns inside a world-model-generated environment using modern RL algorithms such as Group Relative Policy Optimization (GRPO). The trained policy will then be deployed on a physical robot for direct evaluation.
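For readers new to GRPO, its core step is to score each sampled rollout relative to the other rollouts drawn for the same task, rather than against a learned value function. The minimal sketch below shows only that advantage computation. The function name, the example rewards, and the choice of population standard deviation are assumptions for illustration; a full GRPO update also involves a clipped policy-ratio objective and, typically, a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    # Normalize each reward by the mean and standard deviation of its own group.
    # Assumption: population std (pstdev); implementations may differ.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical task-success rewards for four rollouts of the same instruction.
rollout_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rollout_rewards)
```

Successful rollouts receive positive advantages and failed ones negative advantages. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.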
Supervisor
GUO, Song
Quota
4
Course type
UROP1000
UROP1100
UROP2100
Applicant's Roles
Build a world model, train a vision-language-action model with GRPO inside it, and deploy the trained policy on a real robot for evaluation.
Required Knowledge: Python, PyTorch, Large Language Models
Applicant's Learning Objectives
 Learn to train world models.
 Learn to perform reinforcement learning on vision-language-action models within a world model.
 Learn to deploy trained policies on a physical robot.
Complexity of the project
Moderate