Efficient Vision-Language-Action (VLA) Model for Robot Learning
Project Description
Embodied AI represents a pivotal frontier in artificial intelligence, combining egocentric computer vision, machine learning, and robotics to enable agents to learn, perceive, and act in dynamic environments. This project focuses on reproducing the OpenVLA (Open Vision-Language-Action) framework on the Open X-Embodiment (OpenX) dataset to advance research in multimodal learning for robot actions. By leveraging the publicly available codebase and dataset, we aim to streamline initial implementation efforts while exploring novel mechanisms to enhance robot action learning through richer integration of multimodal information. The use of a smaller language model with LoRA (Low-Rank Adaptation) ensures computational efficiency, making this an accessible and impactful project for undergraduate researchers.
Supervisor
XU Dan
Quota
3
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
Reproduction of OpenVLA Framework:
Implement the OpenVLA pipeline using the Open X-Embodiment Dataset to validate and understand its approach to vision-language-action tasks.
Ensure the reproduction is faithful and that performance benchmarks align with those reported in the original OpenVLA study.
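As a concrete starting point, the sketch below shows the inference side of such a reproduction, assuming the publicly released openvla/openvla-7b checkpoint on Hugging Face. The prompt format and the predict_action / unnorm_key interface follow the public OpenVLA release and should be verified against the current README; the input image here is only a placeholder for a real camera frame or an Open X-Embodiment episode frame.

```python
# Minimal sketch: query the released OpenVLA checkpoint for a robot action
# from a single RGB observation. Checkpoint name, prompt format, and the
# predict_action / unnorm_key interface follow the public OpenVLA release;
# check them against the current README before relying on this.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "openvla/openvla-7b"  # released checkpoint on Hugging Face

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda:0")  # a single GPU is assumed here

# Placeholder observation; in practice this comes from the robot camera or an
# Open X-Embodiment episode frame.
image = Image.new("RGB", (224, 224))
prompt = "In: What action should the robot take to pick up the cup?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# unnorm_key selects the dataset statistics used to un-normalize the action.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # end-effector delta + gripper command
```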
Multimodal Integration for Robot Action Learning:
Design and experiment with novel mechanisms to incorporate additional modalities (e.g., haptics, audio, and environmental metadata) into the learning framework.
Investigate how multimodal fusion improves task performance and robustness.
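One illustrative fusion mechanism is sketched below, under the assumption that an extra sensor stream (here, a 6-D force/torque reading) can be projected into the backbone's token embedding space and prepended to the visual and instruction tokens. The HapticTokenizer module and all dimensions are hypothetical and are not part of the OpenVLA codebase.

```python
# Hypothetical sketch: map a low-dimensional haptic reading into the VLA
# backbone's token embedding space and prepend it as an extra "sensor token".
import torch
import torch.nn as nn

class HapticTokenizer(nn.Module):
    def __init__(self, haptic_dim: int = 6, embed_dim: int = 4096, num_tokens: int = 1):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Sequential(
            nn.Linear(haptic_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim * num_tokens),
        )

    def forward(self, haptics: torch.Tensor) -> torch.Tensor:
        # haptics: (batch, haptic_dim) -> (batch, num_tokens, embed_dim)
        b = haptics.shape[0]
        return self.proj(haptics).view(b, self.num_tokens, -1)

# Usage: concatenate with existing vision + text token embeddings before the LLM.
batch = 2
vision_tokens = torch.randn(batch, 256, 4096)   # visual patch embeddings
text_tokens = torch.randn(batch, 32, 4096)      # instruction token embeddings
haptics = torch.randn(batch, 6)                 # force/torque reading

sensor_tokens = HapticTokenizer()(haptics)
fused = torch.cat([sensor_tokens, vision_tokens, text_tokens], dim=1)
print(fused.shape)  # (2, 289, 4096)
```

Whether such sensor tokens actually help can then be measured by ablating them during evaluation.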
Optimization and Efficiency:
Utilize smaller language models enhanced with LoRA to minimize computational cost without sacrificing performance.
Evaluate trade-offs between model size, efficiency, and task effectiveness.
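A minimal sketch of the LoRA setup with the Hugging Face peft library follows; the backbone (Qwen/Qwen2.5-0.5B is used only as an example of a smaller language model), the rank, and the target_modules are illustrative choices, not a prescription.

```python
# Minimal sketch: attach LoRA adapters to a smaller language backbone so that
# only the low-rank adapter weights are trained. Backbone name and
# target_modules are illustrative and must match the chosen model.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

backbone = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16
)

lora_cfg = LoraConfig(
    r=16,                      # rank of the low-rank update
    lora_alpha=32,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Varying the rank r and the set of target_modules gives a direct handle on the model size / efficiency / task effectiveness trade-off described above.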
Applicant's Learning Objectives
Learn and implement the OpenVLA framework;
Learn multimodal integration for robot action learning;
Learn optimization and efficiency techniques for multimodal models (e.g., LoRA fine-tuning).
Complexity of the project
Moderate