Advancing On-Device Deployment of Vision–Language–Action / World-Action Models for Embodied AI Systems
Project Description
This project aims to develop a real‑time embodied AI system on mobile manipulation platforms (e.g., dual‑arm robots or mobile bases equipped with manipulators) to support tasks such as object search, pick‑and‑place, and human–robot interaction in dynamic indoor environments. The key focus is on optimizing the end‑to‑end perception–reasoning–action pipeline of Vision–Language–Action (VLA) and World‑Action Models (WAMs) for efficient on‑device deployment. We will investigate techniques including pipelined sensing, cross‑modal early exit, and adaptive model configuration to achieve low‑latency and resource‑efficient execution on edge platforms (e.g., NVIDIA Jetson). The developed system will be validated on real robotic hardware in dynamic settings, enabling continuous robot navigation, manipulation, and interactive behaviors under real‑world constraints.
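One of the techniques named above, cross-modal early exit, can be sketched in miniature: intermediate classifiers decide whether the model is already confident enough to act, skipping the remaining (more expensive) stages. The staged interface, the toy stage/classifier functions, and the 0.9 threshold below are illustrative assumptions, not the actual VLA/WAM pipeline.

```python
def run_with_early_exit(x, stages, classifiers, threshold=0.9):
    """Run a staged model, exiting as soon as an intermediate
    classifier is confident enough to skip the remaining stages."""
    for stage, clf in zip(stages, classifiers):
        x = stage(x)                     # run the next model stage
        label, confidence = clf(x)       # cheap intermediate prediction head
        if confidence >= threshold:
            return label, confidence     # early exit: skip later stages
    return label, confidence             # fell through: full-depth prediction


# Toy demo: confidence rises with depth, so we exit at the second stage.
stages = [lambda x: x + 1, lambda x: x + 1, lambda x: x + 1]
classifiers = [
    lambda x: ("low", 0.50),
    lambda x: ("mid", 0.95),
    lambda x: ("high", 0.99),
]
label, conf = run_with_early_exit(0, stages, classifiers)
print(label, conf)
```

In a real deployment the "stages" would be transformer blocks (or modality encoders) and the exit heads would be trained, but the control flow is the same.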
Supervisor
OUYANG, Xiaomin
Quota
2
Course type
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
1. Develop lightweight frameworks for deploying LLMs on mobile devices.
2. Implement and benchmark baseline acceleration methods to evaluate latency, throughput, and energy efficiency for LLM inference on mobile platforms.
3. Design and prototype intelligent mobile GUI agents that autonomously operate device interfaces, leveraging LLM capabilities for efficient task automation.
4. Evaluate and optimize trade-offs among accuracy, latency, and resource consumption in mobile applications.
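The benchmarking work in role 2 can be sketched as a minimal harness that measures average latency and token throughput. The `generate` callable below is a hypothetical stand-in for a real on-device LLM runtime; the stub tokenizer and run count are assumptions for illustration.

```python
import time

def benchmark(generate, prompt, n_runs=5):
    """Time a text-generation callable over several runs and report
    average latency (seconds) and throughput (tokens per second)."""
    latencies, tokens_out = [], 0
    for _ in range(n_runs):
        start = time.perf_counter()
        out = generate(prompt)
        latencies.append(time.perf_counter() - start)
        tokens_out += len(out)
    avg_latency = sum(latencies) / n_runs
    throughput = tokens_out / sum(latencies)
    return avg_latency, throughput


# Stub "model": splits the prompt into whitespace tokens.
def stub_generate(prompt):
    return prompt.split()

avg, tps = benchmark(stub_generate, "pick up the red cup", n_runs=3)
print(f"avg latency: {avg * 1000:.3f} ms, throughput: {tps:.1f} tok/s")
```

Energy efficiency would additionally require platform counters (e.g., board-level power telemetry on an edge device), which this sketch omits.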
Applicant's Learning Objectives
1. Gain a solid foundation in efficient inference techniques for both large language models and mobile GUI agents.
2. Develop hands-on skills with model compression and acceleration techniques, specifically for mobile deployment.
3. Learn to balance trade-offs among accuracy, latency, and resource consumption in resource-constrained environments.
4. Gain experience in prototyping intelligent mobile applications and integrating multimodal systems for enhanced real-time interaction.
Complexity of the project
Moderate