Accelerating inference of Large Language Models on Resource-Constrained Devices
Project Description
Large Language Models (LLMs) have demonstrated impressive performance in natural language understanding and generation. However, their high computational and memory requirements pose significant challenges for deployment on mobile and edge devices. This project investigates methods for accelerating LLM inference in resource-constrained environments such as smartphones and IoT devices, targeting important applications like mobile GUI agents. The focus is on enabling real-time, low-latency, and energy-efficient applications, including intelligent GUI agents that can autonomously navigate and operate device interfaces. By combining model-level optimizations with system-level acceleration, the project aims to deliver scalable and adaptive inference pipelines for LLM-powered applications on edge platforms.
Supervisor
OUYANG, Xiaomin
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
1. Develop lightweight frameworks for deploying LLMs on mobile devices.
2. Implement and benchmark baseline acceleration methods to evaluate latency, throughput, and energy efficiency for LLM inference on mobile platforms.
3. Design and prototype intelligent mobile GUI agents that autonomously operate device interfaces, leveraging LLM capabilities for efficient task automation.
4. Evaluate and optimize trade-offs among accuracy, latency, and resource consumption in mobile applications.
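As a starting point for role 2, per-token latency and throughput can be measured with a simple decode-loop harness. The sketch below is a minimal illustration, not part of the project materials: `generate_step` is a hypothetical stand-in for one forward pass of whatever on-device runtime is used, and the dummy `time.sleep` model only simulates compute.

```python
import time

def benchmark_generation(generate_step, prompt_len, max_new_tokens):
    """Measure average per-token latency and overall throughput
    for an autoregressive decode loop.

    generate_step(context_len) is a hypothetical stand-in for one
    forward pass of the deployed model; a real harness would call
    the on-device inference runtime here.
    """
    latencies = []
    context_len = prompt_len
    start = time.perf_counter()
    for _ in range(max_new_tokens):
        t0 = time.perf_counter()
        generate_step(context_len)  # one decode step
        latencies.append(time.perf_counter() - t0)
        context_len += 1
    total = time.perf_counter() - start
    return {
        "avg_token_latency_s": sum(latencies) / len(latencies),
        "throughput_tok_per_s": max_new_tokens / total,
    }

# Dummy "model" that sleeps 1 ms per step to simulate compute.
stats = benchmark_generation(lambda n: time.sleep(0.001),
                             prompt_len=16, max_new_tokens=50)
print(stats)
```

Energy efficiency would additionally require platform-specific counters (e.g. battery or power-rail telemetry on the target device), which this host-side sketch does not cover.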
Applicant's Learning Objectives
1. Gain a solid foundation in efficient inference techniques for both large language models and mobile GUI agents.
2. Develop hands-on skills with model compression and acceleration techniques, specifically for mobile deployment.
3. Learn to balance trade-offs among accuracy, latency, and resource consumption in resource-constrained environments.
4. Gain experience in prototyping intelligent mobile applications and integrating multimodal systems for enhanced real-time interaction.
Complexity of the project
Moderate