Accelerating inference of Large Language Models on Resource-Constrained Devices
Project Description
Large Language Models (LLMs) have demonstrated impressive performance in natural language understanding and generation. However, their high computational and memory requirements pose significant challenges for deployment on mobile and edge devices. This project investigates methods for accelerating LLM inference in resource-constrained environments such as smartphones and IoT devices, targeting important applications like mobile GUI agents. The focus is on enabling real-time, low-latency, and energy-efficient applications, including intelligent GUI agents that can autonomously navigate and operate device interfaces. By combining model-level optimizations with system-level acceleration, the project aims to deliver scalable and adaptive inference pipelines for LLM-powered applications on edge platforms.
Supervisor
OUYANG, Xiaomin
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
1. Develop lightweight frameworks for deploying LLMs on mobile devices.
2. Implement and benchmark baseline acceleration methods to evaluate latency, throughput, and energy efficiency for LLM inference on mobile platforms.
3. Design and prototype intelligent mobile GUI agents that autonomously operate device interfaces, leveraging LLM capabilities for efficient task automation.
4. Evaluate and optimize trade-offs among accuracy, latency, and resource consumption in mobile applications.
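As a starting point for role 2, per-token latency and throughput can be measured with a simple decode-loop harness. The sketch below is a minimal illustration, not part of the project materials: `generate_step` is a hypothetical stand-in for one forward pass of whatever on-device runtime is used, and the dummy `time.sleep` model only simulates compute.

```python
import time

def benchmark_generation(generate_step, prompt_len, max_new_tokens):
    """Measure average per-token latency and overall throughput
    for an autoregressive decode loop.

    generate_step(context_len) is a hypothetical stand-in for one
    forward pass of the deployed model; a real harness would call
    the on-device inference runtime here.
    """
    latencies = []
    context_len = prompt_len
    start = time.perf_counter()
    for _ in range(max_new_tokens):
        t0 = time.perf_counter()
        generate_step(context_len)  # one decode step
        latencies.append(time.perf_counter() - t0)
        context_len += 1
    total = time.perf_counter() - start
    return {
        "avg_token_latency_s": sum(latencies) / len(latencies),
        "throughput_tok_per_s": max_new_tokens / total,
    }

# Dummy "model" that sleeps 1 ms per step to simulate compute.
stats = benchmark_generation(lambda n: time.sleep(0.001),
                             prompt_len=16, max_new_tokens=50)
print(stats)
```

Energy efficiency would additionally require platform-specific counters (e.g. battery or power-rail telemetry on the target device), which this host-side sketch does not cover.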
Applicant's Learning Objectives
1. Gain a solid foundation in efficient inference techniques for both large language models and mobile GUI agents.
2. Develop hands-on skills with model compression and acceleration techniques, specifically for mobile deployment.
3. Learn to balance trade-offs among accuracy, latency, and resource consumption in resource-constrained environments.
4. Gain experience in prototyping intelligent mobile applications and integrating multimodal systems for enhanced real-time interaction.
Complexity of the project
Moderate