LLM-Assisted Multimodal Sensing on Mobile Devices
Project Description
Mobile devices such as smartwatches and smartphones are equipped with a variety of multimodal sensors, including microphones, cameras, accelerometers, and GPS. By integrating data from these sensors, such devices can perform complex tasks like event detection, speech recognition, and smart health monitoring. However, previous approaches to multimodal deep learning have been predominantly data-driven and often neglect the importance of context. In contrast, large language models (LLMs) excel at interpreting and leveraging contextual information thanks to their extensive world knowledge. By incorporating LLMs, we can develop advanced multimodal systems that not only analyze time-series data from sensors such as IMUs but also integrate contextual information such as GPS data, network activity, and app usage. This approach enables more sophisticated and adaptive multimodal sensing on mobile devices. The primary goals of this project are to (1) effectively integrate sensor data with contextual information expressed in natural language, and (2) improve the efficiency of multimodal inference on resource-constrained mobile devices.
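As a rough illustration of goal (1), the sketch below summarizes a short window of accelerometer readings as text, combines it with contextual cues such as location and the foreground app, and asks a small open-source LLM for an activity guess. This is a minimal sketch only: the model checkpoint, prompt wording, and context fields are illustrative assumptions rather than part of the project specification.

```python
# Minimal sketch: turn an IMU window plus natural-language context into an LLM prompt.
# Assumptions: accelerometer samples are already collected as a NumPy array, and the
# model id below is just one example of a small instruction-tuned open-source LLM.
import numpy as np
from transformers import pipeline

def summarize_imu(accel: np.ndarray) -> str:
    """Describe a (T, 3) window of accelerometer samples in plain language."""
    magnitude = np.linalg.norm(accel, axis=1)
    return (f"mean acceleration magnitude {magnitude.mean():.2f} m/s^2, "
            f"std {magnitude.std():.2f}, peak {magnitude.max():.2f}")

def build_prompt(accel: np.ndarray, context: dict) -> str:
    """Combine the sensor summary with contextual information for the LLM."""
    return (
        "You are an activity-recognition assistant running on a smartwatch.\n"
        f"Sensor summary (last 5 s of IMU data): {summarize_imu(accel)}.\n"
        f"Context: location type = {context['place']}, "
        f"foreground app = {context['app']}, local time = {context['time']}.\n"
        "What is the user most likely doing? Answer with one short phrase."
    )

# Illustrative data and context, not real measurements.
accel_window = np.random.randn(250, 3) * 2.0
context = {"place": "gym", "app": "fitness tracker", "time": "07:30"}

llm = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(llm(build_prompt(accel_window, context), max_new_tokens=30)[0]["generated_text"])
```

On-device deployment would typically swap the Hugging Face pipeline for a quantized local runtime, but the prompt-construction step stays the same.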
Supervisor
OUYANG, Xiaomin
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles
- Develop apps or APIs to collect and process sensor data from mobile devices.
- Implement preliminary code for machine learning model training and inference (a minimal example is sketched after this list).
- Conduct a survey of various application scenarios.
- Assist in data collection and other related experiments.
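The following PyTorch fragment sketches the kind of preliminary training and inference code referred to above, assuming IMU windows have already been collected; the synthetic data, window length, and three activity classes are placeholders.

```python
# Minimal PyTorch sketch: train and run a small classifier on 3-axis IMU windows.
# The data here is synthetic and the label set is hypothetical.
import torch
from torch import nn

class IMUClassifier(nn.Module):
    """Small 1-D CNN over a window of 3-axis accelerometer samples."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):                      # x: (batch, 3, time)
        return self.net(x)

model = IMUClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 3, 250)                    # 64 synthetic 250-sample windows
y = torch.randint(0, 3, (64,))                 # placeholder activity labels

for epoch in range(5):                         # a few steps just to show the loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():                          # inference on a new window
    prediction = model(torch.randn(1, 3, 250)).argmax(dim=1)
```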
Applicant's Learning Objectives
- Build a foundation in research methodologies, particularly machine learning for mobile and IoT systems.
- Develop proficiency with the PyTorch deep learning framework and popular transformers toolkits.
- Gain expertise in mainstream open-source LLM architectures and multimodal systems.
Complexity of the project
Moderate