LLM-Assisted Multimodal Sensing on Mobile Devices
Project Description
Mobile devices such as smartwatches and smartphones are equipped with a variety of multimodal sensors, including microphones, cameras, accelerometers, and GPS. By integrating data from these sensors, such devices can perform complex tasks like event detection, speech recognition, and smart health monitoring. However, previous approaches to multimodal deep learning have been predominantly data-driven and often neglect the importance of context. In contrast, large language models (LLMs) excel at interpreting and leveraging contextual information thanks to their extensive world knowledge. By incorporating LLMs, we can develop advanced multimodal systems that not only analyze time-series data from sensors such as IMUs but also integrate contextual information such as GPS data, network activity, and app usage. This approach enables more sophisticated and adaptive multimodal sensing on mobile devices. The primary goals of this project are to (1) effectively integrate sensor data with contextual information expressed in natural language, and (2) improve the efficiency of multimodal inference on resource-constrained mobile devices.
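As a rough illustration of goal (1), the sketch below summarizes a short window of accelerometer readings as text, combines it with contextual cues such as location and the foreground app, and asks a small open-source LLM for an activity guess. This is a minimal sketch only: the model checkpoint, prompt wording, and context fields are illustrative assumptions rather than part of the project specification.

```python
# Minimal sketch: turn an IMU window plus natural-language context into an LLM prompt.
# Assumptions: accelerometer samples are already collected as a NumPy array, and the
# model id below is just one example of a small instruction-tuned open-source LLM.
import numpy as np
from transformers import pipeline

def summarize_imu(accel: np.ndarray) -> str:
    """Describe a (T, 3) window of accelerometer samples in plain language."""
    magnitude = np.linalg.norm(accel, axis=1)
    return (f"mean acceleration magnitude {magnitude.mean():.2f} m/s^2, "
            f"std {magnitude.std():.2f}, peak {magnitude.max():.2f}")

def build_prompt(accel: np.ndarray, context: dict) -> str:
    """Combine the sensor summary with contextual information for the LLM."""
    return (
        "You are an activity-recognition assistant running on a smartwatch.\n"
        f"Sensor summary (last 5 s of IMU data): {summarize_imu(accel)}.\n"
        f"Context: location type = {context['place']}, "
        f"foreground app = {context['app']}, local time = {context['time']}.\n"
        "What is the user most likely doing? Answer with one short phrase."
    )

# Illustrative data and context, not real measurements.
accel_window = np.random.randn(250, 3) * 2.0
context = {"place": "gym", "app": "fitness tracker", "time": "07:30"}

llm = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(llm(build_prompt(accel_window, context), max_new_tokens=30)[0]["generated_text"])
```

On-device deployment would typically swap the Hugging Face pipeline for a quantized local runtime, but the prompt-construction step stays the same.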
Supervisor
OUYANG, Xiaomin
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles
- Develop apps or APIs to collect and process sensor data from mobile devices.
- Implement preliminary code for machine learning model training and inference (a minimal example is sketched after this list).
- Conduct a survey of various application scenarios.
- Assist in data collection and other related experiments.
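The following PyTorch fragment sketches the kind of preliminary training and inference code referred to above, assuming IMU windows have already been collected; the synthetic data, window length, and three activity classes are placeholders.

```python
# Minimal PyTorch sketch: train and run a small classifier on 3-axis IMU windows.
# The data here is synthetic and the label set is hypothetical.
import torch
from torch import nn

class IMUClassifier(nn.Module):
    """Small 1-D CNN over a window of 3-axis accelerometer samples."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):                      # x: (batch, 3, time)
        return self.net(x)

model = IMUClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 3, 250)                    # 64 synthetic 250-sample windows
y = torch.randint(0, 3, (64,))                 # placeholder activity labels

for epoch in range(5):                         # a few steps just to show the loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():                          # inference on a new window
    prediction = model(torch.randn(1, 3, 250)).argmax(dim=1)
```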
Applicant's Learning Objectives
- Build a foundation in research methodologies, particularly machine learning for mobile and IoT systems.
- Develop proficiency with the PyTorch deep learning framework and popular transformers toolkits.
- Gain expertise in mainstream open-source LLM architectures and multimodal systems.
Complexity of the project
Moderate