Open-World Understanding Based on Large Vision-Language Models
Project Description
This project aims to develop a system that performs open-world understanding using pretrained large vision-language foundation models. Open-world understanding means semantically interpreting the real world over a very large, open set of object classes rather than a small fixed label set. We will investigate strategies for distilling open-world knowledge from existing large vision-language foundation models and use them to implement open-world semantic perception models. The system is expected to recognize more than 1,000 classes across a variety of real-world data distributions.
Supervisor
XU Dan
Quota
3
Course type
UROP1000
UROP1100
UROP2100
Applicant's Roles
Investigate the research direction of open-world understanding, and propose new ideas and frameworks for the problem based on large vision-language foundation models.
Applicant's Learning Objectives
Learn how to conduct research on an open problem and gain experience in computer vision.
Complexity of the project
Moderate