Language-Guided Dense Prediction for Scene Understanding
Project Description
In this project, we will explore how large-scale pre-trained cross-modal (vision-and-language) deep models, e.g., CLIP, can be used to generate dense pixel-wise labels for challenging scene understanding tasks such as zero-shot object detection and semantic segmentation.
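The core idea can be sketched as follows: a CLIP-style model embeds both image regions and text prompts into a shared space, so each pixel (or patch) feature can be labeled by its most similar class prompt. The snippet below is a minimal illustration of that labeling step only; the feature arrays are random stand-ins, since a real pipeline would obtain them from a pre-trained vision-language model.

```python
import numpy as np

# Hypothetical shapes purely for illustration; in practice these embeddings
# would come from a CLIP-like vision-language model.
rng = np.random.default_rng(0)
H, W, D = 4, 4, 8                    # feature-map height, width, embedding dim
pixel_feats = rng.normal(size=(H, W, D))   # dense per-pixel visual features
text_feats = rng.normal(size=(3, D))       # one embedding per class prompt

# L2-normalize so the dot product equals cosine similarity (as in CLIP).
pixel_feats /= np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
text_feats /= np.linalg.norm(text_feats, axis=-1, keepdims=True)

# Similarity of every pixel feature to every class prompt, then argmax
# over classes yields a dense pixel-wise label map.
logits = pixel_feats @ text_feats.T        # shape (H, W, num_classes)
label_map = logits.argmax(axis=-1)         # shape (H, W)
print(label_map.shape)
```

Because the class set is defined only by the text prompts, new categories can be segmented at test time without retraining, which is what makes the zero-shot setting possible.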
Supervisor
XU Dan
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP4100
Applicant's Roles
Investigate related works, implement novel ideas to improve task performance, and publish research results at international conferences
Applicant's Learning Objectives
Understand language-guided computer vision problems, and develop and implement a deep learning framework for these problems
Complexity of the project
Moderate