Exploring the long-form understanding capability of LLMs/VLMs
Project Description
While today’s large language models can breeze through paragraphs and even multi-page articles, their grasp of truly long-form text—treatises that stretch across hundreds of pages, patient records that unfold over years, or sprawling literary works threaded with recurring motifs—remains murky. This project asks: what does “long-form language understanding” really mean, and how far do current LLMs actually reach? We will probe state-of-the-art models’ working memory, global coherence, and narrative acuity by pushing them through entire novels, full-length legislative bills, and unsegmented chat logs that span months. Our aim is not to win another leaderboard but to map the conceptual terrain—spotting where today’s architectures glide, where they stumble, and which inductive biases, memory mechanisms, or prompting strategies might push them further. The investigation is exploratory and research-heavy; it calls for a fascination with textual cognition, strong engineering chops, and the patience to untangle messy, open-ended outcomes.
Supervisor
SONG Yangqiu
Quota
5
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
Work together with a PhD student on formulating tasks, designing experiments, analyzing results, and writing research papers.
Applicant's Learning Objectives
Gain hands-on experience working with LLMs/VLMs, and learn how to conduct research with them across diverse reasoning scenarios.
Complexity of the project
Challenging