Evaluating and Improving LLMs' Code Generation Capabilities
Project Description
The use of LLMs for code generation has gained significant attention in recent years, as these models have demonstrated remarkable capabilities in synthesizing code snippets from natural language descriptions. Several studies have explored the potential of LLMs in various code generation tasks, ranging from simple function generation to more complex programming challenges. Despite these advancements, code generation for specific domains poses unique challenges that are not fully addressed by general-purpose LLMs.

This project aims to benchmark and improve LLMs' performance on domain-specific tasks by leveraging advanced techniques, such as reinforcement learning and agent-based approaches, to address the complexity and specificity of syntax and semantics in these domains. The objective is to equip models not only with an understanding of programming languages but also with domain-specific knowledge. Potential research areas include competitive code generation, verification code generation, multi-language code generation, and safer code generation.
Supervisor
SHEN, Jiasi
Quota
2
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
*Please email Dr. Shen for approval before submitting the official UROP application.*

Students applying to this project should have prior experience in Machine Learning or Reinforcement Learning (e.g., through coursework or projects) and proficiency in Python programming. Completion of a Machine Learning course and experience in competitive programming are preferred. Familiarity with Python-based ML frameworks (e.g., PyTorch, TensorFlow) is a plus but not required.

Students enrolled in this project will assist in constructing benchmarking and training datasets, evaluating LLM performance on code generation tasks, and designing and implementing RL-based or agent-based algorithms.
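
To give a concrete sense of the benchmarking work, the sketch below shows one common pattern for scoring LLM-generated code: executing a model's output against hidden unit tests and recording pass/fail, in the style of pass@k benchmarks such as HumanEval. This is an illustrative assumption, not the project's actual harness; the problem format and helper names are hypothetical.

```python
# A minimal sketch (not this project's actual harness) of scoring LLM-generated
# code against unit tests. Problem format and helper names are illustrative.
import multiprocessing

def _run_candidate(candidate_src: str, test_src: str, result):
    """Execute a generated solution together with its unit tests in a child process."""
    try:
        namespace = {}
        exec(candidate_src, namespace)   # define the generated function(s)
        exec(test_src, namespace)        # run assert-based tests against them
        result.value = 1                 # all assertions passed
    except Exception:
        result.value = 0                 # any error or failed assertion counts as incorrect

def passes_tests(candidate_src: str, test_src: str, timeout: float = 5.0) -> bool:
    """Return True if the candidate passes its tests within the time limit.
    A separate process gives crude isolation and timeout handling; a real
    harness would sandbox untrusted generated code far more carefully."""
    result = multiprocessing.Value("i", 0)
    proc = multiprocessing.Process(target=_run_candidate,
                                   args=(candidate_src, test_src, result))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
    return bool(result.value)

if __name__ == "__main__":
    # Hypothetical benchmark item: we score only the model's output, not the prompt.
    generated = "def add(a, b):\n    return a + b\n"
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    print("pass@1:", passes_tests(generated, tests))
```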
Applicant's Learning Objectives
Applicants will acquire practical experience in both machine learning and domain-specific techniques, developing proficiency in constructing end-to-end ML pipelines for such challenges. They will also gain a deeper understanding of LLMs for code generation and learn methodologies for conducting experimental systems research. This experience is particularly valuable for students pursuing graduate studies or careers in AI, programming languages (PL), or AI for PL.
Complexity of the project
Moderate