Compiler Optimization Guided by Machine Learning | Undergraduate Research Opportunities Program

Project Description

Compilers are ubiquitous infrastructure and have played a central role in software development since the early days of computer science.

This project explores the application of machine learning (ML) and reinforcement learning (RL) techniques to modern compiler tasks such as code optimization, performance prediction, and automatic tuning. The goal is to build compilation pipelines that adapt and learn from data, improving traditional compiler heuristics with data-driven models. Potential topics include learning-based inlining, register allocation, and loop unrolling, as well as using graph neural networks (GNNs) on control-flow and data-flow graphs to guide optimizations.

We aim to integrate ML models with compiler infrastructures like LLVM, enabling research into how ML can make compilers smarter, faster, and more portable across hardware platforms.

Supervisor

SHEN, Jiasi

Quota

3

Course type

UROP1000

UROP1100

UROP2100

UROP3100

UROP3200

UROP4100

Applicant's Roles

*Please email Dr. Shen for approval before submitting the official UROP application.*

Students applying to this project should have experience in Machine Learning/Reinforcement Learning (e.g., through coursework or projects) and C/C++ programming. Having taken a course in Machine Learning or Compiler Design (such as Modern Compiler Design or equivalent) is preferred. Familiarity with Python-based ML frameworks (e.g., PyTorch, TensorFlow) or LLVM is a plus but not required.

Students enrolled in this project will assist in designing and implementing ML/RL models tailored to compiler optimization problems. Tasks may include extracting features from the compiler's intermediate representation (IR) and control-flow graphs (CFGs), training and evaluating models, integrating models with compiler passes, and benchmarking the results.
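As a rough illustration of what feature extraction from a CFG can look like, the sketch below computes a few simple counts from a toy graph. Everything in it is an assumption made for illustration: the `TOY_CFG` dictionary, the opcode names, and the helper functions `find_back_edges` and `extract_features` are hypothetical, and the sketch does not use LLVM's API or reflect the project's actual tooling.

```python
from collections import Counter

# Hypothetical toy CFG (not produced by LLVM):
# block name -> (list of opcodes in the block, list of successor blocks).
TOY_CFG = {
    "entry": (["alloca", "store", "br"], ["loop"]),
    "loop": (["load", "icmp", "br"], ["body", "exit"]),
    "body": (["load", "add", "store", "br"], ["loop"]),
    "exit": (["load", "ret"], []),
}


def find_back_edges(cfg, entry):
    """Return edges (u, v) where v is still on the current DFS path.

    In a depth-first search from the entry block, such edges are back
    edges, which serve here as a rough proxy for the number of loops.
    """
    back_edges = []
    state = {}  # block -> "active" (on the DFS path) or "done"

    def visit(block):
        state[block] = "active"
        for succ in cfg[block][1]:
            if state.get(succ) == "active":
                back_edges.append((block, succ))
            elif succ not in state:
                visit(succ)
        state[block] = "done"

    visit(entry)
    return back_edges


def extract_features(cfg, entry="entry"):
    """Summarize a CFG as a small dictionary of numeric features."""
    opcodes = Counter(op for ops, _ in cfg.values() for op in ops)
    return {
        "num_blocks": len(cfg),
        "num_edges": sum(len(succs) for _, succs in cfg.values()),
        "num_instructions": sum(opcodes.values()),
        "num_memory_ops": opcodes["load"] + opcodes["store"],
        "num_branches": opcodes["br"],
        "num_back_edges": len(find_back_edges(cfg, entry)),
    }


features = extract_features(TOY_CFG)
print(features)
```

A fixed-length feature dictionary like this could be fed to a conventional model, whereas a GNN-based approach would consume the graph structure directly instead of summary counts.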

Applicant's Learning Objectives

Applicants will gain hands-on experience with both compiler infrastructures and machine learning techniques. They will develop skills in building end-to-end ML pipelines for systems problems, gain insight into compiler internals (especially LLVM), and learn how to conduct experimental systems research. This experience is valuable for students considering graduate studies or careers in systems, programming languages (PL), or AI for systems.

Complexity of the project

Moderate