Mechanistic Interpretability of Machine Learning models in the physical sciences
Project Description
Neural networks and other data-driven methods have made substantial progress in modelling complex physical systems. However, these models are often “black boxes”, and this opacity limits both scientific insight and trust in AI-assisted discovery, particularly in fields where understanding the underlying principles is important.
Mechanistic interpretability (MI), developed largely through recent work on large language models, focuses on understanding the internal workings of neural networks. By analysing learned representations and “circuits” (Olah et al. 2022), we can potentially peek into how physical information is encoded and processed in a model's latent space.
This project aims to be a first exposure to MI and how it might apply to deep learning in physical science problems. The work has flexibility to expand into more complex physical systems depending on student interest and progress.
Supervisor
MAK Julian
Quota
1
Course type
UROP1000
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
* Experiment with neural networks and toy models in physics problems using Python
* Implement and apply MI techniques (e.g. sparse autoencoders, activation patching, feature visualisation, probing)
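As a flavour of the kind of work involved, the sketch below illustrates one of the listed techniques, linear probing, on a hypothetical toy setup (the random-weight "network", the synthetic data, and the probed quantity are all illustrative assumptions, not part of the project specification): it asks whether a simple quantity is linearly decodable from a model's hidden activations.

```python
import numpy as np

# Hypothetical toy setup for a linear probe: a fixed random "network"
# maps 2-D inputs to 16-D hidden activations, and we test whether the
# sign of the first input coordinate is linearly decodable from them.
rng = np.random.default_rng(0)

W = rng.normal(size=(16, 2))  # fixed random weights of the toy network

def hidden(x):
    """Hidden-layer activations of the toy network, shape (n, 16)."""
    return np.tanh(x @ W.T)

# Synthetic data: the "physical" label is the sign of the first input.
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(float)
H = hidden(X)

# Linear probe: logistic regression on the activations, trained by
# plain gradient descent on the cross-entropy loss.
w, b = np.zeros(H.shape[1]), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted probabilities
    w -= 0.5 * (H.T @ (p - y)) / len(y)     # gradient step on weights
    b -= 0.5 * (p - y).mean()               # gradient step on bias

# High accuracy suggests the quantity is represented (near-)linearly
# in the hidden layer; a deeper analysis would compare against probes
# on other layers and on shuffled labels.
acc = (((1.0 / (1.0 + np.exp(-(H @ w + b)))) > 0.5) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

In practice the same idea is applied to activations extracted from a trained physics emulator rather than random weights, with control probes (e.g. shuffled labels) to guard against the probe itself doing the work.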
Applicant's Learning Objectives
* Understand the latest MI research literature and methods
* Gain practical experience with modern deep learning techniques applied to physical science problems
* Develop intuition for how neural networks process and represent scientific information
* Understand the challenges and opportunities in explainable AI for science
* Experience interdisciplinary research combining AI/ML and physical sciences
Complexity of the project
Challenging