Mechanistic Interpretability of Machine Learning models in the physical sciences
      
  Project Description
              Various neural networks and data-driven methods have demonstrated progress in modelling complex physical systems. However, these models are often “black boxes”, and this opacity limits both scientific insight and trust in AI-assisted discovery, particularly in fields where understanding the underlying principles is important.
Mechanistic interpretability (MI), which has recently gained prominence through work on large language models, focuses on understanding the internal workings of neural networks. By analysing the learned representations and “circuits” (Olah et al 2022), we can potentially see how physical information is learned and processed in the latent space.
This project offers a first exposure to MI and how it might apply to deep learning for physical science problems. The work can expand to more complex physical systems, depending on student interest and progress.
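As a rough illustration of what analysing a latent space can look like, the sketch below trains a tiny network on a toy harmonic oscillator and then fits a linear probe to its hidden activations. It is a minimal sketch under stated assumptions, not the project's method: the toy problem (unit mass and frequency), the network size, the training settings, and names such as `linear_probe_r2` are all made up for illustration. The network is trained to predict the oscillation amplitude from position and velocity; the probe then asks whether the energy, which the network never sees as a target, is linearly decodable from the hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed setup): harmonic oscillator states (x, v), unit mass and frequency.
X = rng.uniform(-1.0, 1.0, size=(1200, 2))
amplitude = np.sqrt((X ** 2).sum(axis=1, keepdims=True))  # training target
energy = 0.5 * (X ** 2).sum(axis=1)                       # never used for training

X_tr, X_te = X[:1000], X[1000:]
y_tr = amplitude[:1000]

# One-hidden-layer tanh network, trained by full-batch gradient descent.
n_hidden = 16
W1 = rng.normal(0.0, 1.0, size=(2, n_hidden))
b1 = rng.normal(0.0, 0.5, size=n_hidden)
W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1)) / np.sqrt(n_hidden)
b2 = np.zeros(1)


def hidden(x):
    """Hidden-layer activations: the 'latent space' being probed."""
    return np.tanh(x @ W1 + b1)


lr = 0.05
for _ in range(2000):
    h = hidden(X_tr)
    err = h @ W2 + b2 - y_tr                # prediction error, shape (N, 1)
    grad_h = (err @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
    W2 -= lr * h.T @ err / len(X_tr)
    b2 -= lr * err.mean(axis=0)
    W1 -= lr * X_tr.T @ grad_h / len(X_tr)
    b1 -= lr * grad_h.mean(axis=0)


def linear_probe_r2(feats_tr, t_tr, feats_te, t_te):
    """Fit a least-squares linear readout on train features; return R^2 on test."""
    A_tr = np.column_stack([feats_tr, np.ones(len(feats_tr))])
    A_te = np.column_stack([feats_te, np.ones(len(feats_te))])
    coef, *_ = np.linalg.lstsq(A_tr, t_tr, rcond=None)
    resid = t_te - A_te @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((t_te - t_te.mean()) ** 2)


# Baseline: probe the raw inputs. Probe of interest: the hidden activations.
r2_raw = linear_probe_r2(X_tr, energy[:1000], X_te, energy[1000:])
r2_hidden = linear_probe_r2(hidden(X_tr), energy[:1000], hidden(X_te), energy[1000:])
print(f"R^2 of energy probe on raw inputs:   {r2_raw:.3f}")
print(f"R^2 of energy probe on hidden layer: {r2_hidden:.3f}")
```

Because the energy is quadratic in position and velocity, a linear readout of the raw inputs should explain very little of it, so a noticeably higher score on the hidden layer would suggest the network has built an energy-like feature. A probe only shows that the information is linearly accessible, not that the network actually uses it; techniques such as activation patching are aimed at that second question.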
Supervisor
              MAK Julian
          Quota
              1
          Course type
          UROP1000
          UROP1100
          UROP2100
          UROP3100
          UROP3200
          UROP4100
              Applicant's Roles
              * Experiment with neural networks and toy models in physics problems using Python
* Implement and apply MI techniques (sparse autoencoders, activation patching, feature visualisation, probing, etc.)
Applicant's Learning Objectives
              * Understand the latest MI research literature and methods
* Gain practical experience with modern deep learning techniques applied to physical science problems
* Develop intuition for how neural networks process and represent scientific information
* Understand the challenges and opportunities in explainable AI for science
* Experience interdisciplinary research combining AI/ML and physical sciences
Complexity of the project
              Challenging