Incorporating process chemistry considerations into synthesis planning
Project Description
Recent years have seen tremendous progress in computer-aided synthesis planning (CASP). Computational tools, both rule-based and machine-learning-based, have been developed to search for viable synthetic pathways to molecular targets. For molecules of low to moderate complexity, thousands of potential reaction pathways can be generated in a short amount of time. These tools are viewed as a promising way to accelerate chemical synthesis design, e.g., in the pharmaceutical industry. However, most existing efforts have focused on improving the success rate of finding possible pathways; how to rank and prioritize those pathways for practical development remains a challenging issue.

This project will focus on incorporating process chemistry considerations into synthesis planning. Process chemistry is a broad topic covering many practical but nontrivial aspects, including productivity, selectivity, safety, operability, and environmental impact. Adding these considerations can reduce the effort needed to manually examine computer-planned synthetic pathways by prioritizing the better candidates. The aim is to develop a quantitative metric for each individual aspect and ultimately combine them into a comprehensive metric.

As an initial focus, we will develop models for yield estimation. Yield is key to answering many process-related questions, e.g., mass and energy balances and waste production. Quantitative yield prediction for arbitrary chemical reactions has proved challenging, so we will explore estimating yield (with uncertainty) for individual types of chemical reactions. These models will be used in an optimization framework to prioritize, among the pathways found by CASP software, those that are most favorable in practice.
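
To illustrate how per-step yield estimates could feed into pathway prioritization, the following is a minimal sketch and not the project's actual optimization framework. It assumes a linear pathway whose overall yield is the product of its step yields, that step yields are independent, and that uncertainty can be propagated to first order by summing relative variances. The function names, route names, and numbers are all hypothetical.

```python
import math

def pathway_yield(steps):
    """Estimate overall yield and its uncertainty for a linear pathway.

    steps: list of (mean_yield, std_yield) tuples, yields as fractions in (0, 1].
    Assumption: step yields are independent, so relative variances add
    (first-order error propagation for a product).
    """
    overall = math.prod(m for m, _ in steps)
    rel_var = sum((s / m) ** 2 for m, s in steps)
    return overall, overall * math.sqrt(rel_var)

def rank_pathways(pathways):
    """Return (name, overall_yield, overall_std) tuples, best yield first."""
    scored = [(name, *pathway_yield(steps)) for name, steps in pathways.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical candidate routes with made-up per-step estimates.
pathways = {
    "route_A": [(0.90, 0.05), (0.80, 0.10)],
    "route_B": [(0.95, 0.03), (0.60, 0.15), (0.90, 0.05)],
}
ranking = rank_pathways(pathways)
for name, overall, std in ranking:
    print(f"{name}: {overall:.3f} +/- {std:.3f}")
```

Ranking on expected overall yield alone is the simplest possible criterion; a fuller metric would also weigh the other process aspects mentioned above, such as safety and environmental impact.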

GAO Hanyu
Course type
Applicant's Roles
Your role in the project will be to develop different strategies for reaction-type-specific yield estimation, ranging from descriptive statistics to supervised learning and transfer learning methods. You will need to process and analyze data from a large chemical reaction database to develop these models. You will also have a chance to become familiar with the CASP tool and the optimization tool used for synthetic pathway generation and prioritization.
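
As a rough idea of what the simplest strategy, descriptive statistics, might look like, here is a minimal sketch. It assumes reaction records have already been reduced to (reaction type, reported yield in percent) pairs; the record layout, reaction type labels, and values are hypothetical and do not come from any particular database.

```python
from collections import defaultdict
from statistics import mean, stdev

def yield_stats_by_type(records):
    """Group reported yields by reaction type and summarize each group.

    records: iterable of (reaction_type, yield_percent) pairs.
    Returns a dict mapping reaction type to count, mean, and sample
    standard deviation (None when only one record is available).
    """
    grouped = defaultdict(list)
    for rxn_type, yield_percent in records:
        grouped[rxn_type].append(yield_percent)

    stats = {}
    for rxn_type, yields in grouped.items():
        spread = stdev(yields) if len(yields) > 1 else None
        stats[rxn_type] = {"n": len(yields), "mean": mean(yields), "std": spread}
    return stats

# Hypothetical records; a real data set would be far larger.
records = [
    ("amide_coupling", 85.0),
    ("amide_coupling", 75.0),
    ("suzuki_coupling", 60.0),
]
stats = yield_stats_by_type(records)
for rxn_type, summary in stats.items():
    print(rxn_type, summary)
```

A per-type mean and spread of this kind can serve as a baseline against which supervised and transfer learning models are compared.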
Applicant's Learning Objectives
By working on the project, you are expected to gain the following skills/experience:
1. Python programming for cheminformatics applications;
2. big data analysis and visualization;
3. developing and using machine learning models.
Complexity of the project