Building AI models that understand chemical principles

Among all of the possible chemical compounds, it’s estimated that between 10^20 and 10^60 may hold potential as small-molecule drugs.

Evaluating each of those compounds experimentally would be far too time-consuming for chemists. So, in recent years, researchers have begun using artificial intelligence to help identify compounds that could make good drug candidates. 

One of those researchers is MIT Associate Professor Connor Coley PhD ’19, the Class of 1957 Career Development Associate Professor with shared appointments in the departments of Chemical Engineering and Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing. His research straddles the line between chemical engineering and computer science, as he develops and deploys computational models to analyze vast numbers of possible chemical compounds, design new compounds, and predict reaction pathways that could generate those compounds. 

“It’s a very general approach that could be applied to any application of organic molecules, but the primary application that we think about is small-molecule drug discovery,” he says.

The intersection of AI and science

Coley’s interest in science runs in the family. In fact, he says, his family includes more scientists than non-scientists, including his father, a radiologist; his mother, who earned a degree in molecular biophysics and biochemistry before going to the MIT Sloan School of Management; and his grandmother, a math professor.

As a high school student in Dublin, Ohio, Coley participated in Science Olympiad competitions and graduated from high school at the age of 16. He then headed to Caltech, where he chose chemical engineering as a major because it offered a way to combine his interests in science and math.

During his undergraduate years, he also pursued an interest in computer science, working in a structural biology lab where he used the Fortran programming language to help solve the crystal structures of proteins. After graduating from Caltech, he decided to keep going in chemical engineering and came to MIT in 2014 to start a PhD.

Advised by professors Klavs Jensen and William Green, Coley worked on ways to optimize automated chemical reactions. His work focused on combining machine learning and cheminformatics (the application of computational methods to analyze chemical data) to plan reaction pathways that could make new drug molecules. He also worked on designing hardware that could be used to perform those reactions automatically.

Part of that work was done through a DARPA-funded program called Make-It, which was focused on using machine learning and data science to improve the synthesis of medicines and other useful compounds from simple building blocks.

“That was my real entry point into thinking about cheminformatics, thinking about machine learning, and thinking about how we can use models to understand how different chemicals can be made and what reactions are possible,” Coley says.

Coley began applying for faculty jobs while still a graduate student, and accepted an offer from MIT at age 25. He received a mix of advice for and against taking a job at the same school where he went to graduate school, and eventually decided that a position at MIT was too enticing to turn down.

“MIT is a very special place in terms of the resources and the fluidity across departments. MIT seemed to be doing a really good job supporting the intersection of AI and science, and it was a vibrant ecosystem to stay in,” he says. “The caliber of students, the enthusiasm of the students, and just the incredible strength of collaborations definitely outweighed any potential concerns of staying in the same place.”

Chemistry intuition

Coley deferred the faculty position for one year to do a postdoc at the Broad Institute, where he sought more experience in chemical biology and drug discovery. There, he worked on ways to identify small molecules, from billions of candidates in DNA-encoded libraries, that might have binding interactions with mutated proteins associated with diseases.

After returning to MIT in 2020, he built his lab group with the mission of deploying AI not only to synthesize existing compounds with therapeutic potential, but also to design new molecules with desirable properties and new ways to make them. Over the past few years, his lab has developed a variety of computational approaches to tackle those goals. 

“We try to think about how to best pair a challenge in chemistry with a potential computational solution. And often that pairing motivates the development of new methods,” Coley says. One model his lab has developed, known as ShEPhERD, was trained to evaluate potential new drug molecules according to how they will interact with target proteins, using the molecules’ three-dimensional shapes. This model is now being used by pharmaceutical companies to help them discover new drugs.

“We’re trying to give more of a medicinal chemistry intuition to the generative model, so the model is aware of the right criteria and considerations,” Coley says.

In another project, Coley’s lab developed a generative AI model called FlowER, which can be used to predict the reaction products that will result from combining different chemical inputs. 

In designing that model, the researchers built in an understanding of fundamental physical principles, such as the law of conservation of mass. They also compelled the model to consider the feasibility of the intermediate steps that need to take place on the pathway from reactants to products. These constraints, the researchers found, improved the accuracy of the model’s predictions.

“Thinking about those intermediate steps, the mechanisms involved, and how the reaction evolves is something that chemists do very naturally. It’s how chemistry is taught, but it’s not something that models inherently think about,” Coley says. “We’ve spent a lot of time thinking about how to make sure that our machine-learning models are grounded in an understanding of reaction mechanisms, in the same way an expert chemist would be.”
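To make the conservation constraint concrete, here is a minimal sketch of an atom-balance check on a proposed reaction. It is an illustration only, not FlowER's actual method. The function names (`atom_counts`, `total_atoms`, `conserves_atoms`) are hypothetical, and the parser assumes simple molecular formulas with no parentheses or charges.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O'.

    Assumption: no parentheses, charges, or isotopes in the formula.
    """
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def total_atoms(formulas) -> Counter:
    """Sum the atom counts over a list of molecular formulas."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return total


def conserves_atoms(reactants, products) -> bool:
    """Return True if both sides of the reaction contain the same atoms."""
    return total_atoms(reactants) == total_atoms(products)


# Esterification: acetic acid + ethanol -> ethyl acetate + water
balanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"])

# The same reaction with the water by-product omitted loses atoms.
unbalanced = conserves_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])
```

In this sketch, `balanced` is `True` and `unbalanced` is `False`. A predicted product set that fails a check like this can be ruled out without running an experiment, because atoms cannot appear or disappear during a reaction.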

Students in his lab also work on many different areas related to the optimization of chemical reactions, including computer-aided structure elucidation, laboratory automation, and optimal experimental design.

“Through these many different research threads, we hope to advance the frontier of AI in chemistry,” Coley says.
