Research
Designing protein function without massive screening.
Proteins are the molecular machines behind almost every process in biology. But unlike a car engine, which can be designed precisely, a new protein today is still found by screening thousands of candidates for a few hits. Biophysical models explain how proteins work, but struggle to create new function. Large machine-learning models can now generate functional proteins and guide engineering, but they lack physical grounding and their predictions often fail when tested experimentally. We want to break the barrier between these two fields, so protein function can be programmed with minimal trial and error.
-
Learning the physics behind ML-based protein designs
Machine-learning models are trained on data (protein sequences, structures, functional assays), not on biophysics. They can generate functional proteins and identify beneficial mutations, but their predictions are noisy, and often only a small fraction of them succeed. We aim to pin down the physical basis that determines how active a designed protein is. This would enable us to design new function without leaning on black-box models or endless screening.
-
Pushing protein ML from structure to function
Protein sequence and structure data are abundant, and large foundation models perform well at converting between them: predicting structure from sequence, or designing a sequence for a particular fold, with no physical modeling required. In contrast, function datasets are scarcer and more heterogeneous, so models trained on them tend to overfit and generalize poorly beyond the training set. We aim to develop new ML models grounded in molecular interactions, energetics, and protein dynamics, so that physical and data-driven approaches reinforce each other and enable protein function design beyond sequence-structure mapping.
-
Designing enzymes for sustainable chemistry
Enzymes can accelerate reactions by many orders of magnitude. But designing an artificial enzyme, or re-engineering one for new chemistry, remains extremely difficult. At the heart of the problem is the design of a pre-organized enzyme active site. This active site interacts with multiple reaction states to lower the activation free energy while keeping the reaction turning over. We aim to design and engineer functional enzymes to address pressing societal challenges, such as CO₂ capture and reduction, plastic degradation, and pharmaceutical synthesis.
-
Antibody design and engineering
Antibodies are the dominant class of protein therapeutics. By varying just six short complementarity-determining regions (CDRs), they can achieve binding specificity for almost any molecular target. But these highly flexible CDR loops are much more challenging to design than rigid protein binders with secondary-structure-based binding interfaces. We aim to develop computational strategies that model more accurately how the CDR loops engage the antigen, and how distant framework mutations modulate antibody function. We will then use these models to design broadly protective antibodies against emerging viral and bacterial pathogens.
-
High-throughput measurements of protein interactions
An intrinsic limitation of current supervised protein ML models is that they are trained directly on function data that are assay-dependent and protein-specific. They often need large datasets just to make predictions for a single protein that has already been heavily measured. Protein function is governed by molecular interactions, so learning these interactions directly would help a model capture physical constraints and generalize better, even with limited training data. But direct data on molecular interactions are extremely rare, and current measurement methods are slow and low-throughput. We aim to develop high-throughput platforms that characterize functionally relevant molecular interactions, providing a new data modality for physics-aware ML models to predict and design protein function.