De Novo Enzyme Design with Diffusion Models: A Practical Guide

For three decades, de novo enzyme design has been one of the harder problems in computational biology. Designing a protein that binds a small molecule is difficult. Designing one that binds and then chemically transforms that molecule, with the right geometry and electronics at the active site, has been the kind of project that takes a full lab years of iterative design and directed evolution, and often ends without a functional enzyme. That balance has started to shift. A new class of diffusion-based models, trained on large structural datasets and conditioned on ligand chemistry, is beginning to deliver designed enzymes that work on the first pass, or close to it.

This post is a practical guide to what de novo enzyme design looks like in 2026. It covers why the problem has been hard, how diffusion models like DISCO approach it differently, what the published wet-lab evidence actually shows, and how to move from a substrate SMILES to a testable candidate without stitching together five disconnected tools.

What de novo enzyme design means, and why it has been hard

De novo enzyme design means building a protein from scratch for a chosen chemical reaction, without starting from a natural enzyme template. The designer has a substrate, sometimes a proposed transition state, and a target reaction. The goal is a protein sequence that folds into a stable structure, binds the substrate in a productive orientation, and stabilises the transition state enough to accelerate the reaction.

Every step in that sequence is hard for a different reason.

Folding is hard because there is no direct way to tell from a sequence whether it will collapse to a single stable structure or aggregate. Binding is hard because it depends on exact side-chain positioning in a pocket that itself depends on the fold. Catalysis is hard because it requires positioning catalytic residues with angstrom-scale precision around a transition state that is usually not even an experimentally observed species.

The classical workflow, developed in the Baker lab and others starting in the mid-2000s, took this problem head-on. You would start from a quantum-chemistry calculation of the transition state, then hand-build a “theozyme” of idealised amino acid side chains arranged to stabilise it. You would scaffold that theozyme into a Rosetta-designed protein using RosettaMatch and RosettaDesign. You would express, purify, and assay the result. Most designs would be inactive, but a few would show measurable activity. Then you would run directed evolution for tens of rounds to improve the catalysis by orders of magnitude.

The Baker group published designed Kemp eliminases in Nature in 2008, retro-aldolases in Science in 2008, and Diels-Alderases in Science in 2010 using this approach. These were landmark papers. They also showed the problem. Designed enzymes started with turnover numbers orders of magnitude below natural enzymes. Most of the useful activity came from directed evolution after the initial design, not from the design itself.

Subsequent generations of tools improved on specific steps. RFdiffusion, published in Nature in 2023, brought diffusion models to protein backbone generation. ProteinMPNN, from the same group, improved sequence design for a given backbone. LigandMPNN extended MPNN to handle ligand-binding sites. These tools pushed success rates higher, but the core architecture was still modular. Backbone first, then sequence, then pocket, then hope.

How diffusion models change the picture

Diffusion models for protein design started with backbone generation and have steadily expanded in scope. The key architectural shift for enzyme design is joint generation of sequence and structure, conditioned on the ligand from the start.

DISCO, introduced by the Arnold lab and collaborators, is an example of this generation of models. It treats sequence and structure as a single joint distribution. The ligand is not added to a pre-designed pocket. It is present throughout the diffusion process, co-folded with the protein, so the model learns to generate sequences whose folded structures accommodate the ligand in a catalytically productive geometry. Catalytic residues are not pre-specified. They emerge from the co-design process, discovered by the model rather than placed by the user.

This has practical consequences that are worth being concrete about.

First, there is no theozyme step. Users do not need to compute a transition state geometry in advance. The model conditions directly on the substrate SMILES. For well-represented chemistries in the training data, the model has learned what productive geometries look like.

Second, the model produces sequence-structure pairs rather than just sequences. Every generated design comes with its predicted 3D structure, which means no post-hoc structure prediction step is required to inspect the candidate.

Third, because the model is generative, it samples. Different random seeds produce different candidate designs, not all of which will survive validation. This makes it natural to generate a small batch of candidates and select from them.

What the wet-lab evidence actually shows

Published results from diffusion-based enzyme design are still limited but genuinely informative. The DISCO paper reports wet-lab validation on four carbene-transfer reactions, including cyclopropanation of styrene with ethyl diazoacetate and B-H insertion into boranes. From roughly 20,000 generated sequence-structure pairs, the authors computationally filtered down to around 90 candidates for experimental testing.

The top designed cyclopropanation enzyme reached 4,050 total turnovers, which exceeds the turnover numbers previously reported for enzyme-catalysed cyclopropanation by natural or classically engineered catalysts for the same reaction class. The top B-H insertion enzyme reached 5,170 TTN. A third reaction matched activity levels that had previously required 14 rounds of directed evolution on a natural starting enzyme. A fourth reaction produced enzymes with modest initial activity that responded well to directed evolution, gaining roughly four-fold activity improvements in a single round.

These are results worth taking seriously, and also worth keeping in perspective. Four reactions is a narrow validation set. The strongest evidence is for carbene-transfer chemistry, where the training data presumably contains useful geometric priors. Generalisation to arbitrary reaction classes has not been demonstrated with the same rigour. Published designed enzymes for hydrolysis, oxidation, or reduction chemistries from the same generation of tools remain sparse.

The reasonable summary: for reaction classes close to the training distribution, diffusion-based de novo enzyme design appears to deliver functional enzymes on a sharply lower candidate budget than classical approaches required. For novel chemistries far from the training distribution, results are still uncertain, and the design workflow should be treated as a hypothesis generator rather than a delivery pipeline.

Validating a designed enzyme before the wet lab

One of the practical weaknesses of earlier de novo design workflows was a long gap between a computational design and any independent check on whether that design was actually plausible. A designed sequence could look reasonable in isolation but fold to something different, or fold correctly but orient the ligand in a non-productive geometry.

Two checks can be done in silico before ordering a gene.

Co-designability

A designed sequence is co-designable if, when you refold it with an independent structure prediction model, the refolded structure matches the design. The specific test is backbone RMSD between the design and the refold, after Kabsch alignment. Below roughly 2 angstroms, the refold agrees with the design, which is good evidence that the sequence encodes the intended fold. Between 2 and 3 angstroms is a grey zone that deserves a closer look. Above 3 angstroms, the sequence refolds to something else and the design should be treated with scepticism.
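The RMSD test itself is short enough to sketch. The version below is a minimal illustration, assuming NumPy is available and that matched backbone coordinates (for example one C-alpha atom per residue, in the same residue order) have already been extracted from the design and the refold. The function name kabsch_rmsd is ours and is not part of any published tool.

```python
import numpy as np


def kabsch_rmsd(design_xyz, refold_xyz):
    """RMSD between two matched coordinate sets after optimal superposition.

    Both inputs are (N, 3) arrays of the same atoms in the same order.
    Assumption: the caller has already paired the atoms.
    """
    P = np.asarray(design_xyz, dtype=float)
    Q = np.asarray(refold_xyz, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")

    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the design onto the refold and measure the residual.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

A refold that differs from the design only by a rigid rotation and translation gives an RMSD of effectively zero, while a mirror-image fold does not, because the determinant check rules out reflections.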

The obvious tool for the refolding step is a structure prediction model that was not involved in the original design. Boltz-2, released by Boltz in 2025, is a natural choice because it accepts both protein and small-molecule inputs and predicts the complex jointly. See our comparison of AlphaFold, Boltz, and ESMFold for how Boltz-2 compares with other current structure predictors.

Running every design through an independent refold is cheap compared to a wet-lab expression campaign, and it rules out a clear failure mode.

Pocket geometry inspection

Even for co-designable designs, the geometry of the ligand relative to the active site matters. A human inspecting the 3D structure can often spot obvious problems: the ligand placed far from any polar residue that could stabilise a transition state charge, or buried with no solvent access, or positioned so that the reaction trajectory is blocked. This inspection step is impossible to fully automate, but it is fast with a decent 3D viewer.

For a more quantitative check, comparing a designed enzyme against known natural enzymes that catalyse the same reaction class can highlight whether the active site looks plausible. Sequence and structural comparisons have long been part of enzyme engineering, and the same comparisons apply to de novo designs.
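One of the problems listed above, a ligand sitting far from any polar residue, can be pre-screened with a few lines of code before opening the viewer. The sketch below is a coarse filter under stated assumptions: the residue classification, the 4 angstrom cutoff, and the input layout (plain coordinate tuples already parsed from the structure file) are illustrative choices of ours, not values from the DISCO paper. It does not replace visual inspection.

```python
import math

# Rough classification of residues with polar or charged side chains.
# Assumption: a deliberately simple list for illustration.
POLAR_RESIDUES = {
    "SER", "THR", "ASN", "GLN", "TYR", "HIS",
    "ASP", "GLU", "LYS", "ARG", "CYS",
}


def min_polar_distance(ligand_atoms, protein_atoms):
    """Smallest ligand-to-polar-residue atom distance, in angstroms.

    ligand_atoms:  list of (x, y, z) tuples.
    protein_atoms: list of (residue_name, (x, y, z)) tuples.
    Returns None if there is nothing to compare.
    """
    polar = [xyz for resname, xyz in protein_atoms if resname in POLAR_RESIDUES]
    if not polar or not ligand_atoms:
        return None
    return min(math.dist(lig, prot) for lig in ligand_atoms for prot in polar)


def ligand_looks_remote(ligand_atoms, protein_atoms, cutoff=4.0):
    """True if no polar residue atom lies within `cutoff` of the ligand."""
    distance = min_polar_distance(ligand_atoms, protein_atoms)
    return distance is None or distance > cutoff
```

Designs flagged by ligand_looks_remote are worth opening first in the viewer, since the flag says nothing about whether the nearby residues are oriented usefully.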

A practical workflow, step by step

Putting the pieces together, a reasonable workflow for moving from a reaction of interest to a testable enzyme looks like this.

Step 1: Define the substrate

Get the SMILES string for the substrate or reaction intermediate. For enzyme design, the relevant molecule is usually the substrate bound in the active site, or if available, a transition-state analogue. For carbene-transfer chemistry the relevant species is the reactive intermediate, not the diazo precursor.

Step 2: Decide on scaffolding

If known catalytic residues should be preserved, supply them as a partial sequence with a “-” character at each position the model should design. Diffusion-based models generally accept partial sequence constraints as motif scaffolding. If starting from scratch, a 150 to 250 residue fully masked protein is a reasonable default for small-molecule binding.
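As an illustration of the partial-sequence input, the helper below builds a masked sequence from a length and a handful of fixed residues. It is a sketch only: the function name, the 1-based position convention, and the exact input format a given model expects are assumptions to check against that model's documentation.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_masked_sequence(length, fixed=None, mask_char="-"):
    """Return a sequence of `length` with `mask_char` at designable positions.

    fixed: optional dict mapping 1-based position -> one-letter residue
           that the model should keep (hypothetical convention).
    """
    fixed = fixed or {}
    seq = [mask_char] * length
    for position, residue in fixed.items():
        if not 1 <= position <= length:
            raise ValueError(f"position {position} is outside 1..{length}")
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {residue!r}")
        seq[position - 1] = residue
    return "".join(seq)


# A fully masked 200-residue protein, within the 150 to 250 default range.
from_scratch = build_masked_sequence(200)

# A short toy example keeping a histidine at position 3 and a cysteine at 7.
scaffolded = build_masked_sequence(10, {3: "H", 7: "C"})
print(scaffolded)  # --H---C---
```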

Step 3: Generate a small batch of candidates

Generate three to five independent designs per submission. Each seed produces a different candidate. A single design is rarely useful because there is no way to compare it against alternatives. Even a batch of three gives you room to pick the best by downstream metrics.

Step 4: Refold every design

Run each design through an independent structure predictor. Keep designs with backbone RMSD below roughly 2 angstroms. Flag designs in the 2 to 3 angstrom range for visual inspection before proceeding, and reject designs above 3 angstroms.
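The triage can be written down in a few lines. This sketch uses the thresholds from the co-designability section (below roughly 2 angstroms passes, above 3 angstroms fails, and the range in between goes to visual inspection). The function names and the dictionary input are illustrative assumptions.

```python
def triage(rmsd, accept_below=2.0, reject_above=3.0):
    """Map a design-versus-refold backbone RMSD (angstroms) to a verdict."""
    if rmsd < accept_below:
        return "accept"
    if rmsd > reject_above:
        return "reject"
    return "inspect"


def rank_survivors(rmsd_by_design):
    """Drop rejected designs and sort the rest by RMSD, best first.

    rmsd_by_design: dict mapping a design name to its refold RMSD.
    Returns a list of (name, rmsd, verdict) tuples.
    """
    survivors = [
        (name, rmsd, triage(rmsd))
        for name, rmsd in rmsd_by_design.items()
        if triage(rmsd) != "reject"
    ]
    return sorted(survivors, key=lambda item: item[1])


batch = {"design_a": 2.5, "design_b": 1.2, "design_c": 3.4}
for name, rmsd, verdict in rank_survivors(batch):
    print(f"{name}: {rmsd:.1f} A -> {verdict}")
```

Sorting by RMSD alone is a simplification. In practice you would weigh it alongside pocket geometry and any confidence scores the structure predictor reports.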

Step 5: Inspect active sites

Open each surviving design in a 3D viewer. Check that the ligand sits inside a recognisable pocket, that residues around the pocket are reasonable, and that the overall fold makes sense. Quick visual filtering often catches problems that a scalar metric misses.

Step 6: Pick one or two for wet lab

Order the gene as a codon-optimised synthetic construct for your chosen expression host. Express, purify, and assay. Expect that some designs will fail at expression, some will express but be inactive, and some will show initial activity. Plan for at least one round of directed evolution on any design that shows initial activity. As the DISCO paper shows, designed enzymes tend to be evolvable.
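To make the gene-ordering step concrete, here is a deliberately naive back-translation from a designed protein sequence to DNA. It is only a sketch: the table assigns one codon per residue from codons commonly listed as frequently used in E. coli, which is an illustrative assumption to check against a current codon usage table for your host. Real codon optimisation, as performed by synthesis vendors, also handles issues such as repeats, restriction sites, and GC content, none of which this example attempts.

```python
# One codon per amino acid. Assumption: an illustrative E. coli-leaning
# choice, not an authoritative codon usage table.
ECOLI_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein, codon_table=ECOLI_CODONS, stop_codon="TAA"):
    """Return a DNA coding sequence for `protein`, ending in a stop codon."""
    codons = []
    for residue in protein.upper():
        if residue not in codon_table:
            raise ValueError(f"cannot back-translate residue {residue!r}")
        codons.append(codon_table[residue])
    return "".join(codons) + stop_codon


print(back_translate("MKV"))  # ATGAAAGTGTAA
```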

Step 7: Iterate if needed

If the first batch produces nothing useful, regenerate with different scaffolding constraints, different protein lengths, or a different substrate representation. The cost of a second computational batch is low. The cost of a second experimental round is high. Spend compute before committing to experiments.

What this does not replace

Diffusion-based enzyme design is useful for specific problems. It is not a replacement for the rest of enzyme engineering.

Kinetic characterisation still requires wet-lab assays. No computational method predicts kcat or Km with the accuracy needed to choose between candidates on predicted kinetics alone. Selectivity, stereoselectivity, and substrate scope all have to be measured.

Stability, expression yield, and solubility are weakly predicted from sequence. Heuristics like the Guruprasad instability index give a rough sequence-level signal, but real expression behaviour depends on codon usage, chaperone availability, signal peptides, and host-specific effects. A designed enzyme that looks computationally clean can still fail to express.

Directed evolution remains the right tool once you have a starting enzyme with any initial activity. Rounds of error-prone PCR or structure-guided mutagenesis reliably improve activity by one to three orders of magnitude in a well-run campaign. The published diffusion-design results are consistent with this: several designs were treated as starting points for evolution, not finished products.

Biosecurity considerations apply. Any tool that accepts a small-molecule input and returns a designed protein is dual-use in the technical sense. Common-sense refusals, clear laboratory biosafety procedures, and institutional review are part of the workflow, not optional additions to it.

Bringing de novo enzyme design into a molecular intelligence workspace

One practical obstacle to adopting diffusion-based enzyme design is workflow fragmentation. Generating a candidate involves a GPU-hosted model. Refolding involves a separate structure predictor. Inspecting the result involves a molecular viewer. Comparing against known natural enzymes involves sequence and structure databases. Kinetic assays live in electronic lab notebooks. Papers live in literature databases. Historically, a single enzyme design project would touch a command-line tool, a web service, a PyMOL session, a BLAST interface, an ELN, and several PubMed tabs, often in parallel.

The same observation motivates molecular intelligence platforms more broadly. If the analysis steps that follow a design can happen in the same environment as the design itself, with tool outputs flowing into each other and every result citing its source, the iteration loop gets substantially faster.

Purna’s Molecular Intelligence Platform (MIP) integrates de novo enzyme design with the surrounding analysis workflow. A researcher can ask for a designed enzyme for a target substrate, inspect the designed 3D structure and color-coded sequence in place, compare the design against a Boltz-2 refold in a synchronised side-by-side viewer, and pull related information from UniProt, PDB, ChEMBL, and PubMed without leaving the chat. This is the same pattern MIP applies to structure prediction, protein-ligand docking, and computational target discovery.

The specific value for enzyme design is that the refolding validation step, which is easy to skip when tools live in different environments, becomes automatic. Every design is refolded with Boltz-2 and flagged with a co-designability verdict before a human has to decide whether to keep going. That single habit catches a large fraction of failure cases before they reach the wet lab.

What is coming next

Several trends are likely to shape the next generation of de novo enzyme design.

Broader chemistry coverage. Current models are strongest on chemistries well-represented in training data. Expansion to underrepresented classes including hydrolysis of non-natural substrates, bio-orthogonal click-like chemistry, and CO2 fixation is the obvious direction. Data availability, not modelling, is the current bottleneck.

Joint multi-chain and cofactor handling. Many real enzymes are multi-chain complexes, often with metal ions, flavins, or heme cofactors. Current diffusion enzyme design models handle single-chain designs without cofactor dependencies well. Extension to these more realistic settings is in active development.

Iterative design loops. A natural next step is tooling that takes an initial design and a wet-lab result and proposes a refined design. Closed-loop design is well-established in small-molecule discovery and is starting to appear for proteins.

Tighter validation. Co-designability via independent refolding is a useful early filter. Additional in silico filters, including stability prediction for designed residues, aggregation propensity prediction, and off-target binding screening, would give a cleaner funnel before experimental work.

Foundation models for protein chemistry. Models that understand both protein sequence and small-molecule chemistry as a single representational space are emerging. As these mature, enzyme design is likely to become one application among many, alongside property prediction, binding affinity estimation, and reaction outcome prediction.

Getting started

For researchers interested in trying de novo enzyme design on a real problem, the barrier to entry has dropped meaningfully in the past year. Open-source diffusion models are available. Pre-trained weights for several models are published on Hugging Face. Structure prediction tools like Boltz-2 have public APIs. Every piece of the pipeline exists.

The more challenging aspect is integration: getting candidate generation, structure prediction, validation, visualisation, and database lookup to work together in a single environment with consistent data flow and reproducible results.

Purna’s Molecular Intelligence Platform provides this integrated environment. Researchers can submit a SMILES and a target reaction, inspect designed enzymes together with their Boltz-2 refolds, compare sequences against natural enzymes from UniProt, and connect results to 30 or more clinical and biological databases. For teams in drug discovery and biocatalysis looking to shorten the cycle from substrate to testable enzyme, having these capabilities in one workspace removes the overhead of stitching together disconnected tools.

Researchers can apply for up to $10,000 in free MIP credits to explore these workflows, including enzyme design and the surrounding analysis tools.

Purna AI’s Molecular Intelligence Platform (MIP) is an AI-powered workspace for biology teams. It brings together molecular analysis, variant interpretation, protein structure prediction, de novo enzyme design, and clinical database integrations into one environment. It is built for teams who work with biological data and need consistent, reproducible answers without juggling disconnected tools. Learn more at purna.ai.