Use of generative AI in protein design

Artificial intelligence (AI) is increasingly guiding disease modelling and drug discovery across complex biological networks. Developing a therapy for a disease requires a critical but extremely challenging step: identifying the structure and mechanism of a suitable target protein or cellular pathway that can be modulated by antibodies or drugs.

Clinical trials currently have a high failure rate of around 84.6%[i]. This startling statistic reflects the biological uncertainties that an average clinical trial faces. AI is playing a growing role in therapeutics development, where large datasets allow candidates to be developed in a more targeted way. Among the emerging trends, generative AI, a subset of AI, has shown immense potential. In this article, we discuss how generative AI can provide a shortcut for developing therapeutic candidates.

Rise of Generative AI

Generative AI is a type of artificial intelligence that can create a wide variety of data, such as text and images, from one or more prompts (or inputs). This ‘generative’ ability to create new data is a huge step forward from the more conventional ‘discriminative’ approach. For instance, a discriminative model can be trained on English and French texts and then used to classify whether a new sentence is in English or French. The ultimate goal of a discriminative model is to separate one class from another. Generative AI models, on the other hand, can rapidly make use of large volumes of unlabelled data. Once a generative AI model has learned patterns from existing data, it uses this knowledge to generate new and unique outputs. Recent breakthroughs in the field, such as GPTs (Generative Pre-trained Transformers), can complete a sentence or write a full essay in response to a question.

Generative AI for Therapeutics and Medicine

So how can generative AI be applied to develop novel therapeutics? One way is to use generative AI to design proteins that themselves act as novel therapeutics. For instance, AI can design antibodies that provide targeted defence against a particular disease target, or design therapeutic proteins that replace a protein that is abnormal or deficient in a particular disease.

Generative AI is extensively used in predicting and designing novel protein structures. Proteins are composed of amino acids, each with distinctive properties, arranged in a specific order. Guided by this one-dimensional amino acid sequence, proteins fold into intricate three-dimensional forms that allow them to perform biological activities. For example, a stretch of hydrophilic amino acids can form a functional unit exposed on the surface of the protein to bind a target, or a particular combination of amino acids can carry out a catalytic function.
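To make the link between sequence and surface exposure concrete, here is a minimal sketch that scans a sequence for its most hydrophilic stretch using the Kyte-Doolittle hydropathy scale. The toy sequence, the window size and the function names are assumptions made for illustration, and a sliding-window average is only a rough heuristic, not a structure prediction.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, window=5):
    """Mean hydropathy of each run of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[res] for res in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=5):
    """Return (start index, residues, mean score) of the most hydrophilic window."""
    profile = window_hydropathy(sequence, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window], profile[start]


# Hypothetical toy sequence: a hydrophobic stretch followed by a charged one.
toy_sequence = "LIVALLKDEKRSGA"
print(most_hydrophilic_window(toy_sequence))
```

In this toy case the charged stretch scores lowest, which is the kind of region one would expect to sit at the protein surface rather than in its core.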

Some generative AI models predict the probability of the next amino acid in a sequence based on the amino acids that precede it. Natural language processing models learn semantic and grammatical rules, such as the order of subject and verb or the use of present and past tense. In a similar way, protein models can learn amino acid sequence patterns, local structural motifs such as alpha helices and beta sheets, and the tertiary structure built upon them. In this way, large language models can be used to generate functional protein sequences.
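The idea of predicting the next amino acid can be sketched with a deliberately tiny stand-in for a language model. The example below is a first-order Markov (bigram) model that conditions on only one preceding residue, whereas a large language model conditions on the whole preceding sequence. The training sequences and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # sentinel tokens marking sequence boundaries


def train_bigram(sequences):
    """Count how often each residue follows each preceding residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_residue_probs(counts, prev):
    """Turn the counts for one context into a probability distribution."""
    total = sum(counts[prev].values())
    return {res: n / total for res, n in counts[prev].items()}


def sample_sequence(counts, rng, max_len=30):
    """Generate a sequence one residue at a time until END or max_len."""
    out, prev = [], START
    while len(out) < max_len:
        probs = next_residue_probs(counts, prev)
        nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Hypothetical toy training set; real models train on vastly more sequences.
toy_training_set = ["MKTAYIAK", "MKTLYIAR", "MKSAYLAK"]
model = train_bigram(toy_training_set)
print(next_residue_probs(model, "K"))   # distribution over what follows K
print(sample_sequence(model, random.Random(0)))
```

The generation loop has the same shape as in a language model: look up a distribution over the next token, sample from it, append, and repeat.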

Some protein-building platforms use a transformer architecture that employs self-attention mechanisms[i]. Transformers were originally designed to identify the strongest correlations among words within a sentence or, in this case, among amino acids in a protein. With billions of known amino acid sequences as inputs to the transformer model, the self-attention mechanism makes it possible to test, for example, the pairwise interaction between every pair of amino acids in the input sequence[ii]. These pairwise interactions are critical in determining a final 3D structure that can perform a desired function, such as target binding or catalysis. To reduce the complexity of the pairwise interaction analysis, a patent application from DeepMind (US20210166779A1) discloses a method that performs sequence alignment and introduces an embedding layer based on the alignment results. An embedding layer, a hidden layer in the deep neural network, maps each amino acid to a low-dimensional vector, where each dimension represents a particular feature of the protein.
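As a rough illustration of how an embedding lookup and self-attention fit together, the sketch below maps each residue to a vector and computes scaled dot-product attention weights between every pair of positions. It is a simplification under stated assumptions: the embeddings are random rather than learned, the query, key and value projections are replaced by the identity, and there is a single head with no positional information. It is not the method of any particular platform or patent application.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity query/key/value maps.

    Returns (weights, outputs): weights[i][j] is how strongly position i
    attends to position j; outputs[i] is the weighted mix of all embeddings.
    """
    dim = len(embeddings[0])
    weights = [
        softmax([dot(q, k) / math.sqrt(dim) for k in embeddings])
        for q in embeddings
    ]
    outputs = [
        [sum(w * v[d] for w, v in zip(row, embeddings)) for d in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical embedding table: one random 4-dimensional vector per residue type.
rng = random.Random(0)
embedding_table = {
    aa: [rng.gauss(0, 1) for _ in range(4)] for aa in "ACDEFGHIKLMNPQRSTVWY"
}

sequence = "MKTAYIAK"
weights, outputs = self_attention([embedding_table[aa] for aa in sequence])
for residue, row in zip(sequence, weights):
    print(residue, [round(w, 2) for w in row])
```

Each printed row is one residue's attention over all positions in the sequence, so the full matrix holds one weight per ordered pair of residues. This all-pairs cost grows with the square of the sequence length.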

Other generative AI models can create completely new protein structures from image-like representations of a protein, rather than from an amino acid sequence[ii]. A protein is broken down into triplet frames, each containing unique spatial information. This set of image-like representations of existing proteins is fed into a generative diffusion model, which is trained by injecting noise that progressively disrupts the original structures[iii]. The model tracks the escalating noise levels and then learns to reverse the process, converting random pixels into distinct images that represent entirely new proteins.
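The noising and reversal steps can be sketched numerically. The example below uses a DDPM-style linear noise schedule, which is an assumption chosen for simplicity and not necessarily the formulation used in the cited score-based work. The feature values are hypothetical, and the true noise stands in for what a trained network would have to predict.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative fraction of signal remaining under a linear noise schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward step: blend the clean values with Gaussian noise."""
    noise = [rng.gauss(0, 1) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * e
        for x, e in zip(x0, noise)
    ]
    return xt, noise


def estimate_clean(xt, predicted_noise, alpha_bar):
    """Invert the forward step, given an estimate of the noise that was added."""
    return [
        (x - math.sqrt(1 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, predicted_noise)
    ]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
x0 = [0.0, 1.5, 3.8, 5.2]  # hypothetical flattened structural features

xt, noise = add_noise(x0, alpha_bars[500], rng)
# A trained model would predict `noise` from `xt`; here the true noise is used.
recovered = estimate_clean(xt, noise, alpha_bars[500])
print([round(v, 6) for v in recovered])
```

With a perfect noise estimate the original values are recovered, which is why training focuses on predicting the noise. Generating a new structure then means starting from pure noise and applying the learned reversal step by step.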

Conclusion

The use of generative AI in protein design significantly reduces the time it takes to find the right therapeutic candidate, by optimising specific parts of the protein against the target. In the couple of years since generative AI was first applied to protein design, the experimental landscape has been transformed, reducing the need for laborious screening. Further, accurately predicting how a protein interacts with its partners, and how these interactions can go wrong, provides insight into clinical solutions.

Moreover, the generative AI approach is revolutionising our understanding of protein structure by providing insights into the intricate relationships between amino acids, paving the way for accurate functional predictions. The journey of exploring vast protein landscapes towards a deeper understanding of biological processes continues to unfold.

References:

[i] Pun et al., AI-powered therapeutic target discovery, Trends in Pharmacological Sciences 44(9) 561-572

[ii] Madani et al., Large language models generate functional protein sequences across diverse families. Nature Biotechnology 41 (2023)

[iii] Lee et al., Score-based generative modeling for de novo protein design. Nature Computational Science (2023)

June Juyeon Han