AlphaFold – AI breakthrough unfolding a new path for Protein Research
‘Structure is Function’ is a principle on which biology is based.
Proteins are essentially the crux of life, whether inside a single cell (enzymes, hormones, etc.) or outside our bodies in our surroundings (food, medicines, etc.). The function of each of these proteins depends on its 3D structure, and any change in that structure alters its performance.
Nobel Laureate Christian Anfinsen believed that ‘the folding pattern and 3D structure of any protein can be determined from its amino acid sequence’. Building on his experiments and theories, numerous research groups have worked relentlessly on the ‘Protein-Folding Problem’: how and why a protein attains one specific conformation out of the multitude of possible folding patterns.
Conventionally, scientists have used techniques such as X-ray crystallography, nuclear magnetic resonance (NMR) and cryo-electron microscopy to elucidate the 3D structure of proteins. These experiments can involve years of laborious work and require investments worth millions of dollars.
CASP Challenge 2020
On 30 November 2020, a new milestone in protein research was reached when the results of the 14th biennial Critical Assessment of Protein Structure Prediction (CASP) challenge were announced. AlphaFold, a program developed by DeepMind (an artificial intelligence research lab affiliated with Google and its parent company, Alphabet), predicted protein structures from their amino acid sequences with accuracy comparable to experimental results, achieving a median GDT score of 92.4 across all targets.
The CASP challenge, run by the Protein Structure Prediction Center at the University of California, Davis, was started by Prof John Moult and his co-founders in 1994 with the aim of boosting computational research in protein biology. Predictions are assessed using the Global Distance Test (GDT), which scores the similarity of a predicted structure to the experimentally determined one on a scale of 0 to 100.
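To make the GDT scale concrete, here is a minimal sketch of a simplified GDT calculation in Python. It assumes the two structures are already superimposed and that each residue is represented by a single (x, y, z) point; the real CASP evaluation also searches over many superpositions, which is omitted here. The function name gdt_ts and the coordinates are hypothetical, chosen only for illustration. The cutoffs of 1, 2, 4 and 8 angstroms are those of the commonly used GDT_TS summary score.

```python
import math

# Distance cutoffs (in angstroms) averaged by the GDT_TS summary score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, experimental, cutoffs=CUTOFFS):
    """Simplified GDT_TS for two already-superimposed structures.

    Each argument is a list of (x, y, z) tuples, one per residue, in the
    same residue order. Returns a score between 0 and 100.
    Assumption: no superposition search is performed here.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    # Per-residue deviation between prediction and experiment.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each cutoff, then the average over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example (coordinates are made up).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

score = gdt_ts(predicted, experimental)
print(f"GDT_TS = {score:.2f}")  # prints "GDT_TS = 56.25"
```

In this toy case the per-residue deviations are 0.5, 1.5, 3.0 and 9.0 angstroms, so 25%, 50%, 75% and 75% of residues fall within the four cutoffs, giving an average of 56.25. A perfect prediction would score 100.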
Figure (source: deepmind.com/blog): Improvements in the median accuracy of predictions in the free modelling category for the best team in each CASP, measured as best-of-5 GDT.
As the graph above shows, GDT scores in earlier CASP rounds were low, implying poor resemblance between the experimental and computational outcomes.
DeepMind entered the competition for the first time in 2018 with its AlphaFold program and outperformed all other participants with a GDT score of more than 60. Though not yet accurate enough, it offered real hope of a successful model. In 2020, AlphaFold predicted even the most challenging protein structures with a median GDT score of 87 (25 GDT points higher than its nearest competitor).
In 2018, AlphaFold used deep learning, a subset of AI, to predict the distances between pairs of amino acids in a protein from structural and genetic data. However, this approach could not take the team any further, so the team, led by John Jumper, the project lead for AlphaFold, changed strategy. Jumper mentions that they started developing the program based on the principles of biology, physics and machine learning, and on the experience and work of experts in the field of protein folding over the past five decades.
How did they develop it?
This time, the team also included additional information about the physical and geometrical constraints that help determine the 3D conformation of a protein. The program was designed to tackle the tough task of predicting the final structure of the target protein.
Jumper explained that a folded protein can be thought of as a ‘spatial graph’, in which the amino acid residues are the nodes and edges connect residues that lie in close proximity.
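The ‘spatial graph’ idea can be sketched in a few lines of Python. This is only an illustration of the data structure, not DeepMind's implementation: the function name build_spatial_graph, the 8 angstrom cutoff and the coordinates are all assumptions made for the example.

```python
import math
from itertools import combinations


def build_spatial_graph(coords, cutoff=8.0):
    """Build a residue graph from 3D coordinates.

    Nodes are residue indices; an edge joins two residues whose
    representative atoms lie within `cutoff` angstroms of each other.
    Assumption: the 8.0 angstrom cutoff is an illustrative choice.
    Returns a dict mapping each residue index to a set of neighbours.
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical six-residue chain that runs along x and then folds back
# on itself (coordinates are made up, in angstroms).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0),
    (7.6, 3.8, 0.0),
]

graph = build_spatial_graph(coords)
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2

for residue, neighbours in graph.items():
    print(residue, sorted(neighbours))
```

In this toy chain, residues 1 and 5 are four positions apart in the sequence but end up connected by an edge because the fold brings them close together in space. Edges of this kind are what make the graph ‘spatial’ rather than a simple chain.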
The latest version of AlphaFold, used at CASP14, is based on a neural network system trained on the publicly available Protein Data Bank (PDB), which contains around 170,000 protein structures. In addition, the program was trained on other large databases of protein sequences whose structures have not yet been determined. AlphaFold predicts a protein structure by iteratively applying a system that uses evolutionarily related sequences, a multiple sequence alignment (MSA) and a representation of amino acid residue pairs to refine this graph. The whole prediction process takes only a few days, as opposed to years of experimental research.
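To give a feel for why evolutionarily related sequences are informative about residue pairs, the sketch below scores every pair of columns in a tiny, made-up MSA by mutual information, a classical co-variation statistic. This is not how AlphaFold processes its inputs (its network learns its own representations); it is a hedged illustration of the general idea that columns which change together across related sequences hint at residues that interact. The function names and the alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (strings).
    High values mean the two columns tend to vary together.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical four-sequence alignment (made up). Columns 1 and 3 change
# together (K with E, R with D); columns 0 and 2 never change.
msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

length = len(msa[0])
# A simple pairwise score for every pair of columns.
pair_scores = {
    (i, j): mutual_information(msa, i, j)
    for i, j in combinations(range(length), 2)
}
best_pair = max(pair_scores, key=pair_scores.get)
print(best_pair, pair_scores[best_pair])
```

Here the pair of columns (1, 3) scores 1 bit while every other pair scores 0, because only those two positions co-vary. A table of scores indexed by residue pairs is a very crude analogue of the pair representation mentioned above.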
Professor Andrei Lupas (Director of the Max Planck Institute for Developmental Biology, Germany, and a judge for the CASP challenge) said that his lab had been working on a bacterial protein for almost a decade but could not make sense of the information obtained from the X-ray diffraction data. With insights into the protein's shape from AlphaFold, however, they deduced the structure in half an hour. Professor Lupas complimented the DeepMind team on the accuracy of the model and said that this milestone will help his lab in its quest to understand how signals are transmitted across cell membranes.
So what role does this new landmark play in the wider world?
AlphaFold cannot and will not replace experimental research, but as Professor Lupas commented, “It’s going to require more thinking and less pipetting.”
This quicker method of predicting protein structure can help scientists run experiments faster and with less investment in terms of time and infrastructure. The time and money saved can be better used to study the function of proteins, the effect of structure on function, the causes of changes in protein structure, and the implications of such alterations, for example in diseases such as Parkinson's and Alzheimer's. AlphaFold can serve as a boon to researchers and aid in drug discovery, the design of protein drugs, the development of enzymes for various applications, and much more. It could revolutionize research, resolving a number of open problems while opening up new avenues that are hard to imagine at the moment.
A very recent application of AlphaFold is the prediction of the structures of Orf3a and Orf8, two proteins of SARS-CoV-2, the virus that causes COVID-19. This knowledge is expected to play a significant part in understanding various aspects of the viral infection as well as in the development of vaccines.
To sum it all up, a statement by Professor Lupas can be quoted: “It’s a game changer. This will change medicine. It will change research. It will change bioengineering. It will change everything.”
At Let’s Excel Analytics Solutions we solve life’s greatest problems with the help of Artificial Intelligence and Machine Learning. If you are stuck with such a problem and want to know how data science can solve it, then contact us.