AlphaFold

AlphaFold – AI breakthrough unfolding a new path for Protein Research

‘Structure is function’ is a principle on which biology is based.

Introduction

Proteins are at the crux of life, whether inside a single cell (enzymes, hormones, etc.) or outside our bodies in our surroundings (food, medicines, etc.). The function of each of these proteins depends on its 3D structure, and any change in this structure alters its performance.

Nobel Laureate Christian Anfinsen proposed that ‘the folding pattern and 3D structure of any protein can be determined from its amino acid sequence’. Building on his experiments and theories, numerous research groups have worked relentlessly on the ‘Protein-Folding Problem’ to figure out how and why a protein attains only one specific conformation out of the multitude of possible folding patterns.

Predicting the 3D (Tertiary) Structure of a Protein from Its Primary Structure

Conventionally, scientists have used techniques such as X-ray crystallography, nuclear magnetic resonance (NMR) and cryo-electron microscopy to elucidate 3D protein structures. These experiments can involve years of laborious work and require investments worth millions of dollars.

CASP Challenge 2020

On 30 November 2020, a new milestone in protein research was reached when the results of the 14th biennial Critical Assessment of Protein Structure Prediction (CASP) challenge were announced. AlphaFold, a program developed by DeepMind, an artificial intelligence research lab owned by Alphabet, Google’s parent company, predicted protein structures from their amino acid sequences with an accuracy comparable to experimental results, achieving a median GDT score of 92.4 across all targets.

The CASP challenge, run by the Protein Structure Prediction Center at the University of California, Davis, was started by Prof. John Moult and his co-founders in 1994 with the aim of boosting computational research in protein biology. Predictions are assessed with the Global Distance Test (GDT), which scores on a 0-100 scale how closely the predicted structure matches the experimentally determined one.
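To make the GDT score more concrete, here is a minimal Python sketch of the commonly reported GDT_TS variant: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted Cα position lies within that cutoff of the experimental position. It assumes the two structures are already superimposed, whereas the real assessment searches over many superpositions, so treat it as an illustration and not as the official CASP scoring code. The function name `gdt_ts` and the toy coordinates are made up for this example.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two ALREADY SUPERIMPOSED lists of C-alpha coordinates.

    Assumption: no superposition search is done here, unlike the real GDT.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance (in angstroms) between each predicted residue and its experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues that fall within each cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # Average the fractions and rescale to 0-100.
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical toy data: four residues, predicted positions off by 0.5, 1.5, 3 and 10 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, experimental)
print(score)  # 56.25
```

A perfect prediction scores 100, and the score drops as more residues drift away from their experimental positions.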

Figure (source: deepmind.com/blog): Improvements in the median accuracy of predictions in the free modelling category for the best team in each CASP, measured as best-of-5 GDT.

As can be seen from the graph above, GDT scores in earlier years were very low, implying poor resemblance between the experimental and computational outcomes.

AlphaFold

DeepMind entered the competition for the first time in 2018 with its AlphaFold program and outperformed all other participants with a GDT score of around 60. Though not yet accurate enough, it offered real hope for a successful model. In 2020, AlphaFold predicted even the most challenging protein structures, those in the free modelling category, with a median GDT score of 87 (25 GDT points higher than its nearest competitor).

AlphaFold: The key to the protein folding problem

In 2018, AlphaFold used deep learning, a subset of AI, to predict the distances between pairs of amino acids in a protein from structural and genetic data. However, this approach could not take the team any further, so John Jumper, the project lead for AlphaFold, and his team changed strategy. Jumper mentions that they started developing the program based on the principles of biology, physics and machine learning, together with the experience and work of experts in the field of protein folding over the past five decades.

How did they develop it?

This time, they also included additional information about the physical and geometrical constraints that play a role in determining the 3D conformation of a protein. The program was built for the tough task of predicting the final structure of the target protein.

Jumper explained that if we consider a folded protein to be a ‘spatial graph’, then the amino acid residues are the nodes, and residues in close proximity are connected by edges.
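The ‘spatial graph’ idea can be illustrated with a small Python sketch: every residue becomes a node, and two nodes get an edge when their Cα atoms lie within a distance cutoff. This is only a toy illustration of the concept, not how AlphaFold builds or refines its internal representation. The 8 Å cutoff, the function name `contact_graph` and the U-shaped toy coordinates are assumptions made for this example.

```python
import math
from itertools import combinations


def contact_graph(ca_coords, cutoff=8.0):
    """Map each residue index to the set of residue indices within `cutoff` angstroms.

    Assumption: the 8 angstrom C-alpha cutoff is an illustrative choice.
    """
    graph = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            graph[i].add(j)  # edges are undirected, so store both directions
            graph[j].add(i)
    return graph


# Hypothetical U-shaped chain of six residues, 3.8 angstroms between neighbours.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]

graph = contact_graph(coords)

# Edges between residues that are far apart in sequence but close in space.
long_range = sorted((i, j) for i in graph for j in graph[i] if j - i >= 4)
print(long_range)  # [(0, 4), (0, 5), (1, 5)]
```

The long-range edges are the interesting ones: residues 0 and 5 sit at opposite ends of the sequence, yet the fold brings them next to each other in space.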

The latest version of AlphaFold, used at CASP14, is based on a neural network system trained on the publicly available Protein Data Bank (PDB), which contains around 170,000 protein structures. In addition, the program was trained on other large databases of protein sequences whose structures have not yet been determined. AlphaFold predicts a protein structure by repeatedly applying a system that uses evolutionarily related sequences, multiple sequence alignment (MSA) and a representation of amino acid residue pairs to refine this graph. The whole prediction process takes only a few days, as opposed to years of experimental research.

The impact

Professor Andrei Lupas (Director of the Max Planck Institute for Developmental Biology, Germany, and a judge for the CASP challenge) said that his lab had been working on a bacterial protein for almost a decade but could not piece together the information obtained from the X-ray diffraction data. With the insights on the protein’s shape from AlphaFold, they deduced the structure in half an hour. Professor Lupas complimented the DeepMind team on the accuracy of the model and said that this new milestone will help his lab in its quest to understand how signals are transmitted across cell membranes.

So what role does this new landmark play in the wider world?

AlphaFold cannot and will not replace experimental research, but as Professor Lupas commented, “It’s going to require more thinking and less pipetting.”

This quicker method of predicting protein structure can help scientists run faster experiments with less investment in terms of time and infrastructure. The time and money saved can be better used to study the function of proteins, the effect of structure on function, the causes of changes in protein structure, and the implications of such alterations, for example in diseases such as Parkinson’s and Alzheimer’s. AlphaFold can serve as a boon to researchers and aid in drug discovery, designing protein drugs, developing enzymes for various applications, and much more. It can revolutionize research, solving a number of unsolved issues while opening up several new avenues that are unthinkable at the moment.

A very recent application of AlphaFold is the prediction of the structures of the SARS-CoV-2 (COVID-19) proteins ORF3a and ORF8. This knowledge may prove valuable in understanding various aspects of the viral infection as well as in the development of vaccines.

Conclusion

To sum it all up, a statement by Professor Lupas can be quoted: “It’s a game changer. This will change medicine. It will change research. It will change bioengineering. It will change everything.”

At Let’s Excel Analytics Solutions we solve life’s greatest problems with the help of Artificial Intelligence and Machine Learning. If you are stuck with such a problem and want to know how data science can solve it, then contact us.

Data Science Journey

Data Science Journey: Guidance for the New Bee


Bill Gates once said, “A breakthrough in machine learning would be worth ten Microsofts.”

Considering the fast-paced development in the world of data science, his words are likely to come true. We live in the age of information, and it is quite usual to get overwhelmed by the amount of data we process each day, both in our professional and personal lives. The Internet these days is full of buzzwords related to machine learning, artificial intelligence, deep learning and the Internet of Things. Have you been wondering if you can really make use of all these techniques in real life? Do you wish to begin your data science journey too? Then read this article to know where you can begin as a new bee!

Data science is built on a foundation of mathematical and statistical concepts that are universally applicable to all the sciences. That is why data science is not limited to any specific field of study. It finds applications in numerous fields such as Healthcare, Food and Beverages, Petrochemicals, Agriculture, and Defence and Space. To back these claims, let’s take a look at some common applications of artificial intelligence and machine learning in the above-mentioned fields:

Healthcare
  • Classification and quantification of raw materials: non-destructive testing of raw materials using spectroscopic sensors like IR, NIR, Raman etc.
  • Distinguishing between materials: innovator vs. generic product
  • Drug discovery: quantitative structure-activity relationship, molecular modelling
  • Genomics: personalised medicines or diet
  • Medical diagnosis: cancer prediction
  • Material selection: composition of materials that results in desired quality

Food and Beverages
  • Automating sensory evaluation of products
  • Classification and quantification of raw materials: identifying the source of raw materials and the nutritional profile of the material (% of carbohydrate, fat and protein)
  • Similarity between materials: identifying a substitute for an ingredient
  • Material selection: composition of materials that results in desired quality
  • Shelf life: when is the product likely to degrade?

Petrochemicals
  • Classification and quantification of raw materials: non-destructive testing of raw materials using spectroscopic sensors like IR, NIR, Raman etc.

Agriculture
  • Better crop yield: identifying seeds with superior quality
  • Crop quality/harvesting: is it the best time to harvest the crop?
  • Shelf life: predicting the shelf life of the harvested crop
  • Soil texture using sensors

Defence and Space
  • Material selection: composition of materials that results in desired quality
  • Space exploration: is there water on Mars?

Table: Data science applications in various fields (field name, followed by its common applications)

By now you may have become interested in this new-age mantra and be wondering whether it applies to you, and how.

To find out, let’s begin by answering the questions below:

  • Are you dealing with large sets of data that do not make real sense to the human eye?
  • Are you currently using some tools to sort and analyze your data but still struggling, and thus looking for a viable alternative?
  • Have you been told that machine learning, artificial intelligence or the Internet of Things could solve a problem that you are facing today?
  • Are you fascinated by this new avenue seen all over the internet, but find the first steps too daunting to make any real progress?
  • Do you believe that trust is good but evidence is better?

If you answered yes to any of the above questions, then yes, the data science journey is for you! Peter Sondergaard once famously said, “Information is the oil of the 21st century, and analytics is the combustion engine.”

The best part is that anyone can use data science techniques and benefit from them. You do not have to be a coder or an expert mathematician. Experts in the field have developed various software tools, which can be purchased as per your requirements.

Our cloud-based DataPandit software solutions, developed by Let’s Excel Analytics Solutions, offer one such simple and user-friendly interface. These tools enable you to get appropriate insights out of your data and lead you in the right direction.

Data science can be learnt not just with theory but with hands-on experience. It can be said that Data Science is a habit, not a skill. The more you practice it, the stronger you get.
