ABSTRACT

Protein folding is the physical process by which a polypeptide chain, the one-dimensional form of a protein, assumes its functional three-dimensional (3D) conformation. If the 3D structure is flawed, the protein cannot perform its expected function in the cells and the body. Protein malfunction caused by misfolding is one of the main causes of some diseases, such as Alzheimer’s disease, mad cow disease, and some types of cancer.

Protein structure prediction (PSP) derives the 3D structure of a protein from its amino acid sequence. PSP is considered one of the most researched topics in bioinformatics, with applications in medical fields such as drug design and the design of novel enzymes.

PSP remains an extremely difficult and unsolved task. Its two main difficulties are calculating the free energy of a protein and finding the global minimum of that energy.

Laboratory procedures for protein structure determination are used to establish the exact native structure of a given protein. Examples include X-ray diffraction and nuclear magnetic resonance spectroscopy, which are time-consuming and expensive, and which may have to be repeated several times because of their complexity. These disadvantages motivated the development of computational prediction techniques.

Ab initio PSP is the process of predicting a given protein’s structure from its amino acid sequence alone. It is computationally challenging because of the large number of candidate conformations to be searched and the complexity of the energy functions. Our objective is to develop a method that addresses the PSP problem and, in particular, the obstacles to finding the global minimum energy of any given peptide.

We compared a collection of ab initio PSP methods, namely Peptide Fold 2 (PEP-FOLD2), PEP-FOLD3, and QUARK, against our technique, 3dProFold, in terms of accuracy and time consumption. The findings show that a metaheuristic search method based on a genetic algorithm can achieve the same or better results than time-consuming methods. PEP-FOLD2 and PEP-FOLD3 take much less time than the other methods, but their accuracy is not guaranteed to be high. Both QUARK and 3dProFold may sacrifice speed in favor of accuracy. The structures generated by 3dProFold may be distorted and noisy, so an enhancement step was added to confirm that the protein structures are at rest in 3D space.
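To illustrate the general idea of a genetic-algorithm search for a minimum-energy conformation, the following minimal Python sketch may be helpful. It is not the 3dProFold implementation. The energy function `toy_energy`, the representation of a conformation as a list of torsion angles, and all parameter values (`pop_size`, `generations`, `rate`, `sigma`) are assumptions chosen for illustration only; a real method would use a physically meaningful energy function.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a protein energy function (NOT a real force field).

    The landscape is rugged, with several local minima per angle, and has a
    global minimum of 0 when every angle is 0.
    """
    return sum(
        (1.0 - math.cos(a)) + 0.3 * (1.0 - math.cos(3.0 * a)) for a in angles
    )


def tournament(population, rng, k=3):
    """Pick the lowest-energy individual among k randomly sampled candidates."""
    return min(rng.sample(population, k), key=toy_energy)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(angles, rng, rate=0.1, sigma=0.3):
    """Perturb each angle with probability `rate` by Gaussian noise."""
    return [
        a + rng.gauss(0.0, sigma) if rng.random() < rate else a for a in angles
    ]


def genetic_search(n_angles=10, pop_size=60, generations=200, seed=0):
    """Return (best_angles, best_energy, initial_best_energy)."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
        for _ in range(pop_size)
    ]
    initial_best = min(toy_energy(ind) for ind in population)

    for _ in range(generations):
        # Elitism: carry the current best over unchanged, so the best
        # energy never gets worse from one generation to the next.
        children = [min(population, key=toy_energy)]
        while len(children) < pop_size:
            child = crossover(
                tournament(population, rng), tournament(population, rng), rng
            )
            children.append(mutate(child, rng))
        population = children

    best = min(population, key=toy_energy)
    return best, toy_energy(best), initial_best


if __name__ == "__main__":
    best, energy, start = genetic_search()
    print(f"initial best energy: {start:.4f}")
    print(f"final best energy:   {energy:.4f}")
```

Because of elitism, the final best energy in this sketch is never higher than the best energy in the initial population. Reaching the global minimum is not guaranteed, which mirrors the difficulty described above.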