1.
J Phys Chem B ; 128(22): 5363-5370, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38783525

ABSTRACT

In modern drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing and refinement because it is cost-effective for large compound libraries. For decades, efforts have been devoted to developing VLS methods with high accuracy. These include the state-of-the-art FINDSITE suite of approaches (FINDSITEcomb2.0, FRAGSITE, and FRAGSITE2) and the meta version FRAGSITEcomb, all of which were developed in our lab. These methods combine ligand homology modeling (LHM), traditional ligand similarity methods, and, more recently, machine learning approaches to rank ligands, and they have proven superior to most recent deep learning and large language model-based approaches. Here, we describe further improvements to our previous best methods by combining the Morgan fingerprint (MF) with the originally used PubChem and FP2 fingerprints. We then benchmarked FINDSITEcomb2.0M, FRAGSITEM, FRAGSITE2M, and the composite meta-approach FRAGSITEcombM. On the 102-target DUD-E set, the 1% enrichment factor (EF1%) and area under the precision-recall curve (AUPR) of FRAGSITEcomb increased from 42.0/0.59 to 47.6/0.72. This AUPR of 0.72 is significantly better than the 0.443 AUPR of the state-of-the-art deep learning-based method DenseFS. An independent test on the 81-target DEKOIS2.0 set shows that EF1%/AUPR increases from 18.3/0.520 to 23.1/0.683. An ablation investigation shows that the MF contributes most of the improvement in all four approaches. Thus, the MF is a useful addition to structure-based VLS.
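As a reading aid for the EF1% figures quoted above, the following is a minimal sketch of how an enrichment factor is commonly computed from a ranked screening list. It is not the authors' benchmarking code; the function name `enrichment_factor`, the rounding of the top slice, and the toy data are assumptions made for illustration.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked list.

    scores: predicted score per compound (higher = more likely active).
    labels: 1 for a known active, 0 for a decoy (hypothetical toy input).
    EF = (actives in top slice / size of top slice)
         / (all actives / all compounds).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total = len(scores)
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top slice; rounding convention is an assumption.
    n_top = max(1, round(fraction * total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    # Same ratio as in the docstring, arranged to avoid intermediate rounding.
    return (top_actives * total) / (n_top * total_actives)


# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
toy_scores = [1000 - i for i in range(1000)]
toy_labels = [1 if i < 5 or 500 <= i < 505 else 0 for i in range(1000)]
print(enrichment_factor(toy_scores, toy_labels))
```

Under this definition, a random ranking gives an EF close to 1, and the maximum attainable EF1% is 100 when actives make up no more than 1% of the library.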


Subject(s)
Drug Discovery , Ligands , Machine Learning , Molecular Structure , Drug Evaluation, Preclinical
3.
Gynecol Oncol ; 182: 168-175, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38266403

ABSTRACT

OBJECTIVE: The identification/development of a machine learning-based classifier that utilizes metabolic profiles of serum samples to accurately identify individuals with ovarian cancer. METHODS: Serum samples collected from 431 ovarian cancer patients and 133 normal women at four geographic locations were analyzed by mass spectrometry. Reliable metabolites were identified using recursive feature elimination coupled with repeated cross-validation and used to develop a consensus classifier able to distinguish cancer from non-cancer. The probabilities assigned to individuals by the model were used to create a clinical tool that assigns a likelihood that an individual patient sample is cancer or normal. RESULTS: Our consensus classification model is able to distinguish cancer from control samples with 93% accuracy. The frequency distribution of individual patient scores was used to develop a clinical tool that assigns a likelihood that an individual patient does or does not have cancer. CONCLUSIONS: An integrative approach using metabolomic profiles and machine learning-based classifiers has been employed to develop a clinical tool that assigns a probability that an individual patient does or does not have ovarian cancer. This personalized/probabilistic approach to cancer diagnostics is more clinically informative and accurate than traditional binary (yes/no) tests and represents a promising new direction in the early detection of ovarian cancer.


Subject(s)
Ovarian Neoplasms , Humans , Female , Ovarian Neoplasms/diagnosis , Metabolomics , Machine Learning , Mass Spectrometry
4.
Protein Sci ; 33(1): e4869, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38100293

ABSTRACT

Protein function annotation and drug discovery often involve finding small molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. While our recent ligand homology modeling (LHM)-machine learning VLS method FRAGSITE outperformed approaches that combined traditional docking to generate protein-ligand poses and deep learning scoring functions to rank ligands, a more robust approach that could identify a more diverse set of binding ligands is needed. Here, we describe FRAGSITE2, which shows significant improvement on protein targets that lack known small molecule binders and have no confident LHM-identified template ligands, when benchmarked on two commonly used VLS datasets. For both the DUD-E set and the DEKOIS2.0 set, and for ligands having a Tanimoto coefficient (TC) < 0.7 to the template ligands, the 1% enrichment factor (EF1%) of FRAGSITE2 is significantly better than that of FINDSITEcomb2.0, an earlier LHM algorithm. For the DUD-E set, FRAGSITE2 also shows a better ROC enrichment factor and AUPR (area under the precision-recall curve) than the deep learning DenseFS scoring function. In a comparison with RF-score-VS on the 76-target subset of DEKOIS2.0, with a TC < 0.99 to the training DUD-E ligands, FRAGSITE2 has double the EF1%. Its boosted tree regression method provides more robust performance than a deep learning multiple layer perceptron method. When compared with the pretrained language model for protein target features, FRAGSITE2 also shows much better performance. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets. FRAGSITE2's web service is freely available to academic users at http://sites.gatech.edu/cssb/FRAGSITE2.
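The Tanimoto coefficient (TC) cutoff mentioned above can be illustrated with a short sketch. It assumes fingerprints are represented as sets of "on" bit indices; the helper names `tanimoto` and `filter_dissimilar` and the toy fingerprints are hypothetical and are not taken from FRAGSITE2.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


def filter_dissimilar(candidates, templates, cutoff=0.7):
    """Keep candidate names whose TC to every template is below the cutoff.

    candidates: dict mapping a ligand name to its set of on-bits.
    templates: list of on-bit sets for the template ligands.
    """
    return [
        name
        for name, fp in candidates.items()
        if all(tanimoto(fp, template) < cutoff for template in templates)
    ]


# Toy fingerprints, invented for illustration only.
templates = [{1, 2, 3, 4}]
candidates = {
    "near_copy": {1, 2, 3, 4, 5},   # TC = 4/5 = 0.8, removed at cutoff 0.7
    "distant": {3, 4, 8, 9},        # TC = 2/6, kept
}
print(filter_dissimilar(candidates, templates))
```

Restricting an evaluation to ligands below a TC cutoff is a way to check that a method is not simply retrieving close analogs of ligands it has already seen.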


Subject(s)
Algorithms , Proteins , Binding Sites , Protein Conformation , Ligands , Proteins/chemistry , Protein Binding , Molecular Docking Simulation
5.
Sci Rep ; 13(1): 14650, 2023 09 05.
Article in English | MEDLINE | ID: mdl-37670110

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a clear threat to humanity. It has infected over 200 million and killed 4 million people worldwide, and infections continue with no end in sight. To control the pandemic, multiple effective vaccines have been developed, and global vaccinations are in progress. However, the virus continues to mutate. Even when full vaccine coverage is achieved, vaccine-resistant mutants will likely emerge, thus requiring new annual vaccines against drifted variants, analogous to influenza. A complementary solution to this problem could be developing antiviral drugs that inhibit SARS-CoV-2 and its drifted variants. Host defense peptides represent a potential source for such an antiviral as they possess broad antimicrobial activity and significant diversity across species. We screened the cathelicidin family of peptides from 16 different species for antiviral activity and identified a wild boar peptide derivative that inhibits SARS-CoV-2. This peptide, which we named Yongshi (meaning "warrior" in Mandarin), acts as a viral entry inhibitor. Following the binding of SARS-CoV-2 to its receptor, the spike protein is cleaved, and heptad repeats 1 and 2 multimerize to form the fusion complex that enables the virion to enter the cell. A deep learning-based protein sequence comparison algorithm and molecular modeling suggest that Yongshi acts as a mimetic of the heptad repeats of the virus, thereby disrupting the fusion process. Experimental data confirm the binding of Yongshi to heptad repeat 1 with a fourfold higher affinity than to heptad repeat 2 of SARS-CoV-2. Yongshi also binds to heptad repeat 1 of SARS-CoV-1 and MERS-CoV. Interestingly, it inhibits all drifted variants of SARS-CoV-2 that we tested, including the alpha, beta, gamma, delta, kappa and omicron variants.


Subject(s)
COVID-19 , Cathelicidins , Humans , SARS-CoV-2 , Antiviral Agents
6.
Viruses ; 15(9), 2023 08 30.
Article in English | MEDLINE | ID: mdl-37766247

ABSTRACT

The emergence of SARS-CoV-1 in 2003 followed by MERS-CoV and now SARS-CoV-2 has proven the latent threat these viruses pose to humanity. While the SARS-CoV-2 pandemic has shifted to a stage of endemicity, the threat of new coronaviruses emerging from animal reservoirs remains. To address this issue, the global community must develop small molecule drugs targeting highly conserved structures in the coronavirus proteome. Here, we characterized existing drugs for their ability to inhibit the endoribonuclease activity of the SARS-CoV-2 non-structural protein 15 (nsp15) via in silico, in vitro, and in vivo techniques. We have identified nsp15 inhibition by the drugs pibrentasvir and atovaquone which effectively inhibit SARS-CoV-2 and HCoV-OC43 at low micromolar concentrations in cell cultures. Furthermore, atovaquone, but not pibrentasvir, is observed to modulate HCoV-OC43 dsRNA and infection in a manner consistent with nsp15 inhibition. Although neither pibrentasvir nor atovaquone translate to clinical efficacy in a murine prophylaxis model of SARS-CoV-2 infection, atovaquone may serve as a basis for the design of future nsp15 inhibitors.


Subject(s)
COVID-19 , Coronavirus OC43, Human , Animals , Mice , SARS-CoV-2/metabolism , Atovaquone/pharmacology , Endoribonucleases/metabolism
7.
Bioinformatics ; 39(8), 2023 08 01.
Article in English | MEDLINE | ID: mdl-37589594

ABSTRACT

MOTIVATION: Sphagnum-dominated peatlands store a substantial amount of terrestrial carbon. The genus is undersampled and under-studied. No experimental crystal structure from any Sphagnum species exists in the Protein Data Bank and fewer than 200 Sphagnum-related genes have structural models available in the AlphaFold Protein Structure Database. Tools and resources are needed to help bridge these gaps, and to enable the analysis of other structural proteomes now made possible by accurate structure prediction. RESULTS: We present the predicted structural proteome (25 134 primary transcripts) of Sphagnum divinum computed using AlphaFold, structural alignment results of all high-confidence models against an annotated nonredundant crystallographic database of over 90,000 structures, a structure-based classification of putative Enzyme Commission (EC) numbers across this proteome, and the computational method to perform this proteome-scale structure-based annotation. AVAILABILITY AND IMPLEMENTATION: All data and code are available in public repositories, detailed at https://github.com/BSDExabio/SAFA. The structural models of the S. divinum proteome have been deposited in the ModelArchive repository at https://modelarchive.org/doi/10.5452/ma-ornl-sphdiv.


Subject(s)
Plant Proteins , Proteome , Sphagnopsida , Sphagnopsida/chemistry , Sphagnopsida/enzymology , Plant Proteins/chemistry , Workflow , Structural Homology, Protein
10.
Sci Rep ; 12(1): 20889, 2022 12 03.
Article in English | MEDLINE | ID: mdl-36463386

ABSTRACT

Infectious diseases are known to cause a wide variety of post-infection complications. However, it's been challenging to identify which diseases are most associated with a given pathogen infection. Using the recently developed LeMeDISCO approach that predicts comorbid diseases associated with a given set of putative mode of action (MOA) proteins and pathogen-human protein interactomes, we developed PHEVIR, an algorithm which predicts the corresponding human disease comorbidities of 312 viruses and 57 bacteria. These predictions provide an understanding of the molecular bases of complications and means of identifying appropriate drug targets to treat them. As an illustration of its power, PHEVIR is applied to identify putative driver pathogens and corresponding human MOA proteins for Type 2 diabetes, atherosclerosis, Alzheimer's disease, and inflammatory bowel disease. Additionally, we explore the origins of the oncogenicity/oncolyticity of certain pathogens and the relationship between heart disease and influenza. The full PHEVIR database is available at https://sites.gatech.edu/cssb/phevir/ .


Subject(s)
Alzheimer Disease , Diabetes Mellitus, Type 2 , Humans , Artificial Intelligence , Algorithms , Databases, Factual
11.
Elife ; 11, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36576775

ABSTRACT

To reach their final destinations, outer membrane proteins (OMPs) of gram-negative bacteria undertake an eventful journey beginning in the cytosol. Multiple molecular machines, chaperones, proteases, and other enzymes facilitate the translocation and assembly of OMPs. These helpers usually associate, often transiently, forming large protein assemblies. They are not well understood due to experimental challenges in capturing and characterizing protein-protein interactions (PPIs), especially transient ones. Using AF2Complex, we introduce a high-throughput, deep learning pipeline to identify PPIs within the Escherichia coli cell envelope and apply it to several proteins from an OMP biogenesis pathway. Among the top confident hits obtained from screening ~1500 envelope proteins, we find not only expected interactions but also unexpected ones with profound implications. Subsequently, we predict atomic structures for these protein complexes. These structures, typically of high confidence, explain experimental observations and lead to mechanistic hypotheses for how a chaperone assists a nascent, precursor OMP emerging from a translocon, how another chaperone prevents it from aggregating and docks to a β-barrel assembly port, and how a protease performs quality control. This work presents a general strategy for investigating biological pathways by using structural insights gained from deep learning-based predictions.


All living cells are contained within a fatty cell membrane that allows water and only certain other molecules to pass through with ease. Bacteria consist of only a single cell, making their membrane the only interface with the surrounding environment. Gram-negative bacteria (which include Escherichia coli, a bacterium found in the gut of all humans) have an extra layer of protection, the 'outer membrane'. Proteins in this membrane are called 'outer membrane proteins' or OMPs and allow nutrients to enter the cell. But OMPs, which are made inside the cell, need to be transported to the outer membrane and folded correctly before they can perform their role. This multistep process, which involves interactions between many different proteins, is not fully understood. The journey of an OMP from the center of the cell where it is made to the outer membrane is complicated. First, the OMP needs to pass through the cell's inner membrane. To do this, it must interact with 'channel proteins' in the inner membrane that feed the OMP into the space between the two membranes, known as the bacterial envelope. This step requires the OMP to be unfolded. Once in the bacterial envelope, the OMP interacts with proteins that help it fold correctly and integrate into the outer membrane. The interactions between proteins in the bacterial envelope are short-lived, making them difficult to study using lab-based experiments. An alternative approach is predicting a protein's structure from its amino acid sequence, which is a difficult computational problem to solve. However, in 2020 the developers behind AlphaFold2, a deep learning program, were able to use a set of equations organized in a 'neural network' that can 'learn' from a library of known protein structures to predict unknown structures with high accuracy.
Gao et al. used AF2Complex, a tool based on AlphaFold2 and tailored to predicting interactions between proteins, to investigate what interactions OMPs could be involved in on their way to the outer membrane. With the help of a supercomputer at the Oak Ridge National Laboratory, Gao et al. screened nearly 1,500 E. coli proteins within the bacterial envelope to see how they might interact with OMPs. The screen identified previously unknown interactions between proteins that suggest that the formation of the bacterial outer membrane and the integration of proteins into it involve protein complexes and molecular mechanisms that have not yet been characterized. The screen also identified interactions that had been previously described, confirming that the deep learning approach can correctly capture real interactions. Overall, Gao et al.'s work inspires new hypotheses about the mechanisms through which OMPs are transported to the outer membrane, although further work will be needed to confirm experimentally the roles of the protein interactions predicted by the computational model. Furthermore, the ability to design experiments based on computational predictions is exciting. If confirmed, the new protein interactions could help scientists better understand OMP transport, which is essential for bacterial biology. In the future, this could lead to the discovery of new targets for antibiotic drugs.


Subject(s)
Deep Learning , Escherichia coli Proteins , Escherichia coli Proteins/metabolism , Escherichia coli/metabolism , Molecular Chaperones/metabolism , Peptide Hydrolases/metabolism , Membrane Proteins/metabolism , Bacterial Outer Membrane Proteins/metabolism
12.
Nat Commun ; 13(1): 6963, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36379943

ABSTRACT

Residue-residue distance information is useful for predicting tertiary structures of protein monomers or quaternary structures of protein complexes. Many deep learning methods have been developed to predict intra-chain residue-residue distances of monomers accurately, but few methods can accurately predict inter-chain residue-residue distances of complexes. We develop a deep learning method CDPred (i.e., Complex Distance Prediction) based on the 2D attention-powered residual network to address the gap. Tested on two homodimer datasets, CDPred achieves the precision of 60.94% and 42.93% for top L/5 inter-chain contact predictions (L: length of the monomer in homodimer), respectively, substantially higher than DeepHomo's 37.40% and 23.08% and GLINTER's 48.09% and 36.74%. Tested on the two heterodimer datasets, the top Ls/5 inter-chain contact prediction precision (Ls: length of the shorter monomer in heterodimer) of CDPred is 47.59% and 22.87% respectively, surpassing GLINTER's 23.24% and 13.49%. Moreover, the prediction of CDPred is complementary with that of AlphaFold2-multimer.
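The top L/5 precision metric reported above can be made concrete with a small sketch: rank the predicted inter-chain residue pairs by probability, keep the top L/5, and count how many are true contacts. The function name `top_k_contact_precision`, the dictionary layout, and the toy values are assumptions for illustration, not CDPred's evaluation code.

```python
def top_k_contact_precision(predicted, true_contacts, length, divisor=5):
    """Precision of the top (length // divisor) predicted inter-chain contacts.

    predicted: dict mapping (residue_i, residue_j) to a contact probability.
    true_contacts: set of (residue_i, residue_j) pairs that are real contacts.
    length: L, the monomer length (or the shorter monomer for heterodimers).
    """
    if not predicted:
        raise ValueError("no predictions supplied")
    k = max(1, length // divisor)
    # Highest-probability pairs first, then keep the top k.
    ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)[:k]
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    # Divide by the number of pairs actually selected (may be < k for tiny inputs).
    return hits / len(ranked)


# Toy example: L = 10, so the top 10 // 5 = 2 predictions are evaluated.
toy_predicted = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.1}
toy_true = {(0, 5), (2, 7)}
print(top_k_contact_precision(toy_predicted, toy_true, length=10))
```

Scaling the cutoff with sequence length keeps the metric comparable across proteins of different sizes, which is why results are quoted as top L/5 (or top Ls/5 for heterodimers) rather than as a fixed number of contacts.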


Subject(s)
Computational Biology , Neural Networks, Computer , Proteins/chemistry
13.
J Phys Chem B ; 126(36): 6853-6867, 2022 09 15.
Article in English | MEDLINE | ID: mdl-36044742

ABSTRACT

Protein-protein interactions (PPIs) and protein-metabolite interactions play a key role in many biochemical processes, yet they are often viewed as being independent. However, the fact that small molecule drugs have been successful in inhibiting PPIs suggests a deeper relationship between protein pockets that bind small molecules and PPIs. We demonstrate that 2/3 of PPI interfaces, including antibody-epitope interfaces, contain at least one significant small molecule ligand binding pocket. In a representative library of 50 distinct protein-protein interactions involving hundreds of mutations, >75% of hot spot residues overlap with small molecule ligand binding pockets. Hence, ligand binding pockets play an essential role in PPIs. In representative cases, evolutionarily unrelated monomers that are involved in different multimeric interactions yet share the same pocket are predicted to bind the same metabolites/drugs; these results are confirmed by examples in the PDB. Thus, the binding of a metabolite can shift the equilibrium between monomers and multimers. This implicit coupling of PPI equilibria, termed "metabolic entanglement", was successfully employed to suggest novel functional relationships among protein multimers that do not directly interact. Thus, the current work provides an approach to unify metabolomics and protein interactomics.


Subject(s)
Proteins , Binding Sites , Ligands , Protein Binding , Proteins/chemistry
14.
Commun Biol ; 5(1): 870, 2022 08 25.
Article in English | MEDLINE | ID: mdl-36008469

ABSTRACT

To understand the origin of disease comorbidity and to identify the essential proteins and pathways underlying comorbid diseases, we developed LeMeDISCO (Large-Scale Molecular Interpretation of Disease Comorbidity), an algorithm that predicts disease comorbidities from shared mode of action proteins predicted by the artificial intelligence-based MEDICASCY algorithm. LeMeDISCO was applied to predict the occurrence of comorbid diseases for 3608 distinct diseases. Benchmarking shows that LeMeDISCO has much better comorbidity recall than the two molecular methods XD-score (44.5% vs. 6.4%) and the SAB score (68.6% vs. 8.0%). Its performance is somewhat comparable to the phenotype method-based Symptom Similarity Score, 63.7% vs. 100%, but LeMeDISCO works for far more cases and its large comorbidity recall is attributed to shared proteins that can help provide an understanding of the molecular mechanism(s) underlying disease comorbidity. The LeMeDISCO web server is available for academic users at: http://sites.gatech.edu/cssb/LeMeDISCO .
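The comorbidity recall values quoted above follow the usual definition of recall: the fraction of known comorbid disease pairs that a method also predicts. The sketch below assumes disease pairs are unordered; the function name `comorbidity_recall` and the placeholder disease labels are hypothetical and do not come from LeMeDISCO.

```python
def comorbidity_recall(predicted_pairs, known_pairs):
    """Fraction of known comorbid disease pairs recovered by the predictions.

    Pairs are treated as unordered, so ("A", "B") matches ("B", "A").
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    known = {frozenset(pair) for pair in known_pairs}
    if not known:
        raise ValueError("at least one known comorbid pair is required")
    return len(predicted & known) / len(known)


# Placeholder disease labels, invented for illustration only.
known = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
predicted = [("B", "A"), ("C", "D"), ("E", "F")]
print(comorbidity_recall(predicted, known))
```

Recall alone does not penalize false positives, so it is best read together with how many disease pairs each method can score at all.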


Subject(s)
Algorithms , Artificial Intelligence , Comorbidity , Phenotype , Proteins
15.
Nat Commun ; 13(1): 1744, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35365655

ABSTRACT

Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Remarkably accurate atomic structures have been recently computed for individual proteins by AlphaFold2 (AF2). Here, we demonstrate that the same neural network models from AF2 developed for single protein sequences can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches, our method, AF2Complex, does not require paired multiple sequence alignments. It achieves higher accuracy than some complex protein-protein docking strategies and provides a significant improvement over AF-Multimer, a development of AlphaFold for multimeric proteins. Moreover, we introduce metrics for predicting direct protein-protein interactions between arbitrary protein pairs and validate AF2Complex on some challenging benchmark sets and the E. coli proteome. Lastly, using the cytochrome c biogenesis system I as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.


Subject(s)
Deep Learning , Escherichia coli/genetics , Neural Networks, Computer , Proteome , Sequence Alignment
16.
Cancer Inform ; 20: 11769351211065979, 2021.
Article in English | MEDLINE | ID: mdl-34924752

ABSTRACT

BACKGROUND: Colorectal cancer is the third largest cause of cancer-related mortality worldwide. Although current treatments with chemotherapeutics have allowed for management of colorectal cancer, additional novel treatments are essential. Intervening in the metabolic reprogramming observed in cancers, known as the "Warburg effect," is one of the novel strategies considered to combat cancers. In the metabolic reprogramming pathway, pyruvate dehydrogenase kinase (PDK1) plays a pivotal role. Identification and characterization of a PDK1 inhibitor is of paramount importance. Further, for efficacious treatment of colorectal cancers, combinatorial regimens are essential. To this end, we opted to identify a PDK1 inhibitor using the computational structure-based drug design method FINDSITEcomb and to perform combinatorial studies with 5-FU for efficacious treatment of colorectal cancers. METHODS: Using the computational structure-based drug design method FINDSITEcomb, stearic acid (SA) was identified as a possible PDK1 inhibitor. Elucidation of the mechanism of action of SA was performed using flow cytometry and clonogenic assays. RESULTS: When the growth inhibitory potential of SA was tested on colorectal adenocarcinoma (DLD-1) cells, a 50% inhibitory concentration (IC50) of 60 µM was recorded. Moreover, SA inhibited the proliferation potential of DLD-1 cells as shown by the clonogenic assay, and there was a sustained response even after withdrawal of the compound. Elucidation of the mechanism of action revealed that the inhibitory effect of SA was through the programmed cell death pathway. There was an increase in the number of apoptotic and multicaspase-positive cells. SA also impacted the levels of the cell survival protein Bcl-2. With the aim of achieving improved treatment for colorectal cancer, we opted to combine 5-fluorouracil (5-FU), the currently used drug in the clinic, with SA.
Combining SA with 5-FU revealed a synergistic effect in which the IC50 of 5-FU decreased from 25 to 6 µM upon combination with 60 µM SA. Further, SA did not inhibit non-tumorigenic NIH-3T3 proliferation. CONCLUSIONS: We envision that this significant decrease in the IC50 of 5-FU could translate into fewer side effects of 5-FU and increase the efficacy of the treatment due to the multifaceted action of SA. The data generated from the current studies on the inhibition of colorectal adenocarcinoma by SA, discovered by the use of the computational program, as well as its synergistic action with 5-FU, should open up novel therapeutic options for the management of colorectal adenocarcinomas.

17.
Sci Rep ; 11(1): 20864, 2021 10 21.
Article in English | MEDLINE | ID: mdl-34675303

ABSTRACT

Following SARS-CoV-2 infection, some COVID-19 patients experience severe host-driven adverse events. To treat these complications, their underlying etiology and drug treatments must be identified. Thus, a novel AI methodology, MOATAI-VIR, was developed; it predicts disease-protein-pathway relationships and repurposed FDA-approved drugs to treat COVID-19's clinical manifestations. SARS-CoV-2 interacting human proteins and GWAS-identified respiratory failure genes provide the input from which the mode-of-action (MOA) proteins/pathways of the resulting disease comorbidities are predicted. These comorbidities are then mapped to their clinical manifestations. To assess each manifestation's molecular basis, their prioritized shared proteins were subjected to global pathway analysis. Next, the molecular features associated with hallmark COVID-19 phenotypes, e.g. unusual neurological symptoms, cytokine storms, and blood clots, were explored. In practice, 24/26 of the major clinical manifestations are successfully predicted. Three major uncharacterized manifestation categories including neoplasms are also found. The prevalence of neoplasms suggests that SARS-CoV-2 might be an oncovirus due to shared molecular mechanisms between oncogenesis and viral replication. Then, repurposed FDA-approved drugs that might treat COVID-19's clinical manifestations are predicted by virtual ligand screening of the most frequent comorbid protein targets. These drugs might help treat both COVID-19's severe adverse events and lesser ones such as loss of taste/smell.


Subject(s)
COVID-19 Drug Treatment , COVID-19/complications , COVID-19/diagnosis , Computational Biology/methods , Neoplasms/complications , Nervous System Diseases/complications , Thrombosis/complications , Virus Replication , Benchmarking , Comorbidity , Computer Simulation , Cytokine Release Syndrome , Drug Discovery , Humans , Machine Learning , Molecular Medicine , Phenotype , SARS-CoV-2 , Treatment Outcome
18.
J Chem Inf Model ; 61(10): 4827-4831, 2021 10 25.
Article in English | MEDLINE | ID: mdl-34586808

ABSTRACT

AlphaFold 2 (AF2) was the star of CASP14, the most recent biennial structure prediction experiment. Using novel deep learning, AF2 predicted the structures of many difficult protein targets at or near experimental resolution. Here, we present our perspective of why AF2 works and show that it is a very sophisticated fold recognition algorithm that exploits the completeness of the library of single domain PDB structures. It has also learned local side chain packing rearrangements that enable it to refine proteins to high resolution. The benefits and limitations of its ability to predict the structures of many more proteins at or close to atomic detail are discussed.


Subject(s)
Protein Folding , Proteins , Algorithms , Amino Acid Sequence
19.
Biochem (Lond) ; 43(1): 4-12, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34219990

ABSTRACT

Many of life's molecules including proteins are built from chiral building blocks. What drove homochiral building block selection? Simulations on demi-chiral proteins containing equal numbers of d- and l-amino acids show that they possess many modern homochiral protein properties. They have the same global folds and could do the same biochemistry, with ancient, essential functions being most prevalent. They could synthesize chiral RNA and lipids which formed vesicles. RNA eventually combined with proteins creating ribosomes for more efficient protein synthesis, and thus, life began. Increased native state stability from homochiral secondary structure hydrogen bonding helped drive proteins towards homochirality.

20.
Front Bioinform ; 1, 2021 May.
Article in English | MEDLINE | ID: mdl-34308415

ABSTRACT

During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. Very recently, we developed SAdLSA, a new computational algorithm for protein sequence comparison via deep-learning of protein structural alignments. SAdLSA shows significant improvement over established sequence alignment methods. In this contribution, we show that SAdLSA provides a general machine-learning framework for structurally characterizing protein sequences. By aligning a protein sequence against itself, SAdLSA generates a fold distogram for the input sequence, including challenging cases whose structural folds were not present in the training set. About 70% of the predicted distograms are statistically significant. Although at present the accuracy of the intra-sequence distogram predicted by SAdLSA self-alignment is not as good as that of deep-learning algorithms specifically trained for distogram prediction, it is remarkable that the prediction of single protein structures is encoded by an algorithm that learns ensembles of pairwise structural comparisons, without being explicitly trained to recognize individual structural folds. As such, SAdLSA can not only predict protein folds for individual sequences, but also detect subtle, yet significant, structural relationships between multiple protein sequences using the same deep-learning neural network. The former reduces to a special case in this general framework for protein sequence annotation.
