Results 1 - 9 of 9
1.
Nat Commun ; 12(1): 5011, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34408149

ABSTRACT

Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Databases, Protein , Models, Molecular , Monte Carlo Method , Protein Conformation , Protein Folding , Proteins/genetics , Software
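
The abstract above names replica-exchange Monte Carlo as the sampling scheme. As a minimal sketch of the generic exchange step only (not C-QUARK's implementation, whose energy function and temperature ladder are not given here), neighbouring replicas can be swapped with the standard Metropolis criterion min(1, exp[(β_i − β_j)(E_i − E_j)]). The function names, the unit Boltzmann constant, and the example energies are assumptions made for illustration.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Metropolis acceptance probability for exchanging two replicas.

    energy_i is the energy of the conformation currently at temp_i,
    energy_j the energy of the conformation currently at temp_j.
    k_b = 1.0 assumes reduced units (an illustrative choice).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temps, rng=random):
    """Try one exchange between each pair of neighbouring replicas, in place.

    Returns the indices k at which replicas k and k + 1 were exchanged.
    """
    swapped = []
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            swapped.append(k)
    return swapped
```

With this criterion a low-energy conformation sitting at a high temperature always moves down the ladder, while the reverse move is accepted only with Boltzmann-weighted probability.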
2.
Bioinformatics ; 36(7): 2105-2112, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31738385

ABSTRACT

MOTIVATION: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for modeling the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved. RESULTS: We developed DeepMSA, a new open-source method for sensitive MSA construction, which has homologous sequences and alignments created from multiple sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was utilized to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which resulted in an accuracy increase in long-range contacts by up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in the Q3 accuracy. It is noted that all these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library. AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/DeepMSA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer , Proteins , Algorithms , Sequence Alignment , Sequence Analysis, Protein , Software
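
The DeepMSA abstract above reports secondary structure results as Q3 accuracy, the fraction of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch of that metric follows; the one-letter alphabet H/E/C and the function name are assumptions, and the example strings are made up rather than taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues with a correctly predicted three-state label.

    Both arguments are equal-length strings over the alphabet H (helix),
    E (strand) and C (coil); this alphabet is assumed for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("labels must be H, E or C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Seven of the eight residues agree in this made-up example.
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```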
3.
Genome Biol ; 20(1): 229, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31676016

ABSTRACT

INTRODUCTION: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. RESULTS: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, where 20 are predicted to have a reliable fold with estimated template modeling score (TM-score) at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with the frequency of occurrence in the modeled Pfam families, suggesting the significant role of the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Notably, PF15461, which has a majority of members coming from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5. CONCLUSIONS: These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.


Subject(s)
Metagenomics/methods , Models, Chemical , Proteins/genetics , Structure-Activity Relationship , Aquatic Organisms , Deep Learning , Microbiota , Multigene Family , Proteins/chemistry
4.
PLoS Comput Biol ; 15(10): e1007411, 2019 10.
Article in English | MEDLINE | ID: mdl-31622328

ABSTRACT

Accurate prediction of atomic-level protein structure is important for annotating the biological functions of protein molecules and for designing new compounds to regulate the functions. Template-based modeling (TBM), which aims to construct structural models by copying and refining the structural frameworks of other known proteins, remains the most accurate method for protein structure prediction. Due to the difficulty in recognizing distant-homology templates, however, the accuracy of TBM decreases rapidly when the evolutionary relationship between the query and template vanishes. In this study, we propose a new method, CEthreader, which first predicts residue-residue contacts by coupling evolutionary precision matrices with deep residual convolutional neural-networks. The predicted contact maps are then integrated with sequence profile alignments to recognize structural templates from the PDB. The method was tested on two independent benchmark sets consisting collectively of 1,153 non-homologous protein targets, where CEthreader detected 176% or 36% more correct templates with a TM-score >0.5 than the best state-of-the-art profile- or contact-based threading methods, respectively, for the Hard targets that lacked homologous templates. Moreover, CEthreader was able to identify 114% or 20% more correct templates with the same Fold as the query, after excluding structures from the same SCOPe Superfamily, than the best profile- or contact-based threading methods. Detailed analyses show that the major advantage of CEthreader lies in the efficient coupling of contact maps with profile alignments, which helps recognize global fold of protein structures when the homologous relationship between the query and template is weak. These results demonstrate an efficient new strategy to combine ab initio contact map prediction with profile alignments to significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.


Subject(s)
Nerve Net/physiology , Sequence Analysis, Protein/methods , Structural Homology, Protein , Algorithms , Amino Acid Sequence , Computational Biology/methods , Databases, Protein , Models, Biological , Protein Conformation , Proteins/chemistry , Sequence Alignment , Software
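
Several abstracts in this list use a TM-score above 0.5 as the threshold for a correct fold or template. For one fixed superposition the score is (1/L) Σ 1/(1 + (d_i/d0)²) over aligned residue pairs, with d0 = 1.24 (L − 15)^(1/3) − 1.8 and L the target length; the reported TM-score is the maximum of this quantity over superpositions. The sketch below evaluates only the fixed-superposition sum and does not search for the best superposition. The function names and the 0.5 Å floor on d0 for very short targets are assumptions.

```python
def d0_scale(target_length):
    """Length-dependent distance scale d0 in angstroms.

    The 0.5 floor for short targets is an assumption made so the
    function stays defined for every positive length.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score sum for one given superposition.

    distances holds the distance in angstroms between each aligned
    residue pair after superposition; unaligned target residues
    simply contribute nothing to the sum.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = d0_scale(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because the sum is normalised by the target length rather than the number of aligned pairs, a perfect superposition of only half the residues scores 0.5.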
5.
Proteins ; 87(12): 1149-1164, 2019 12.
Article in English | MEDLINE | ID: mdl-31365149

ABSTRACT

We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.


Subject(s)
Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Amino Acid Sequence/genetics , Databases, Protein , Deep Learning , Models, Molecular , Neural Networks, Computer , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment
6.
Proteins ; 86 Suppl 1: 136-151, 2018 03.
Article in English | MEDLINE | ID: mdl-29082551

ABSTRACT

We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, even though the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role in enhancing the quality of models, particularly for FM targets, by the new pipelines. The inclusion of NeBcon predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best in top five predicted models, which is 37% higher than that by the QUARK simulations without contacts. In particular, there are seven targets that are converted from non-foldable to foldable (TM-score >0.5) due to the use of contact restraints in the simulations. An additional feature in the current pipelines is the local structure quality prediction by ResQ, which provides a robust residue-level modeling error estimation. Despite the success, significant challenges still remain in ab initio modeling of multi-domain proteins and folding of β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements on domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.


Subject(s)
Algorithms , Computational Biology/methods , Machine Learning , Models, Molecular , Protein Conformation , Protein Folding , Proteins/chemistry , Crystallography, X-Ray , Databases, Protein , Humans , Sequence Alignment , Sequence Analysis, Protein
7.
Bioinformatics ; 33(15): 2296-2306, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28369334

ABSTRACT

MOTIVATION: Recent CASP experiments have witnessed exciting progress on folding large-size non-homologous proteins with the assistance of co-evolution based contact predictions. The success is, however, anecdotal due to the requirement of the contact prediction methods for the high volume of sequence homologs that are not available to most of the non-homologous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different types of protein targets is essential to enhance the success rate of the ab initio protein structure prediction. RESULTS: We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state-of-the-art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, where it improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found particularly important for the proteins that do not have a sufficient number of homologous sequences to derive reliable co-evolution profiles. AVAILABILITY AND IMPLEMENTATION: On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/ . CONTACT: zhng@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Models, Molecular , Neural Networks, Computer , Protein Conformation , Protein Folding , Bayes Theorem , Machine Learning
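
The NeBcon abstract above describes combining several contact predictors with a naive Bayes classifier. The abstract does not say how predictor scores are discretised or how the conditional probabilities are estimated, so the following is only a generic sketch of the naive Bayes idea: each predictor casts a binary vote for a residue pair, votes are treated as conditionally independent, and the posterior contact probability follows from Bayes' rule. The function name, the binary-vote simplification, and all rates in the example are hypothetical.

```python
def naive_bayes_posterior(votes, true_pos_rates, false_pos_rates, prior):
    """Posterior probability that a residue pair is in contact.

    votes[k] is True when predictor k calls the pair a contact.
    true_pos_rates[k]  = P(vote k is True | contact)
    false_pos_rates[k] = P(vote k is True | no contact)
    All rates and the prior are assumed to lie strictly between 0 and 1.
    """
    if not (len(votes) == len(true_pos_rates) == len(false_pos_rates)):
        raise ValueError("one rate pair is required per predictor")
    weight_contact = prior
    weight_no_contact = 1.0 - prior
    for vote, tpr, fpr in zip(votes, true_pos_rates, false_pos_rates):
        # Conditional independence: multiply per-predictor likelihoods.
        weight_contact *= tpr if vote else 1.0 - tpr
        weight_no_contact *= fpr if vote else 1.0 - fpr
    return weight_contact / (weight_contact + weight_no_contact)
```

Two agreeing predictors that are each only moderately reliable already yield a posterior well above either one alone, which is the kind of complementarity the abstract attributes to the NBC combination.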
8.
Article in English | MEDLINE | ID: mdl-26274304

ABSTRACT

We present a multiscale model, based on molecular dynamics (MD) and kinetic Monte Carlo (kMC), to study the aggregation driven growth of colloidal particles. Coarse-grained molecular dynamics (CGMD) simulations are employed to detect key agglomeration events and calculate the corresponding rate constants. The kMC simulations employ these rate constants in a stochastic framework to track the growth of the agglomerates over longer time scales and length scales. One of the hallmarks of the model is a unique methodology to detect and characterize agglomeration events. The model accounts for individual cluster-scale effects such as change in size due to aggregation as well as local molecular-scale effects such as changes in the number of neighbors of each molecule in a colloidal cluster. Such definition of agglomeration events allows us to grow the cluster to sizes that are inaccessible to molecular simulations as well as track the shape of the growing cluster. A well-studied system, comprising fullerenes in NaCl electrolyte solution, was simulated to validate the model. Under the simulated conditions, the agglomeration process evolves from a diffusion limited cluster aggregation (DLCA) regime to percolating cluster in transition and finally to a gelation regime. Overall the data from the multiscale numerical model shows good agreement with existing theory of colloidal particle growth. Although in the present study we validated our model by specifically simulating fullerene agglomeration in electrolyte solution, the model is versatile and can be applied to a wide range of colloidal systems.


Subject(s)
Colloids , Molecular Dynamics Simulation , Electrolytes , Fullerenes , Kinetics , Monte Carlo Method , Sodium Chloride , Solutions , Stochastic Processes
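
The multiscale abstract above feeds rate constants from coarse-grained MD into kinetic Monte Carlo. As a minimal sketch of a single kMC step in the standard Gillespie (direct) form, and not the authors' model, an event is chosen with probability proportional to its rate and the clock advances by an exponentially distributed waiting time. The function name and the placeholder rates are assumptions; the paper's event definitions and rate values are not given in this record.

```python
import math
import random


def kmc_step(rates, rng=random):
    """Perform one kinetic Monte Carlo step.

    rates is a list of non-negative rate constants, one per possible
    event. Returns (event_index, time_increment).
    """
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one event must have a positive rate")
    # Pick the event whose cumulative-rate interval contains the threshold.
    threshold = rng.random() * total
    cumulative = 0.0
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            break
    # 1 - random() lies in (0, 1], so the logarithm is always finite.
    dt = -math.log(1.0 - rng.random()) / total
    return index, dt
```

Repeating this step while updating the rate list after each event is what lets kMC reach time scales far beyond those of the underlying molecular simulation.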
9.
J Chem Phys ; 137(24): 244308, 2012 Dec 28.
Article in English | MEDLINE | ID: mdl-23277937

ABSTRACT

The molecular interactions between solvent and nanoparticles during photoactive layer formation in organic photovoltaic (OPV) cells influence the morphology of the photoactive layer and hence determine the power conversion efficiency. Prediction of optimal synthesis parameters in OPVs, such as choice of solvent, processing temperature, and nanoparticle concentration, requires fundamental understanding of the mechanisms that govern the agglomeration of nanoparticles in solvents. In this study, we used molecular dynamics simulations to simulate a commonly used organic nanoparticle, [6,6]-phenyl-C61-butyric acid methyl ester (PCBM), in various solvents to correlate solvent-nanoparticle interactions with the size of the agglomerate structure of PCBM. We analyzed the effects of concentration of PCBM and operating temperature on the molecular rearrangement and agglomeration of PCBM in three solvents: (i) toluene, (ii) indane, and (iii) toluene-indane mixture. We evaluated the agglomeration behavior of PCBM by determining sizes of the largest clusters of PCBM and the corresponding size distributions. To obtain further insight into the agglomerate structure of PCBMs, we evaluated radial distribution functions (RDFs) and coordination numbers of the various moieties of PCBMs with respect to solvent atoms as well as with respect to that of other PCBMs. Our simulations demonstrate that PCBMs form larger clusters in toluene while they are relatively dispersed in indane, which indicates the greater solubility of PCBM in indane than in toluene. In toluene-indane mixture, PCBMs are clustered to a greater extent than in indane and less than that in toluene. To correlate agglomerate size to nanoparticle-solvent interactions, we also evaluated the potential of mean force (PMF) of the fullerene moiety of PCBM in toluene and indane. Our results also show that the cluster size of PCBM molecules increases with the increase of concentration of PCBM and the processing temperature. 
To correlate the PCBM agglomeration with the dynamics of solvents, we evaluated the rotational correlation functions of the solvents. Our results illustrate that toluene relaxes faster than indane in the simulated systems and relaxation time of solvent molecules decreases with the decrease of concentration of PCBM and increase of processing temperature. Results presented in this study provide fundamental insight that can help to choose favorable solvents for processing PCBMs in OPV applications.
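
The abstract above evaluates agglomeration through the sizes of PCBM clusters. The clustering criterion is not stated in this record, so the sketch below assumes a simple one: two particles belong to the same cluster when their centres lie within a cutoff distance, with clusters formed by chaining such pairs. Periodic boundary conditions are ignored, and the function name, cutoff, and coordinates are hypothetical.

```python
import math


def cluster_sizes(positions, cutoff):
    """Sizes of distance-based clusters, largest first.

    positions is a list of (x, y, z) particle centres; two particles
    are linked when their separation is at most cutoff. No periodic
    images are considered (a simplifying assumption).
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= cutoff:
                parent[find(i)] = find(j)

    counts = {}
    for i in range(n):
        root = find(i)
        counts[root] = counts.get(root, 0) + 1
    return sorted(counts.values(), reverse=True)
```

The first element of the returned list is the largest-cluster size, and the full list gives the size distribution for one simulation frame.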