Results 1 - 20 of 39
1.
Article in English | MEDLINE | ID: mdl-38959143

ABSTRACT

The goal of protein structure refinement is to enhance the precision of predicted protein models, particularly at the residue level of the local structure. Existing refinement approaches primarily rely on physics, whereas molecular simulation methods are resource-intensive and time-consuming. In this study, we employ deep learning methods to extract structural constraints from protein structure residues to assist in protein structure refinement. We introduce a novel method, AnglesRefine, which focuses on a protein's secondary structure and employs a transformer to refine various protein structure angles (psi, phi, omega, CA_C_N_angle, C_N_CA_angle, N_CA_C_angle), ultimately generating a superior protein model based on the refined angles. We evaluate our approach against other cutting-edge methods using the CASP11-14 and CASP15 datasets. Experimental outcomes indicate that our method generally surpasses other techniques on the CASP11-14 test dataset, while performing comparably or marginally better on the CASP15 test dataset. Our method consistently demonstrates the least likelihood of model quality degradation, e.g., the degradation percentage of our method is less than 10%, while that of other methods is about 50%. Furthermore, as our approach eliminates the need for conformational search and sampling, it significantly reduces computational time compared to existing refinement methods.
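
The angle types named in the abstract above (psi, phi, omega and the three backbone bond angles) can be computed from backbone atom coordinates with standard vector geometry. The following is a minimal sketch of that calculation only, not of the AnglesRefine method itself; the function names and the dictionary layout for a residue (keys "N", "CA", "C") are assumptions made for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(a, b, c):
    """Angle in degrees at atom b, formed by atoms a-b-c (e.g. N_CA_C_angle)."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four consecutive atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def backbone_torsions(prev_res, res, next_res):
    """Return (phi, psi, omega) for `res`.

    Each residue is a hypothetical dict with "N", "CA" and "C" coordinates.
    """
    phi = dihedral(prev_res["C"], res["N"], res["CA"], res["C"])
    psi = dihedral(res["N"], res["CA"], res["C"], next_res["N"])
    omega = dihedral(res["CA"], res["C"], next_res["N"], next_res["CA"])
    return phi, psi, omega
```

A refinement method working in angle space would predict corrected values for these quantities and then rebuild Cartesian coordinates from them; that reconstruction step is not shown here.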

2.
Nat Methods ; 21(7): 1340-1348, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38918604

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of submitted ligand models and surrounding atoms was analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.


Subjects
Cryoelectron Microscopy, Molecular Models, Cryoelectron Microscopy/methods, Ligands, SARS-CoV-2, COVID-19/virology, Escherichia coli, beta-Galactosidase/chemistry, beta-Galactosidase/metabolism, Protein Conformation, Reproducibility of Results
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38609330

ABSTRACT

Understanding protein structures is invaluable in various biomedical applications, such as vaccine development. Protein structure model building from experimental electron density maps is a time-consuming and labor-intensive task. To address the challenge, machine learning approaches have been proposed to automate this process. Currently, the majority of the experimental maps in the database lack atomic resolution features, making it challenging for machine learning-based methods to precisely determine protein structures from cryogenic electron microscopy density maps. On the other hand, protein structure prediction methods, such as AlphaFold2, leverage evolutionary information from protein sequences and have recently achieved groundbreaking accuracy. However, these methods often require manual refinement, which is labor-intensive and time-consuming. In this study, we present DeepTracer-Refine, an automated method that refines AlphaFold predicted structures by aligning them to DeepTracer's modeled structure. Our method was evaluated on 39 multi-domain proteins and we improved the average residue coverage from 78.2% to 90.0% and the average local Distance Difference Test score from 0.67 to 0.71. We also compared DeepTracer-Refine with Phenix's AlphaFold refinement and demonstrated that our method not only performs better when the initial AlphaFold model is less precise but also surpasses Phenix in run-time performance.


Subjects
Biological Evolution, Machine Learning, Cryoelectron Microscopy, Amino Acid Sequence, Factual Databases
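
The local Distance Difference Test (lDDT) score reported in this abstract compares inter-atomic distances in a model against a reference without requiring superposition. The sketch below is a simplified, Cα-only variant written for illustration; it assumes the commonly used 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerance thresholds, and it is not the evaluation code used in the study. The function name `ca_lddt` is hypothetical.

```python
import math
from itertools import combinations

def ca_lddt(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT between two equally long coordinate lists.

    A reference distance shorter than `radius` counts as preserved at a
    threshold when the model distance differs by less than that threshold.
    The score is the fraction preserved, averaged over all thresholds.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must list the same residues")
    total = 0
    preserved = 0
    for i, j in combinations(range(len(ref)), 2):
        d_ref = math.dist(ref[i], ref[j])
        if d_ref >= radius:
            continue  # only local distances are scored
        diff = abs(d_ref - math.dist(model[i], model[j]))
        total += len(thresholds)
        preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0
```

Because only internal distances are compared, the score is unaffected by rigid-body movements of the whole model, which suits multi-domain proteins where global superposition can be misleading.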
4.
Res Sq ; 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38343795

ABSTRACT

The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.

5.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930021

ABSTRACT

MOTIVATION: In recent years, the end-to-end deep learning method for single-chain protein structure prediction has achieved high accuracy. For example, the state-of-the-art method AlphaFold, developed by Google, has largely increased the accuracy of protein structure predictions to near experimental accuracy in some of the cases. At the same time, there are few methods that can evaluate the quality of protein complexes at the residue level. In particular, evaluating the quality of residues at the interface of protein complexes can lead to a wide range of applications, such as protein function analysis and drug design. In this paper, we introduce a new deep graph neural network-based method, ComplexQA, to evaluate the local quality of interfaces for protein complexes by utilizing the residue-level structural information in 3D space and the sequence-level constraints. RESULTS: We benchmark our method against other state-of-the-art quality assessment approaches on the HAF2 and DBM55-AF2 datasets (high-quality structural models predicted by AlphaFold-Multimer), and the BM5 docking dataset. The experimental results show that our proposed method achieves better or similar performance compared with other state-of-the-art methods, especially on difficult targets which only contain a few acceptable models. Our method is able to suggest a score for each interface residue, which provides a powerful assessment tool for the ever-increasing number of protein complexes. AVAILABILITY: https://github.com/Cao-Labs/ComplexQA.git. Contact: caora@plu.edu.


Subjects
Computer Neural Networks, Proteins, Proteins/chemistry
6.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36682003

ABSTRACT

Cryo-electron microscopy (cryo-EM) allows a macromolecular structure such as protein-DNA/RNA complexes to be reconstructed in a three-dimensional Coulomb potential map. The structural information of these macromolecular complexes forms the foundation for understanding molecular mechanisms, including those underlying many human diseases. However, the model building of large macromolecular complexes is often difficult and time-consuming. We recently developed DeepTracer-2.0, an artificial-intelligence-based pipeline that can build amino acid and nucleic acid backbones from a single cryo-EM map, and even predict the best-fitting residues according to the density of side chains. The experiments showed improved accuracy and efficiency when benchmarking the performance on independent experimental maps of protein-DNA/RNA complexes and demonstrated the promising future of macromolecular modeling from cryo-EM maps. Our method and pipeline could benefit researchers worldwide who work in molecular biomedicine and drug discovery, and substantially increase the throughput of cryo-EM model building. The pipeline has been integrated into the web portal https://deeptracer.uw.edu/.


Subjects
DNA, RNA, Humans, Cryoelectron Microscopy/methods, Molecular Models, Protein Conformation, Macromolecular Substances/chemistry
7.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38197309

ABSTRACT

Although some pyroptosis-related (PR) prognostic models for cancers have been reported, pyroptosis-based features have not been fully discovered at the single-cell level in hepatocellular carcinoma (HCC). In this study, by deeply integrating single-cell and bulk transcriptome data, we systematically investigated the significance of the shared pyroptotic signature at both single-cell and bulk levels in HCC prognosis. Based on the pyroptotic signature, a robust PR risk system was constructed to quantify the prognostic risk of individual patients. To further verify the capacity of the pyroptotic signature to predict patients' prognosis, an attention mechanism-based deep neural network classification model was constructed. The mechanisms of prognostic difference in patients with distinct PR risk were dissected in terms of tumor stemness, cancer pathways, transcriptional regulation, immune infiltration and cell communications. A nomogram model combining PR risk with clinicopathologic data was constructed to evaluate the prognosis of individual patients in the clinic. The PR risk could also evaluate therapeutic response to neoadjuvant therapies in HCC patients. In conclusion, the constructed PR risk system enables a comprehensive assessment of tumor microenvironment characteristics, accurate prognosis prediction and rational therapeutic options in HCC.


Subjects
Hepatocellular Carcinoma, Deep Learning, Liver Neoplasms, Humans, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/therapy, Transcriptome, Liver Neoplasms/genetics, Liver Neoplasms/therapy, Cell Communication, Tumor Microenvironment/genetics
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34553747

ABSTRACT

MOTIVATION: The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. As of CASP14, there are 79 global QA methods, and a minority of 39 residue-level QA methods with very few of them working on protein complexes. Here, we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure/complex prediction at residue level, which has many applications such as drug discovery. ZoomQA differs from others by considering the change in chemical and physical features of a fragment structure (a portion of a protein within a radius r of the target amino acid) as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and grade their placement within the protein as a whole. Moreover, we have shown the potential of ZoomQA to identify problematic regions of the SARS-CoV-2 protein complex. RESULTS: We benchmark ZoomQA on CASP14, and it outperforms other state-of-the-art local QA methods and rivals state-of-the-art QA methods in global prediction metrics. Our experiment shows the efficacy of these new features and shows that our method is able to match the performance of other state-of-the-art methods without the use of homology searching against databases or PSSM matrices. AVAILABILITY: http://zoomQA.renzhitech.com.


Subjects
COVID-19, Caspases/chemistry, Machine Learning, Molecular Models, SARS-CoV-2/chemistry, Viral Proteins/chemistry, Humans, Protein Quaternary Structure, Protein Tertiary Structure, Protein Sequence Analysis
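
The "zooming" idea in this abstract, tracking how the features of the fragment around a target residue change as the radius r grows, can be illustrated with a small sketch. This is an assumption-laden illustration rather than the ZoomQA implementation: the function names are hypothetical, residues are reduced to one coordinate each, and a single caller-supplied numeric property stands in for the fourteen properties the abstract mentions.

```python
import math

def zoom_profile(coords, values, target, radii):
    """For each radius, summarize the fragment centered on residue `target`.

    coords: one (x, y, z) point per residue (e.g. its C-alpha atom).
    values: one numeric property per residue, supplied by the caller.
    Returns a list of (residue_count, mean_property), one per radius.
    """
    center = coords[target]
    profile = []
    for r in radii:
        inside = [values[i] for i, c in enumerate(coords)
                  if math.dist(c, center) <= r]
        # The target itself is always inside, so `inside` is never empty.
        profile.append((len(inside), sum(inside) / len(inside)))
    return profile

def zoom_deltas(profile):
    """Change in the mean property between consecutive radii."""
    return [b[1] - a[1] for a, b in zip(profile, profile[1:])]
```

The per-radius summaries and their differences could then serve as inputs to a learned model that scores each residue; how the published method encodes and weights these features is not stated in the abstract.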
9.
Curr Gene Ther ; 22(2): 132-143, 2022.
Article in English | MEDLINE | ID: mdl-34161210

ABSTRACT

With new developments in biomedical technology, it is now a viable therapeutic treatment to alter genes with techniques like CRISPR. At the same time, it is increasingly cheaper to perform whole genome sequencing, resulting in rapid advancement in gene therapy and editing in precision medicine. Understanding the current industry and academic applications of gene therapy provides an important backdrop to future scientific developments. Additionally, machine learning and artificial intelligence techniques allow for the reduction of time and money spent in the development of new gene therapy products and techniques. In this paper, we survey the current progress of gene therapy treatments for several diseases and explore machine learning applications in gene therapy. We also discuss the ethical implications of gene therapy and the use of machine learning in precision medicine. Machine learning and gene therapy are both topics gaining popularity in various publications, and we conclude that there is still room for continued research and application of machine learning techniques in the gene therapy field.


Subjects
Artificial Intelligence, Machine Learning, Genetic Therapy, Precision Medicine
10.
Curr Med Chem ; 29(5): 807-821, 2022.
Article in English | MEDLINE | ID: mdl-34636289

ABSTRACT

Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop effective cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.


Subjects
Falciparum Malaria, Malaria, Parasites, Animals, Humans, Machine Learning, Malaria/diagnosis, Falciparum Malaria/diagnosis, Falciparum Malaria/parasitology, Parasites/metabolism, Plasmodium falciparum/chemistry, Protozoan Proteins/chemistry, Protozoan Proteins/metabolism
11.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34410360

ABSTRACT

The global pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has led to a dramatic loss of human life worldwide. Despite many efforts, the development of effective drugs and vaccines for this novel virus will take considerable time. Artificial intelligence (AI) and machine learning (ML) offer promising solutions that could accelerate the discovery and optimization of new antivirals. Motivated by this, in this paper, we present an extensive survey on the application of AI and ML for combating COVID-19 based on the rapidly emerging literature. Particularly, we point out the challenges and future directions associated with state-of-the-art solutions to effectively control the COVID-19 pandemic. We hope that this review provides researchers with new insights into the ways AI and ML fight and have fought the COVID-19 outbreak.


Subjects
COVID-19 Drug Treatment, COVID-19 Vaccines/genetics, Drug Discovery, SARS-CoV-2/genetics, Artificial Intelligence, COVID-19/genetics, COVID-19/virology, COVID-19 Vaccines/chemistry, Drug Design, Humans, Machine Learning, Pandemics, SARS-CoV-2/chemistry, SARS-CoV-2/pathogenicity
12.
Math Biosci Eng ; 18(4): 3348-3363, 2021 04 15.
Article in English | MEDLINE | ID: mdl-34198389

ABSTRACT

N4-methylcytosine (4mC) is a kind of DNA modification which could regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, DNA sequences were encoded by k-mer, enhanced nucleic acid composition and composition of k-spaced nucleic acid pairs. Subsequently, these features were optimized by using minimum redundancy maximum relevance (mRMR) with incremental feature selection (IFS) and five-fold cross-validation. The obtained optimal features were fed into a random forest classifier for discriminating 4mC from non-4mC sites in mouse. On the independent dataset, our model could yield an overall accuracy of 85.41%, which was approximately 3.8% to 6.3% higher than the two existing models, i4mC-Mouse and 4mCpred-EL, respectively. The data and source code of the model can be freely downloaded from https://github.com/linDing-groups/model_4mc.


Subjects
Cytosine, DNA, Animals, Computational Biology, Genome, Machine Learning, Mice, Software
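
Two of the sequence encodings named in this abstract, k-mer composition and the composition of k-spaced nucleic acid pairs, are simple to state in code. The sketch below shows the general form of these encodings under common definitions; the function names are hypothetical and the exact variants, window sizes and normalization used by the authors are not given in the abstract.

```python
from itertools import product

def kmer_frequencies(seq, k):
    """Frequency of every possible DNA k-mer, in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]

def kspaced_pair_frequencies(seq, gap):
    """Frequency of nucleotide pairs separated by `gap` intervening bases."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product("ACGT", repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]
```

Concatenating such vectors for several values of k and gap gives a fixed-length feature vector per sequence, which is the kind of input a feature-selection step and a random forest classifier can work with.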
13.
Nat Methods ; 18(2): 156-164, 2021 02.
Article in English | MEDLINE | ID: mdl-33542514

ABSTRACT

This paper describes outcomes of the 2019 Cryo-EM Model Challenge. The goals were to (1) assess the quality of models that can be produced from cryogenic electron microscopy (cryo-EM) maps using current modeling software, (2) evaluate reproducibility of modeling results from different software developers and users and (3) compare performance of current metrics used for model evaluation, particularly Fit-to-Map metrics, with focus on near-atomic resolution. Our findings demonstrate the relatively high accuracy and reproducibility of cryo-EM models derived by 13 participating teams from four benchmark maps, including three forming a resolution series (1.8 to 3.1 Å). The results permit specific recommendations to be made about validating near-atomic cryo-EM structures both in the context of individual experiments and structure data archives such as the Protein Data Bank. We recommend the adoption of multiple scoring parameters to provide full and objective annotation and assessment of the model, reflective of the observed cryo-EM map density.


Subjects
Cryoelectron Microscopy/methods, Molecular Models, X-Ray Crystallography, Protein Conformation, Proteins/chemistry
14.
Sci Rep ; 10(1): 4282, 2020 03 09.
Article in English | MEDLINE | ID: mdl-32152330

ABSTRACT

Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein's backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture comprised of multiple CNNs, each predicting a specific aspect of a protein's structure. This model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, combining the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. This method is largely automatic and only requires a recommended threshold value for each protein density map. A specialized tabu-search path walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm made further improvements to the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to effectively map protein sequences onto Cα traces to obtain full-atom protein structures. This method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods including Rosetta de-novo, MAINMAST, and a Phenix based method by producing the most complete predicted protein structures, as measured by percentage of found Cα atoms. This method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein's backbone structure surpassing the 66.8% mark achieved by the leading alternate method (Phenix based fully automatic method) on the same set of density maps. 
The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps which was tested by the Phenix based fully automatic method. The source code and demo of this research have been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.


Subjects
Algorithms, Cryoelectron Microscopy/methods, Deep Learning, Computer Neural Networks, Protein Conformation, Proteins/chemistry, Software, Amino Acid Sequence, Humans, Molecular Models, Sequence Homology
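
The two evaluation figures quoted in this abstract, the percentage of Cα atoms found within 3 Å and the RMSD, can be sketched as follows. This is a minimal illustration that assumes a simple nearest-neighbor matching of each native Cα to the predicted atoms; the matching scheme actually used in the paper is not described in the abstract, and the function name is hypothetical.

```python
import math

def ca_match_metrics(native, predicted, cutoff=3.0):
    """Return (percent of native C-alpha atoms matched, RMSD over matches).

    A native atom counts as found when its nearest predicted atom lies
    within `cutoff` angstroms. Nearest-neighbor matching is an assumption
    of this sketch; a one-to-one assignment would be stricter.
    """
    if not native or not predicted:
        raise ValueError("both coordinate lists must be non-empty")
    matched = []
    for atom in native:
        nearest = min(math.dist(atom, p) for p in predicted)
        if nearest <= cutoff:
            matched.append(nearest)
    percent = 100.0 * len(matched) / len(native)
    if matched:
        rmsd = math.sqrt(sum(d * d for d in matched) / len(matched))
    else:
        rmsd = float("nan")
    return percent, rmsd
```

Reporting both numbers together matters: a method can achieve a low RMSD by placing only a few atoms very accurately, so the coverage percentage shows how much of the backbone that RMSD actually describes.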
15.
BMC Plant Biol ; 19(1): 469, 2019 Nov 05.
Article in English | MEDLINE | ID: mdl-31690290

ABSTRACT

BACKGROUND: Soybean (Glycine max (L.)) is one of the most important oil-yielding cash crops. However, soybean production has been seriously restricted by salinization. It is therefore crucial to identify salt tolerance-related genes and reveal molecular mechanisms underlying salt tolerance in soybean crops. A better understanding of how plants resist salt stress provides insights into improving existing soybean varieties as well as cultivating novel salt-tolerant varieties. In this study, the biological function of GmNHX1, an NHX-like gene, and the molecular basis underlying GmNHX1-mediated salt stress resistance have been revealed. RESULTS: We found that the transcription level of GmNHX1 was up-regulated under salt stress condition in soybean, reaching its peak at 24 h after salt treatment. By employing the virus-induced gene silencing technique (VIGS), we also found that soybean plants became more susceptible to salt stress after silencing GmNHX1 than wild-type and more silenced plants wilted than wild-type under salt treatment. Furthermore, Arabidopsis thaliana expressing GmNHX1 grew taller and generated more rosette leaves under salt stress condition compared to wild-type. Exogenous expression of GmNHX1 resulted in an increase of Na+ transportation to leaves along with a reduction of Na+ absorption in roots, and the consequent maintenance of a high K+/Na+ ratio under salt stress condition. GmNHX1-GFP-transformed onion bulb endothelium cells showed a fluorescent pattern in which GFP fluorescence signals were enriched in vacuolar membranes. Using the non-invasive micro-test technique (NMT), we found that the Na+ efflux rate of both wild-type and transformed plants after salt treatment was significantly higher than before salt treatment. Additionally, the Na+ efflux rate of transformed plants after salt treatment was significantly higher than that of wild-type.
Meanwhile, the transcription levels of three osmotic stress-related genes, SKOR, SOS1 and AKT1 were all up-regulated in GmNHX1-expressing plants under salt stress condition. CONCLUSION: Vacuolar membrane-localized GmNHX1 enhances plant salt tolerance through maintaining a high K+/Na+ ratio along with inducing the expression of SKOR, SOS1 and AKT1. Our findings provide molecular insights on the roles of GmNHX1 and similar sodium/hydrogen exchangers in regulating salt tolerance.


Subjects
Glycine max/metabolism, Plant Proteins/metabolism, Salt Tolerance/genetics, Salt-Tolerant Plants/metabolism, Sodium-Hydrogen Exchangers/metabolism, Arabidopsis/genetics, Gene Silencing, Plant Proteins/genetics, Potassium/metabolism, Salt-Tolerant Plants/genetics, Sodium/metabolism, Sodium-Hydrogen Exchangers/genetics, Glycine max/genetics, Physiological Stress/genetics, Up-Regulation, Vacuoles/metabolism
16.
Proteins ; 87(12): 1165-1178, 2019 12.
Article in English | MEDLINE | ID: mdl-30985027

ABSTRACT

Predicting residue-residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.


Subjects
Computational Biology, Protein Conformation, Proteins/ultrastructure, Software, Algorithms, Protein Databases, Deep Learning, Molecular Models, Computer Neural Networks, Protein Folding, Protein Tertiary Structure/genetics, Proteins/chemistry, Proteins/genetics, Protein Sequence Analysis
17.
Curr Drug Metab ; 20(3): 185-193, 2019.
Article in English | MEDLINE | ID: mdl-30124147

ABSTRACT

BACKGROUND: Drug discovery, which is the process of discovering new candidate medications, is very important for pharmaceutical industries. At its current stage, discovering new drugs is still a very expensive and time-consuming process, requiring Phases I, II and III for clinical trials. Recently, machine learning techniques in Artificial Intelligence (AI), especially the deep learning techniques which allow a computational model to generate multiple layers, have been widely applied and achieved state-of-the-art performance in different fields, such as speech recognition, image classification, bioinformatics, etc. One very important application of these AI techniques is in the field of drug discovery. METHODS: We did a large-scale literature search on existing scientific websites (e.g., ScienceDirect, arXiv) and startup companies to understand the current status of machine learning techniques in drug discovery. RESULTS: Our experiments demonstrated that there are different patterns in machine learning fields and drug discovery fields. For example, keywords like prediction, brain, discovery, and treatment are usually in drug discovery fields. Also, the total number of papers published in drug discovery fields with machine learning techniques is increasing every year. CONCLUSION: The main focus of this survey is to understand the current status of machine learning techniques in the drug discovery field within both academic and industrial settings, and discuss its potential future applications. Several interesting patterns for machine learning techniques in drug discovery fields are discussed in this survey.


Subjects
Drug Discovery, Machine Learning, Computational Biology/methods, Pharmaceutical Industry, Humans, Surveys and Questionnaires
18.
Sci Rep ; 8(1): 9939, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29967418

ABSTRACT

Every two years, groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field but many hurdles still remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website, providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines, showing improvements according to the CASP assessment as well as areas that require further adjustments and research.


Subjects
Caspase 12/metabolism, Caspases/metabolism, Computational Biology/methods, Molecular Models, Software, Caspase 12/chemistry, Caspases/chemistry, Humans, Protein Conformation
19.
Molecules ; 22(10)2017 Oct 17.
Article in English | MEDLINE | ID: mdl-29039790

ABSTRACT

With the development of next-generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge to fill the gap between the huge number of protein sequences and the known functions. In this paper, we propose a novel method that converts the protein function problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate the "ProLan" language to the "GOLan" language. We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the performance of our methods on selected proteins whose function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose for the first time a method which converts the protein function prediction problem to a language translation problem and applies a neural machine translation model for protein function prediction.


Subjects
Computational Biology/methods, Computer Neural Networks, Proteins/metabolism, Software, Algorithms, Protein Databases, Gene Ontology, Machine Learning, Reproducibility of Results
20.
Bioinformatics ; 33(4): 586-588, 2017 02 15.
Article in English | MEDLINE | ID: mdl-28035027

ABSTRACT

Motivation: Protein model quality assessment (QA) plays a very important role in protein structure prediction. It can be divided into two groups of methods: single-model and consensus QA methods. The consensus QA methods may fail when there is a large portion of low-quality models in the model pool. Results: In this paper, we develop a novel single-model quality assessment method QAcon utilizing structural features, physicochemical properties, and residue contact predictions. We apply residue-residue contact information predicted by two protein contact prediction methods PSICOV and DNcon to generate a new score as a feature for quality assessment. This novel feature and 11 other features are used as input to train a two-layer neural network on CASP9 datasets to predict the quality of a single protein model. We blindly benchmarked our method QAcon on the CASP11 dataset as the MULTICOM-CLUSTER server. Based on the evaluation, our method is ranked as one of the top single-model QA methods. The good performance of the features based on contact prediction illustrates the value of using contact information in protein quality assessment. Availability and Implementation: The web server and the source code of QAcon are freely available at: http://cactus.rnet.missouri.edu/QAcon. Contact: chengji@missouri.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning, Molecular Models, Proteins/chemistry, Animals, Humans, Protein Conformation, Proteins/metabolism, Quality Control
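
One plausible way to turn predicted residue-residue contacts into a single quality feature, as this abstract describes in general terms, is to measure how many of the predicted contacts a candidate model actually satisfies. The sketch below illustrates that idea only; the abstract does not say how QAcon computes its score, so the function name, the use of one representative atom per residue and the 8 Å contact cutoff are all assumptions of this example.

```python
import math

def contact_satisfaction(coords, predicted_contacts, cutoff=8.0):
    """Fraction of predicted contacts realized in a structural model.

    coords: one (x, y, z) point per residue (e.g. C-beta, an assumption).
    predicted_contacts: iterable of (i, j) residue index pairs, such as the
    top-ranked pairs output by a contact predictor.
    """
    contacts = list(predicted_contacts)
    if not contacts:
        return 0.0
    satisfied = sum(1 for i, j in contacts
                    if math.dist(coords[i], coords[j]) < cutoff)
    return satisfied / len(contacts)
```

A model that agrees with most confidently predicted contacts would receive a value near 1, and this number could be fed to a neural network alongside other structural and physicochemical features.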