1.
Comput Struct Biotechnol J ; 23: 309-315, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38179071

ABSTRACT

Neuropeptides play critical roles in many biological processes such as growth, learning, memory, metabolism, and neuronal differentiation. A few approaches have been reported for predicting neuropeptides that are cleaved from precursor protein sequences. However, these models for cleavage site prediction of precursors were developed using a limited number of neuropeptide precursor datasets and simple precursor representation models. In addition, a universal method for predicting neuropeptide cleavage sites that can be applied to all species is still lacking. In this paper, we proposed a novel deep learning method called DeepNeuropePred, which uses a combination of a pre-trained language model and convolutional neural networks for feature extraction and for predicting the neuropeptide cleavage sites from precursors. To demonstrate the model's effectiveness and robustness, we evaluated the performance of DeepNeuropePred and four models from the NeuroPred server on an independent dataset; our model achieved the highest AUC score (0.916), which is 6.9%, 7.8%, 8.8%, and 10.9% higher than the Mammalian (0.857), Insects (0.850), Mollusc (0.842) and Motif (0.826) models, respectively. For the convenience of researchers, we provide a web server (http://isyslab.info/NeuroPepV2/deepNeuropePred.jsp).
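
The percentage gains quoted in this abstract appear to be relative (not absolute) differences in AUC. A minimal Python sketch, assuming that reading, reproduces them from the reported AUC values; the function and variable names are illustrative and do not come from the paper.

```python
# AUC values as reported in the abstract.
deep_neuropepred_auc = 0.916
neuropred_aucs = {
    "Mammalian": 0.857,
    "Insects": 0.850,
    "Mollusc": 0.842,
    "Motif": 0.826,
}


def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Assumption: the abstract reports relative gains rounded to one decimal.
gains = {
    name: round(relative_gain_percent(deep_neuropepred_auc, auc), 1)
    for name, auc in neuropred_aucs.items()
}
print(gains)
```

Under this assumption the rounded gains match the four percentages stated in the abstract.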

2.
J Mol Biol ; 436(4): 168416, 2024 02 15.
Article in English | MEDLINE | ID: mdl-38143020

ABSTRACT

Neuropeptides act not only through the nervous system; some of them also act peripherally. They are important in the regulation of numerous physiological processes, including growth, reproduction, social behavior, inflammation, fluid homeostasis, cardiovascular function, and energy homeostasis. The various roles of neuropeptides make them promising candidates for prospective therapeutics for different diseases. NeuroPep has now been updated to version 2.0 and holds 11,417 unique neuropeptide entries, nearly double the number in the first version of NeuroPep. When available, we collected information about the receptor for each neuropeptide entry, and we predicted the 3D structures of those neuropeptides without a known experimental structure using AlphaFold2 or APPTEST, depending on the peptide sequence length. In addition, DeepNeuropePred and NeuroPred-PLM, two neuropeptide prediction tools that we developed recently, were also integrated into NeuroPep 2.0 to facilitate the identification of new neuropeptides. NeuroPep 2.0 is freely accessible at https://isyslab.info/NeuroPepV2/.


Subjects
Databases, Protein; Molecular Sequence Annotation; Neuropeptides; Amino Acid Sequence; Neuropeptides/chemistry; Molecular Sequence Annotation/methods
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36892166

ABSTRACT

Neuropeptides are a diverse and complex class of signaling molecules that regulate a variety of biological processes. Neuropeptides provide many opportunities for the discovery of new drugs and targets for the treatment of a wide range of diseases, and thus, computational tools for the rapid and accurate large-scale identification of neuropeptides are of great significance for peptide research and drug development. Although several machine learning-based prediction tools have been developed, there is room for improvement in the performance and interpretability of the proposed methods. In this work, we developed an interpretable and robust neuropeptide prediction model, named NeuroPred-PLM. First, we employed a protein language model (ESM) to obtain semantic representations of neuropeptides, which could reduce the complexity of feature engineering. Next, we adopted a multi-scale convolutional neural network to enhance the local feature representation of neuropeptide embeddings. To make the model interpretable, we proposed a global multi-head attention network that could be used to capture the position-wise contribution to neuropeptide prediction via the attention scores. In addition, NeuroPred-PLM was developed based on our newly constructed NeuroPep 2.0 database. Benchmarks based on the independent test set show that NeuroPred-PLM achieves superior predictive performance compared with other state-of-the-art predictors. For the convenience of researchers, we provide an easy-to-install PyPI package (https://pypi.org/project/NeuroPredPLM/) and a web server (https://huggingface.co/spaces/isyslab/NeuroPred-PLM).


Subjects
Neuropeptides; Neuropeptides/genetics; Neuropeptides/chemistry; Peptides; Neural Networks, Computer; Machine Learning; Semantics
4.
Comput Struct Biotechnol J ; 20: 1993-2000, 2022.
Article in English | MEDLINE | ID: mdl-35521551

ABSTRACT

Transmembrane proteins (TMPs) are essential for cell recognition and communication, and they serve as important drug targets in humans. Transmembrane proteins' 3D structures are critical for determining their functions and drug design but are hard to determine even by experimental methods. Although some computational methods have been developed to predict transmembrane helices (TMHs) and orientation, there is still room for improvement. Considering that the pre-trained language model can make full use of massive unlabeled protein sequences to obtain latent feature representation for TMPs and reduce the dependence on evolutionary information, we proposed DeepTMpred, which used pre-trained self-supervised language models called ESM, convolutional neural networks, attentive neural network and conditional random fields for alpha-TMP topology prediction. Compared with the current state-of-the-art tools on a non-redundant dataset of TMPs, DeepTMpred demonstrated superior predictive performance in most evaluation metrics, especially at the TMH level. Furthermore, DeepTMpred could also obtain reliable prediction results for TMPs without much evolutionary feature in a few seconds. A tutorial on how to use DeepTMpred can be found in the colab notebook (https://colab.research.google.com/github/ISYSLAB-HUST/DeepTMpred/blob/master/notebook/test.ipynb).

5.
Bioinform Adv ; 2(1): vbac060, 2022.
Article in English | MEDLINE | ID: mdl-36699417

ABSTRACT

Motivation: Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Although many methods have been developed to predict domain boundaries from protein sequence over the past two decades, there is still much room for improvement. Results: In this article, a novel domain boundary prediction tool called Res-Dom was developed, which is based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. In addition, we used a pre-trained protein language model called ESM to extract sequence embedded features, which can summarize sequence context information more abundantly. To improve the global representation of these deep residual networks, a Bi-LSTM network was also designed to consider long-range interactions between residues. Res-Dom was then tested on an independent test set of 342 proteins and generated correct single-domain and multi-domain classifications with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlapping score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most of the recently developed state-of-the-art domain prediction methods. Availability and implementation: All source code, datasets and model are available at http://isyslab.info/Res-Dom/.
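
For readers unfamiliar with the Matthews correlation coefficient (MCC) used above to score single-domain versus multi-domain classification, the following is a minimal Python sketch of the standard binary MCC formula. The confusion-matrix counts in the usage line are made up for illustration and are not taken from the Res-Dom evaluation.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, e.g. "multi-domain" treated as the positive class.
example_mcc = matthews_corrcoef(tp=6, fp=2, fn=1, tn=3)
print(round(example_mcc, 3))
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when the two classes are unbalanced.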

6.
Comput Struct Biotechnol J ; 19: 1145-1153, 2021.
Article in English | MEDLINE | ID: mdl-33680357

ABSTRACT

Protein domains are the basic units of proteins that can fold, function, and evolve independently. Knowledge of protein domains is critical for protein classification, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Thus, over the past two decades, a number of protein domain identification approaches have been developed, and a variety of protein domain databases have also been constructed. This review divides protein domain prediction methods into two categories, namely sequence-based and structure-based. These methods are introduced in detail, and their advantages and limitations are compared. Furthermore, this review also provides a comprehensive overview of popular online protein domain sequence and structure databases. Finally, we discuss potential improvements of these prediction methods.

7.
Brief Bioinform ; 22(1): 194-218, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31867611

ABSTRACT

The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.


Subjects
Data Mining/methods; Deep Learning; Sequence Analysis, Protein/methods; Animals; Databases, Protein; Humans
8.
BMC Bioinformatics ; 21(1): 426, 2020 Sep 29.
Article in English | MEDLINE | ID: mdl-32993484

ABSTRACT

BACKGROUND: Structure comparison can provide useful information to identify functional and evolutionary relationship between proteins. With the dramatic increase of protein structure data in the Protein Data Bank, computation time quickly becomes the bottleneck for large scale structure comparisons. To more efficiently deal with informative multiple structure alignment tasks, we propose pmTM-align, a parallel protein structure alignment approach based on mTM-align/TM-align. pmTM-align contains two stages to handle pairwise structure alignments with Spark and the phylogenetic tree-based multiple structure alignment task on a single computer with OpenMP. RESULTS: Experiments with the SABmark dataset showed that parallelization along with data structure optimization provided considerable speedup for mTM-align. The Spark-based structure alignments achieved near ideal scalability with large datasets, and the OpenMP-based construction of the phylogenetic tree accelerated the incremental alignment of multiple structures and metrics computation by a factor of about 2-5. CONCLUSIONS: pmTM-align enables scalable pairwise and multiple structure alignment computing and offers more timely responses for medium to large-sized input data than existing alignment tools such as mTM-align.


Subjects
Proteins/chemistry; Software; Algorithms; Databases, Protein; Proteins/metabolism; Sequence Alignment
9.
Sensors (Basel) ; 20(4)2020 Feb 16.
Article in English | MEDLINE | ID: mdl-32079124

ABSTRACT

Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for various applications such as natural human-computer interaction. Hand pose estimation is a major academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advancements in both hardware and artificial intelligence, various prototypes of data gloves and computer-vision-based methods have been proposed for accurate and rapid hand pose estimation in recent years. However, existing reviews focused either on data gloves or on vision methods, or were even based on a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, including wearable and vision-based solutions. Hand kinematic models are discussed first. An in-depth review is then conducted on data gloves and vision-based sensor systems with corresponding modeling methods. In particular, this review also discusses deep-learning-based methods, which are very promising in hand pose estimation. Moreover, the advantages and drawbacks of the current hand gesture estimation methods, their scope of application, and related challenges are also discussed.


Subjects
Artificial Intelligence; Hand/physiology; Wearable Electronic Devices; Algorithms; Biomechanical Phenomena; Humans; User-Computer Interface
10.
Genome Biol ; 20(1): 229, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31676016

ABSTRACT

INTRODUCTION: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. RESULTS: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with the frequency of occurrence in the modeled Pfam families, suggesting the significant role of the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Notably, PF15461, which has a majority of members coming from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5. CONCLUSIONS: These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.
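
The TM-score threshold of 0.5 used above can be made concrete with the standard TM-score definition: the sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs, divided by the target length L, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The Python sketch below assumes the two structures are already superimposed and that per-residue distances (in Å) are supplied; the real TM-score program also searches for the optimal superposition, and the "estimated" TM-scores in this abstract are predicted by the pipeline rather than computed against a native structure. Function names are illustrative.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length < 19:
        # Assumption of this sketch: very short chains, where the
        # formula gives a non-positive d0, are rejected.
        raise ValueError("target_length must be at least 19")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    `distances` holds one distance per aligned residue pair; residues
    of the target that are not aligned simply contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A 100-residue target with every aligned pair exactly d0 apart.
print(tm_score([tm_d0(100)] * 100, 100))
```

Because d0 grows with chain length, the score is far less length-dependent than RMSD, which is why one cutoff such as 0.5 can be applied across protein families of different sizes.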


Subjects
Metagenomics/methods; Models, Chemical; Proteins/genetics; Structure-Activity Relationship; Aquatic Organisms; Deep Learning; Microbiota; Multigene Family; Proteins/chemistry
11.
Bioinformatics ; 35(24): 5128-5136, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31197306

ABSTRACT

MOTIVATION: Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been proven to improve the prediction performance. However, how to simultaneously model the local and global interactions to further improve domain boundary prediction is still a challenging problem. RESULTS: This article employs a hybrid deep learning method that combines convolutional neural network and gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt a balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. AVAILABILITY AND IMPLEMENTATION: The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer; Deep Learning; Machine Learning; Protein Domains; Proteins; Software
12.
Sci Rep ; 8(1): 17952, 2018 12 18.
Article in English | MEDLINE | ID: mdl-30560945

ABSTRACT

Chronic venous insufficiency (CVI) affects a large population, and it cannot heal without doctors' intervention. However, many patients do not receive medical advice in time. At the same time, doctors need an assistant tool to classify patients according to the severity level of CVI. We propose an automatic classification method, named CVI-classifier, to help doctors and patients. In this approach, first, low-level image features are mapped into middle-level semantic features by a concept classifier, and a multi-scale semantic model is constructed to form an image representation with rich semantics. Second, a scene classifier is trained using an optimized feature subset calculated by the high-order dependency based feature selection approach, and is used to estimate CVI severity. Finally, classification accuracy, kappa coefficient and F1-score are used to evaluate classification performance. Experiments on the CVI images from 217 patients' medical records demonstrated superior performance and efficiency for CVI-classifier, with classification accuracy up to 90.92%, a kappa coefficient of 0.8735 and an F1-score of 0.9006. This method also outperformed doctors' diagnosis (with doctors relying solely on images to make judgments), with accuracy, kappa and F1-score improved by 9.11%, 0.1250 and 0.0955, respectively.
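
The kappa coefficient reported above is presumably Cohen's kappa, which corrects observed agreement for the agreement expected by chance. A minimal Python sketch under that assumption follows; the confusion matrix in the usage line is invented for illustration and is not data from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        (sum(confusion[i]) / total)
        * (sum(row[i] for row in confusion) / total)
        for i in range(n_classes)
    )
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example (not from the paper).
example_kappa = cohens_kappa([[20, 5], [10, 15]])
print(round(example_kappa, 4))
```

The same function works for more than two severity levels, since it only relies on the diagonal and the marginals of the matrix.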


Subjects
Pattern Recognition, Automated; Venous Insufficiency/diagnosis; Algorithms; Chronic Disease; Humans; Pattern Recognition, Automated/methods; Pattern Recognition, Automated/standards; ROC Curve; Reproducibility of Results; Sensitivity and Specificity
14.
Nucleic Acids Res ; 45(W1): W429-W434, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28472524

ABSTRACT

Molecular replacement (MR) is one of the most common techniques used for solving the phase problem in X-ray crystal diffraction. The success rate of MR however drops quickly when the sequence identity between query and templates is reduced, while the I-TASSER-MR server is designed to solve the phase problem for proteins that lack close homologous templates. Starting from a sequence, it first generates full-length models using I-TASSER by iterative structural fragment reassembly. A progressive sequence truncation procedure is then used for editing the models based on local variations of the structural assembly simulations. Next, the edited models are submitted to MR-REX to search for optimal placements in the crystal unit-cells through replica-exchange Monte Carlo simulations, with the phasing results used by CNS for final atomic model refinement and selection. The I-TASSER-MR algorithm was tested in large-scale benchmark datasets and solved 36% more targets compared to using the best threading templates. The server takes primary sequence and raw crystal diffraction data as input, with output containing annotated phase information and refined structure models. It also allows users to choose between different methods for setting B-factors and the number of models used for phasing. The online server is freely available at http://zhanglab.ccmb.med.umich.edu/I-TASSER-MR.


Subjects
Crystallography, X-Ray; Models, Molecular; Protein Conformation; Sequence Analysis, Protein/methods; Software; Internet
15.
Nucleic Acids Res ; 45(W1): W400-W407, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28498994

ABSTRACT

We develop a hierarchical pipeline, ThreaDomEx, for both continuous domain (CD) and discontinuous domain (DCD) structure predictions. Starting from a query sequence, ThreaDomEx first threads it through the PDB to identify multiple structure templates, from which a profile of domain conservation score (DC-score) is derived for domain-segment assignment. To further detect DCDs that consist of separated segments along the sequence, a boundary-clustering algorithm is used to refine the DCD-linker locations. If the templates do not contain DCDs, a domain-segment assembly process, guided by symmetry comparison, is applied for further DCD detection. ThreaDomEx was tested on a set of 1111 proteins and achieved a normalized domain overlap score of 89.3% compared to experimental data, which is significantly higher than other state-of-the-art methods. It also recalls 26.7% of DCDs with 72.7% precision on the proteins for which threading failed to detect any DCDs. The server provides facilities for users to interactively refine the domain models by adjusting the DC-score threshold, deleting and adding domain linkers, and assembling domain segments, which are particularly helpful for hard targets for which current methods have low accuracy while human-expert knowledge and experimental insights can be used for refining models. The ThreaDomEx server is available at http://zhanglab.ccmb.med.umich.edu/ThreaDomEx.


Subjects
Protein Domains; Sequence Analysis, Protein/methods; Software; Algorithms; Internet
16.
Acta Crystallogr D Struct Biol ; 72(Pt 5): 616-28, 2016 05.
Article in English | MEDLINE | ID: mdl-27139625

ABSTRACT

Molecular replacement (MR) often requires templates with high homology to solve the phase problem in X-ray crystallography. I-TASSER-MR has been developed to test whether the success rate for structure determination of distant-homology proteins could be improved by a combination of iterative fragmental structure-assembly simulations with progressive sequence truncation designed to trim regions with high variation. The pipeline was tested on two independent protein sets consisting of 61 proteins from CASP8 and 100 high-resolution proteins from the PDB. After excluding homologous templates, I-TASSER generated full-length models with an average TM-score of 0.773, which is 12% higher than the best threading templates. Using these as search models, I-TASSER-MR found correct MR solutions for 95 of 161 targets as judged by having a TFZ of >8 or with the final structure closer to the native than the initial search models. The success rate was 16% higher than when using the best threading templates. I-TASSER-MR was also applied to 14 protein targets from structure genomics centers. Seven of these were successfully solved by I-TASSER-MR. These results confirm that advanced structure assembly and progressive structural editing can significantly improve the success rate of MR for targets with distant homology to proteins of known structure.


Subjects
Crystallography, X-Ray/methods; Proteins/chemistry; Software; Algorithms; Animals; Databases, Protein; Humans; Protein Conformation
17.
Proteins ; 84 Suppl 1: 76-86, 2016 09.
Article in English | MEDLINE | ID: mdl-26370505

ABSTRACT

We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.


Subjects
Bacterial Proteins/chemistry; Computational Biology/statistics & numerical data; Models, Molecular; Models, Statistical; Software; Algorithms; Amino Acid Sequence; Bacteria/chemistry; Computational Biology/methods; Computer Simulation; Databases, Protein; Humans; International Cooperation; Protein Folding; Protein Interaction Domains and Motifs; Protein Structure, Secondary; Sequence Alignment
18.
Proteins ; 84 Suppl 1: 233-46, 2016 09.
Article in English | MEDLINE | ID: mdl-26343917

ABSTRACT

We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by LOMETS based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER based fragment reassembly simulations, followed by the fragment-guided molecular dynamic simulation and the MQAP-based model selection. It was found that the inclusion of QUARK-TBM simulations as an intermediate modeling step could help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, there are nearly 18% of TBM domains with the templates deteriorated by the structure assembly pipeline, which may be attributed to the errors of secondary structure and domain orientation predictions that propagate through and degrade the procedures of template identification and final model selections. To examine the record of progress, we made a retrospective report of the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST; but obvious progress on structural improvement relative to threading templates was witnessed in recent CASP experiments, which is probably attributed to the integration of the extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinements following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233-246. © 2015 Wiley Periodicals, Inc.


Subjects
Computational Biology/statistics & numerical data; Models, Molecular; Models, Statistical; Proteins/chemistry; Software; Algorithms; Amino Acid Sequence; Computational Biology/methods; Computer Simulation; Databases, Protein; Humans; Internet; Protein Folding; Protein Interaction Domains and Motifs; Protein Structure, Secondary; Sequence Alignment; Structural Homology, Protein; Thermodynamics
19.
PLoS One ; 10(10): e0141541, 2015.
Article in English | MEDLINE | ID: mdl-26502173

ABSTRACT

A variety of protein domain predictors were developed to predict protein domain boundaries in recent years, but most of them cannot predict discontinuous domains. Considering that nearly 40% of multidomain proteins contain one or more discontinuous domains, we have developed DomEx to enable domain boundary predictors to detect discontinuous domains by assembling the continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments with the profiles from a single-domain library derived from SCOP, CATH and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score and a profile-profile alignment score. DomEx recalled 32.3% of discontinuous domains with 86.5% precision when tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, in which the predicted domain segments are within ±20 residues of the boundary definitions in CATH 3.5. Compared with our recently developed predictor, ThreaDom, which is the state-of-the-art tool to detect discontinuous domains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision in a benchmark with 29 discontinuous-domain chains, where ThreaDom failed to predict any discontinuous domains. Furthermore, combined with ThreaDom, the method ranked number one among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx.
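
Recall and precision, as quoted above for discontinuous-domain detection, are simple ratios of counts. The Python sketch below shows the calculation; the abstract does not report the underlying counts, so the numbers used here (32 correct predictions, 5 false predictions, 67 missed domains out of 99) are an assumption chosen only because they are consistent with the reported 32.3% and 86.5%.

```python
def precision(tp, fp):
    """Fraction of predicted discontinuous domains that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true discontinuous domains that are recovered."""
    return tp / (tp + fn)


# Assumed counts (not stated in the abstract): 99 true discontinuous
# domains, of which 32 are recovered, plus 5 false predictions.
tp, fp, fn = 32, 5, 67
recall_pct = round(100 * recall(tp, fn), 1)
precision_pct = round(100 * precision(tp, fp), 1)
print(recall_pct, precision_pct)
```

High precision with modest recall, as here, means the method rarely reports a discontinuous domain wrongly but still misses most of them.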


Subjects
Proteins/chemistry; Databases, Protein; Protein Structure, Tertiary; Sequence Analysis, Protein
20.
Database (Oxford) ; 2015: bav038, 2015.
Article in English | MEDLINE | ID: mdl-25931458

ABSTRACT

Neuropeptides play a variety of roles in many physiological processes and serve as potential therapeutic targets for the treatment of some nervous-system disorders. In recent years, there has been a tremendous increase in the number of identified neuropeptides. Therefore, we have developed NeuroPep, a comprehensive resource of neuropeptides, which holds 5949 non-redundant neuropeptide entries originating from 493 organisms belonging to 65 neuropeptide families. In NeuroPep, the number of neuropeptides in invertebrates and vertebrates is 3455 and 2406, respectively. It is currently the most complete neuropeptide database. We extracted entries deposited in UniProt, the database (www.neuropeptides.nl) and NeuroPedia, and used text mining methods to retrieve entries from the MEDLINE abstracts and full text articles. All the entries in NeuroPep have been manually checked. 2069 of the 5949 (35%) neuropeptide sequences were collected from the scientific literature. Moreover, NeuroPep contains detailed annotations for each entry, including source organisms, tissue specificity, families, names, post-translational modifications, 3D structures (if available) and literature references. Information derived from these peptide sequences such as amino acid compositions, isoelectric points, molecular weight and other physicochemical properties of peptides are also provided. A quick search feature allows users to search the database with keywords such as sequence, name, family, etc., and an advanced search page helps users to combine queries with logical operators like AND/OR. In addition, user-friendly web tools like browsing, sequence alignment and mapping are also integrated into the NeuroPep database. Database URL: http://isyslab.info/NeuroPep


Subjects
Data Mining; Databases, Protein; Molecular Sequence Annotation; Neuropeptides; Animals; Humans