1.
BMC Bioinformatics ; 21(Suppl 2): 80, 2020 Mar 11.
Article in English | MEDLINE | ID: mdl-32164574

ABSTRACT

BACKGROUND: Interactions between proteins and non-protein small-molecule ligands play important roles in the biological processes of living systems. Thus, the development of computational methods to support our understanding of the ligand-receptor recognition process is of fundamental importance, since these methods are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. RESULTS: To illustrate the potential of visGReMLIN, we conducted two case studies in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in an entirely visual manner. In addition, we found some motifs that we believe are relevant to protein-ligand interactions in the analyzed datasets. CONCLUSIONS: We aimed to build a visual analytics-oriented web server to detect and visualize common motifs at the protein-ligand interface. visGReMLIN motifs can support users in gaining insights into the key atoms/residues responsible for protein-ligand interactions in a dataset of complexes.
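
As a rough illustration of the kind of graph a graph-mining strategy could start from, the sketch below links protein and ligand atoms that lie within a distance cutoff. The atom records, the 4.0 Å cutoff, and the function name `interface_edges` are assumptions made for this example; the abstract does not describe how visGReMLIN builds its graphs.

```python
from math import dist

# Hypothetical atom records: (label, side, (x, y, z)), coordinates in angstroms.
atoms = [
    ("ASP25:OD1", "protein", (0.0, 0.0, 0.0)),
    ("GLY27:N", "protein", (6.0, 0.0, 0.0)),
    ("LIG:N1", "ligand", (2.8, 0.0, 0.0)),
    ("LIG:C7", "ligand", (9.5, 0.0, 0.0)),
]

def interface_edges(atoms, cutoff=4.0):
    """Return (protein_atom, ligand_atom, distance) for pairs within the cutoff."""
    protein = [a for a in atoms if a[1] == "protein"]
    ligand = [a for a in atoms if a[1] == "ligand"]
    edges = []
    for p_label, _, p_xyz in protein:
        for l_label, _, l_xyz in ligand:
            d = dist(p_xyz, l_xyz)  # Euclidean distance between the two atoms
            if d <= cutoff:         # assumed contact criterion
                edges.append((p_label, l_label, round(d, 2)))
    return edges

edges = interface_edges(atoms)
for edge in edges:
    print(edge)
```

Each edge is one protein-ligand contact. A motif search would then look for subgraphs that recur across the contact graphs of many complexes.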


Subjects
Ligands , Proteins/metabolism , User-Computer Interface , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Protein Binding , Proteins/chemistry
2.
BMC Bioinformatics ; 18(1): 431, 2017 Sep 30.
Article in English | MEDLINE | ID: mdl-28964254

ABSTRACT

BACKGROUND: Geminiviruses infect a broad range of cultivated and non-cultivated plants, causing significant economic losses worldwide. Studies of the diversity of species, taxonomy, mechanisms of evolution, geographic distribution, and mechanisms of interaction of these pathogens with the host have increased greatly in recent years. Furthermore, the use of rolling circle amplification (RCA) and advanced metagenomics approaches has enabled the elucidation of viromes and the identification of many viral agents in a large number of plant species. As a result, determining the nomenclature of geminiviruses and classifying them taxonomically have become complex tasks. In addition, the gene responsible for viral replication (particularly in viruses belonging to the genus Mastrevirus) may be spliced due to the use of the transcriptional/splicing machinery in the host cells. However, the current tools have limitations concerning the identification of introns. RESULTS: This study proposes a new method, designated Fangorn Forest (F2), based on machine learning approaches to classify genera using an ab initio approach, i.e., using only the genomic sequence, as well as to predict and classify genes in the family Geminiviridae. In this investigation, nine genera of the family Geminiviridae and their related satellite DNAs were selected. We obtained two training sets: one for genus classification, containing attributes extracted from the complete genomes of geminiviruses, and one for geminivirus gene classification, containing attributes extracted from ORFs taken from those complete genomes. Three ML algorithms were applied to those datasets to build the predictive models: support vector machines, using the sequential minimal optimization training approach, random forest (RF), and multilayer perceptron. RF demonstrated very high predictive power, achieving precision, recall, and area under the curve (AUC) of 0.966, 0.964, and 0.995, respectively, for genus classification. For gene classification, RF reached precision, recall, and AUC of 0.983, 0.983, and 0.998, respectively. CONCLUSIONS: Therefore, Fangorn Forest proved to be an efficient method for classifying genera of the family Geminiviridae with high precision and for effective gene prediction and classification. The method is freely accessible at www.geminivirus.org:8080/geminivirusdw/discoveryGeminivirus.jsp .
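
For readers unfamiliar with the three metrics reported above, the following minimal sketch computes precision, recall, and AUC for a binary toy problem in plain Python. The labels, scores, and 0.5 decision threshold are invented for illustration. The abstract does not say how the metrics were aggregated over multiple genera, so the binary setting here is an assumption.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): true labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

precision, recall = precision_recall(y_true, y_pred)
roc_auc = auc(y_true, scores)
print(precision, recall, roc_auc)
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds. This is why a model can report an AUC noticeably higher than its precision and recall.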


Subjects
Geminiviridae/genetics , Machine Learning , Area Under Curve , Satellite DNA/classification , Satellite DNA/genetics , Geminiviridae/classification , Internet , Open Reading Frames/genetics , Plants/virology , ROC Curve , User-Computer Interface
3.
BMC Bioinformatics ; 18(1): 240, 2017 May 05.
Article in English | MEDLINE | ID: mdl-28476106

ABSTRACT

BACKGROUND: The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. RESULTS: Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. CONCLUSIONS: The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.
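
The ETL pattern mentioned in the conclusions can be sketched in a few lines. The example below is only an illustration under stated assumptions: the FASTA text, the `genome` table, and the derived columns (length and GC fraction) are hypothetical and do not reflect the actual schema or pipeline of geminivirus.org.

```python
import sqlite3

# Hypothetical FASTA text standing in for records pulled from a public database.
RAW_FASTA = """>seq1 example virus A
ATGCGC
GGTA
>seq2 example virus B
ATATAT
"""

def extract(fasta_text):
    """Extract: yield (header, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def transform(records):
    """Transform: derive an accession, the length, and the GC fraction per record."""
    for header, seq in records:
        seq = seq.upper()
        accession = header.split()[0]
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        yield accession, len(seq), round(gc, 3)

def load(rows, conn):
    """Load: insert the transformed rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genome "
        "(accession TEXT PRIMARY KEY, length INTEGER, gc REAL)"
    )
    conn.executemany("INSERT INTO genome VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_FASTA)), conn)
rows = conn.execute(
    "SELECT accession, length, gc FROM genome ORDER BY accession"
).fetchall()
print(rows)
```

Keeping the three stages as separate functions lets each one be tested or replaced independently, for example by swapping the in-memory database for a file-backed one.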


Subjects
Computational Biology/methods , Genetic Databases , Geminiviridae/genetics , Machine Learning , Algorithms , Single-Stranded DNA/genetics , Viral DNA/genetics , Open Reading Frames/genetics , Phylogeny , Plants/virology
4.
BMC Genomics ; 13 Suppl 5: S4, 2012.
Article in English | MEDLINE | ID: mdl-23095859

ABSTRACT

BACKGROUND: The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of obtained peptide-spectrum matches (PSMs) needs to be evaluated also by algorithms, as a manual curation of these huge datasets would be impractical. The target-decoy database strategy is largely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. Besides, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant augmentation in sensitivity. RESULTS: Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, leading to a higher chance to retrieve more true positives. Furthermore, we need few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches. 
CONCLUSION: Our approach not only enhances computational performance, and thus the turnaround time of MS-based experiments in proteomics, but also improves the information content, with the benefit of higher proteome coverage. This improvement, for instance, increases the chance of identifying important drug targets or biomarkers for drug development or molecular diagnostics.
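
The typical single-score target-decoy approach used as a baseline above can be sketched as follows. This is a minimal illustration, not MUMAL or MUDE: it assumes distinct scores where higher is better, and it uses the simple estimator FDR ≈ decoys / targets among accepted PSMs, which is one common choice among several. The function name and toy data are hypothetical.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """psms: list of (score, is_decoy) pairs with distinct scores.

    Returns (threshold, accepted_targets) for the lowest score cutoff whose
    estimated FDR (decoys / targets above the cutoff) stays within max_fdr,
    or (None, 0) if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    best = (None, 0)
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accept this cutoff if the estimated error rate is still acceptable.
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best

# Toy PSM list (hypothetical): (search score, matched a decoy sequence?)
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (7.0, False), (6.4, True), (6.0, False),
]
threshold, accepted = fdr_threshold(psms, max_fdr=0.25)
print(threshold, accepted)
```

Because the cutoff acts on one score only, the decision boundary is a single threshold. The abstract's point is that a learned nonlinear boundary over several features can accept more correct PSMs at the same error rate.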


Subjects
Algorithms , Artificial Intelligence , Liquid Chromatography/methods , Computational Biology/methods , Proteins/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Multivariate Analysis , Computer Neural Networks , Sensitivity and Specificity
5.
J Proteome Res ; 9(5): 2265-77, 2010 May 07.
Article in English | MEDLINE | ID: mdl-20199108

ABSTRACT

The target-decoy search strategy has been successfully applied in shotgun proteomics for validating peptide and protein identifications. If, on one hand, this method has proven to be very efficient for error estimation, on the other hand, little attention has been paid to the resulting sensitivity. Only two scores are normally used and thresholds are explored in a very simplistic way. In this work, a multivariate decoy analysis is described, where many quality parameters are considered. This analysis is treated in our approach as an optimization problem for sensitivity maximization. Furthermore, an efficient heuristic is proposed to solve this problem. Experiments comparing our method, termed MUDE (multivariate decoy database analysis), with traditional bivariate decoy analysis and with Peptide/ProteinProphet showed that our procedure significantly enhances the retrieved number of identifications when comparing the same false discovery rates. Particularly for phosphopeptide/protein identifications, we could demonstrate more than a two-fold increase in sensitivity compared with the Trans-Proteomic Pipeline tools.
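
To make the optimization framing concrete, the sketch below treats threshold selection over several quality parameters as a search problem: maximize the number of accepted target identifications subject to a bound on the estimated false discovery rate. It uses a brute-force search over observed values, which is an assumption chosen for clarity and is not the heuristic proposed in the paper. The feature tuples and the decoys / targets estimator are also hypothetical.

```python
from itertools import product

def best_thresholds(psms, max_fdr=0.05):
    """psms: list of (features, is_decoy); features is a tuple of scores
    where higher is better.

    Tries every combination of observed feature values as lower bounds and
    returns (cutoffs, accepted_targets) for the combination that accepts the
    most targets while keeping decoys / targets within max_fdr.
    """
    n_features = len(psms[0][0])
    candidates = [sorted({f[i] for f, _ in psms}) for i in range(n_features)]
    best_cut, best_targets = None, 0
    for cut in product(*candidates):
        targets = decoys = 0
        for features, is_decoy in psms:
            if all(v >= c for v, c in zip(features, cut)):
                if is_decoy:
                    decoys += 1
                else:
                    targets += 1
        # targets > best_targets >= 0 guarantees a non-zero denominator.
        if targets > best_targets and decoys / targets <= max_fdr:
            best_cut, best_targets = cut, targets
    return best_cut, best_targets

# Toy data (hypothetical): three quality parameters per identification.
psms = [
    ((3.0, 0.9, 5), False),
    ((2.5, 0.2, 6), False),
    ((1.0, 0.8, 7), False),
    ((2.8, 0.1, 1), True),
    ((0.9, 0.7, 2), True),
    ((0.5, 0.1, 1), True),
]
cutoffs, accepted = best_thresholds(psms, max_fdr=0.05)
print(cutoffs, accepted)
```

The exhaustive search grows exponentially with the number of parameters, which is why an efficient heuristic is needed once many quality parameters are considered.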


Subjects
Data Mining/methods , Peptide Mapping/methods , Peptides/chemistry , Proteomics/methods , Algorithms , Animals , Protein Databases , Humans , Linear Models , Multivariate Analysis , Phosphoproteins/chemistry , Proteins/chemistry , Sensitivity and Specificity , Tandem Mass Spectrometry