Results 1 - 11 of 11
1.
J Chem Inf Model ; 61(7): 3213-3231, 2021 07 26.
Article in English | MEDLINE | ID: mdl-34191520

ABSTRACT

In silico prediction of antileishmanial activity using quantitative structure-activity relationship (QSAR) models has so far been developed on small, limited datasets. Nowadays, the availability of large and diverse high-throughput screening data gives the scientific community an opportunity to model this activity from chemical structure. In this study, we present the first automated KNIME workflow for modeling a large, diverse, and highly imbalanced dataset of compounds with antileishmanial activity. Because the data are strongly biased toward inactive compounds, we implemented a novel strategy based on selecting different balanced training sets and then building a consensus model that uses single decision trees as base models and three criteria for combining their outputs. The decision tree consensus was adopted after comparing its classification performance with that of consensuses built on Gaussian Naïve Bayes, Support Vector Machine, Random Forest, Gradient Boosting, and Multilayer Perceptron base models. All these consensuses were rigorously validated using internal and external validation sets and were compared against each other using Friedman and Bonferroni-Dunn statistics. For the retained decision tree-based consensus model at the lowest consensus level, which covers 100% of the chemical space of the dataset, overall accuracy ranged from 71 to 74% on the test set and from 71 to 76% on the external set. For a reduced chemical space (21%) with a higher Incremental Consensus level, accuracy improved substantially, ranging from 86 to 92% on the test set and from 88 to 92% on the external set. These results highlight the usefulness of the consensus model either to prioritize a relatively small set of active compounds with high prediction sensitivity, using the Incremental Consensus at high levels, or to predict as many compounds as possible by lowering the Incremental Consensus level.
Finally, the workflow eliminates human bias, improves the reproducibility of the procedure, and allows other researchers to reproduce our design and apply it to their own QSAR problems.


Subjects
Leishmania , Quantitative Structure-Activity Relationship , Bayes Theorem , High-Throughput Screening Assays , Humans , Reproducibility of Results
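The consensus strategy in this abstract trains one model per balanced subset (all actives plus an equal-size draw of inactives) and trades coverage for accuracy by demanding more agreement among the models. The sketch below is a minimal stdlib illustration of that idea only. It assumes a one-split decision tree (a stump) as a stand-in for the full decision trees built in KNIME, and a single agreement-fraction threshold as a simplification of the authors' three combination criteria; `balanced_subsets`, `fit_stump`, and `consensus_predict` are hypothetical names.

```python
import random
from collections import Counter


def balanced_subsets(actives, inactives, n_subsets, seed=0):
    """Build n_subsets training sets, each holding every active compound plus
    an equally sized random draw (without replacement) of inactive ones."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(inactives, len(actives))
        subsets.append([(x, 1) for x in actives] + [(x, 0) for x in drawn])
    return subsets


def fit_stump(data):
    """Fit a one-split decision tree: pick the feature, threshold and direction
    with the highest training accuracy (first best wins on ties)."""
    best = None
    for f in range(len(data[0][0])):
        for t in sorted({x[f] for x, _ in data}):
            for above in (True, False):
                # Predict "active" when (x[f] >= t) equals `above`.
                acc = sum(((x[f] >= t) == above) == bool(y) for x, y in data)
                if best is None or acc > best[0]:
                    best = (acc, f, t, above)
    _, f, t, above = best
    return lambda x: int((x[f] >= t) == above)


def consensus_predict(models, x, level):
    """Majority vote; abstain (return None) unless at least a `level`
    fraction of the models agree on the winning label."""
    label, count = Counter(m(x) for m in models).most_common(1)[0]
    return label if count / len(models) >= level else None
```

Raising `level` makes the ensemble abstain more often, which mirrors the trade-off the abstract reports between chemical-space coverage and accuracy.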
2.
Curr Top Med Chem ; 21(7): 599-611, 2021.
Article in English | MEDLINE | ID: mdl-33441066

ABSTRACT

BACKGROUND: Molecular phylogenetic algorithms frequently disagree with approaches that rely on reproductive compatibility and morphological criteria for species delimitation. The question arises whether the species boundaries resulting from molecular, reproductive, and/or morphological data are definitively irreconcilable, or whether existing phylogenetic methods are simply not sensitive enough to reconcile morphological and genetic variation in species delimitation. OBJECTIVE: We propose DISTATIS as an integrative framework that combines alignment-based (AB) and alignment-free (AF) distance matrices from ITS2 sequences/structures to shed light on whether Gelasinospora and Neurospora are sister but independent genera. METHODS: We addressed this long-standing issue by harmonizing genus-specific classification based on ascospore morphology with ITS2 molecular data. To validate our proposal, three phylogenetic approaches, i) traditional alignment-based, ii) alignment-free, and iii) a novel distance integrative (DI)-based approach, were comparatively evaluated on a set of Gelasinospora and Neurospora species. All the species considered have been extensively characterized at both the morphological and reproductive levels, and there are known incongruences between their ascospore morphology and molecular data that hamper genus-specific delimitation. RESULTS: Traditional AB phylogenetic analyses failed to resolve Gelasinospora and Neurospora into independent monophyletic clades following ascospore morphology criteria. In contrast, the AF and DI approaches produced phylogenetic trees that properly delimited the expected monophyletic clades. CONCLUSION: The DI approach outperformed the AF one in that it could also divide the Neurospora species according to their mode of reproduction.


Subjects
Neurospora/classification , Phylogeny , Sordariales/classification , Algorithms
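As generally formulated, DISTATIS turns each distance matrix into a cross-product matrix by double-centering, scales each by its first eigenvalue, weights the matrices by how well they agree with one another (RV coefficients), and sums them into a single compromise matrix. The pure-Python sketch below shows only that compromise step, assuming every input holds squared distances over the same taxa in the same order. The function names are hypothetical, power iteration stands in for a full eigendecomposition, and building a tree from the compromise is not shown; this is not the authors' pipeline.

```python
import math


def cross_product(d):
    """Double-center a matrix of squared distances: S = -1/2 * J D J."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[-0.5 * (d[i][j] - row[i] - col[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigen(m, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix
    (power iteration)."""
    n = len(m)
    # Non-constant start vector: the all-ones vector lies in the null space
    # of a double-centered matrix and would stall the iteration.
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
        lam = norm
    return lam, v


def rv_coefficient(a, b):
    """RV coefficient <A, B> / sqrt(<A, A> <B, B>) (Frobenius inner product)."""
    def inner(x, y):
        return sum(p * q for rx, ry in zip(x, y) for p, q in zip(rx, ry))
    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))


def distatis_compromise(squared_distance_matrices):
    """Return (weights, compromise cross-product matrix)."""
    # 1) Cross-product matrices, each scaled by its first eigenvalue.
    scaled = []
    for d in squared_distance_matrices:
        s = cross_product(d)
        lam, _ = top_eigen(s)
        if lam == 0.0:
            raise ValueError("degenerate distance matrix")
        scaled.append([[x / lam for x in row] for row in s])
    # 2) Weights from the first eigenvector of the between-matrix RV table.
    k = len(scaled)
    rv = [[rv_coefficient(scaled[i], scaled[j]) for j in range(k)]
          for i in range(k)]
    _, u = top_eigen(rv)
    weights = [x / sum(u) for x in u]
    # 3) Compromise = weighted sum of the scaled cross-product matrices.
    n = len(scaled[0])
    compromise = [[sum(w * s[i][j] for w, s in zip(weights, scaled))
                   for j in range(n)] for i in range(n)]
    return weights, compromise
```

With two identical inputs (for example, one AB and one AF matrix that happen to agree perfectly) the weights come out equal; a matrix that agrees less with the others receives a smaller weight.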
3.
Biomolecules ; 10(1)2019 12 23.
Article in English | MEDLINE | ID: mdl-31878100

ABSTRACT

Alignment-free (AF) methodologies have grown in popularity over the last decades as alternatives to alignment-based (AB) algorithms for comparative sequence analysis. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular AF methodologies, as well as their applications to classification problems, have been described in previous reviews. However, although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first review the most popular AF approaches used for detecting homology signals within the twilight zone and then present state-of-the-art tools that encode graph theory-derived sequence/structure descriptors, along with their success in identifying remote homologs. We also highlight the trend toward integrating AF features/measures with AB ones, either within the same prediction model or by combining the predictions of different algorithms through voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss efforts to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other lesser-known ones, we share our own experience with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a user-friendly graphical user interface (GUI) for delimiting the twilight zone using several similarity criteria.


Subjects
Computational Biology/methods , Computer Graphics , Sequence Analysis, Protein , Sequence Homology , Amino Acid Sequence
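Among the simplest alignment-free comparisons reviewed in this area is word (k-mer) composition: each sequence becomes a vector of k-mer counts, and vectors are compared without any alignment. The sketch below is a generic illustration, assuming overlapping k-mers and cosine distance; it is not MARCH-INSIDE, TI2BioP, ProtDCal, or SeqDivA, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_profile(seq, k=2):
    """Count overlapping k-mers (words of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p, q):
    """1 - cosine similarity between two k-mer count profiles
    (0 = identical composition, 1 = no shared k-mers)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # an empty profile shares nothing with anything
    return 1.0 - dot / (norm_p * norm_q)
```

Because only word counts are compared, two sequences with similar composition but shuffled order can look close, which is one reason the abstract's graph-derived descriptors aim to capture more sequence-order information.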
4.
Genomics ; 111(6): 1720-1727, 2019 12.
Article in English | MEDLINE | ID: mdl-30508561

ABSTRACT

The Harderian gland is a cephalic structure widely distributed among vertebrates. In snakes, the Harderian gland is anatomically connected to the vomeronasal organ via the nasolacrimal duct, and in some species it can be larger than the eyes. The function of the Harderian gland remains elusive, but it has been proposed to play a role in the production of saliva, pheromones, thermoregulatory lipids, and growth factors, among others. Here, we profiled the transcriptomes of the Harderian glands of three non-front-fanged colubroid snakes from Cuba, Caraiba andreae (Cuban Lesser Racer), Cubophis cantherigerus (Cuban Racer), and Tretanorhinus variabilis (Caribbean Water Snake), using Illumina HiSeq2000 100 bp paired-end sequencing. In addition to ribosomal and uncharacterized proteins, the most abundant transcripts encode putative transport/binding, lipocalin/lipocalin-like, and bactericidal/permeability-increasing-like proteins. Transcripts coding for putative canonical toxins described in venomous snakes were also identified. This transcriptional profile suggests a more complex function than previously recognized for this enigmatic organ.


Subjects
Colubridae/metabolism , Gene Expression Regulation/physiology , Harderian Gland/metabolism , Reptilian Proteins/biosynthesis , Snake Venoms/biosynthesis , Transcriptome/physiology , Animals , Colubridae/genetics , Cuba , Reptilian Proteins/genetics , Snake Venoms/genetics
5.
BMC Bioinformatics ; 19(1): 166, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29724166

ABSTRACT

BACKGROUND: The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We previously introduced a successful supervised pairwise ortholog classification approach, implemented in a big data platform, that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in that supervised big data approach, all of them were, to some extent, alignment-based features, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. RESULTS: Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built either with alignment-based similarity measures alone or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved, for a yeast proteome pair that underwent a whole genome duplication.
The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the runners-up were alignment-free features related to amino acid composition. CONCLUSIONS: Incorporating alignment-free features into supervised big data models did not significantly improve ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.


Subjects
Algorithms , Databases, Protein , Decision Trees , Proteome , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Sequence Analysis, Protein/methods
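The abstract manages the very low ratio of ortholog to non-ortholog protein pairs with oversampling and undersampling. The sketch below shows plain random resampling only, assuming labeled pairs held in Python lists. The authors worked in Spark, and the exact resampling variants and ratios they used are not given here, so `ratio` and both function names are hypothetical.

```python
import random


def undersample(majority, minority, ratio=1.0, seed=0):
    """Keep a random subset (without replacement) of the majority class with
    round(ratio * len(minority)) items, plus every minority item."""
    rng = random.Random(seed)
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    return rng.sample(majority, n_keep) + list(minority)


def oversample(majority, minority, ratio=1.0, seed=0):
    """Duplicate random minority items (with replacement) until the minority
    class reaches round(ratio * len(majority)) items; keep all majority items."""
    rng = random.Random(seed)
    target = int(round(ratio * len(majority)))
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    return list(majority) + list(minority) + extra
```

Undersampling discards information from the majority class, while oversampling repeats minority examples and can encourage overfitting to them; trying both, as the abstract reports, is a common way to see which cost matters more for a given dataset.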
6.
PLoS One ; 8(7): e65926, 2013.
Article in English | MEDLINE | ID: mdl-23874386

ABSTRACT

The introduction of two-dimensional (2D) graphs and their numerical characterization for comparative analyses of DNA/RNA and protein sequences, without the need for sequence alignments, is an active though recent research topic in bioinformatics. Here, we used a 2D artificial representation (four-color maps) with a simple numerical characterization through topological indices (TIs) to aid the discovery of remote homologs of adenylation domains (A-domains) from the Nonribosomal Peptide Synthetase (NRPS) class in the proteome of the cyanobacterium Microcystis aeruginosa. Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by NRPS. Several A-domains share amino acid identities lower than 20%, making them a possible source of remote homologs. Therefore, A-domains cannot be easily retrieved by BLASTp searches using a single template. To cope with the sequence diversity of the A-domains, we combined homology-search methods with an alignment-free tool that uses protein four-color maps. TI2BioP (Topological Indices to BioPolymers) version 2.0, available at http://ti2biop.sourceforge.net/, allowed the calculation of simple TIs from the protein sequences (four-color maps). Such TIs were used as input predictors for the statistical estimations required to build the alignment-free models. We concluded that the use of graphical/numerical approaches in cooperation with other sequence search methods, such as multi-template BLASTp and profile HMMs, can give the most complete exploration of the repertoire of highly diverse protein families.


Subjects
Computational Biology/methods , Peptide Synthases/chemistry , Algorithms , Protein Structure, Tertiary
7.
PLoS One ; 6(10): e26638, 2011.
Article in English | MEDLINE | ID: mdl-22046320

ABSTRACT

The ITS2 gene class shows high sequence divergence among its members, which has complicated its annotation and its use for reconstructing phylogenies at higher taxonomic levels (beyond species and genus). Several alignment strategies have been implemented to improve ITS2 annotation quality and its use for phylogenetic inference. Although alignment-based methods have been pushed to the limit of their complexity to tackle both issues, no alignment-free approach has yet successfully addressed both topics. By contrast, simple alignment-free classifiers, such as topological indices (TIs) containing information about the sequence and structure of ITS2, may prove useful for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from ITS2 artificial 2D structures generated from DNA strings and the other from the secondary structure inferred by RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for ITS2 class prediction using these two classes of TIs. Both models showed similar performance on the training and test sets, reaching values above 95% in overall classification. Given the importance of the ITS2 region for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used to compare our models with conventional classification models based on multiple sequence alignments, such as Hidden Markov Model-based approaches, revealing the success of our models in identifying novel ITS2 members. The isolated sequence was assessed using traditional and alignment-free techniques applied to phylogenetic inference to complement the taxonomy of the Petrakia sp. fungal isolate.


Subjects
DNA, Ribosomal Spacer , Eukaryota/genetics , Molecular Sequence Annotation , Phylogeny , Algorithms , Methods , Neural Networks, Computer , Nucleic Acid Conformation , RNA Folding
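One of the two TI classes in this abstract comes from secondary structures predicted by RNA folding algorithms. The sketch below turns such a structure into a graph and computes one classic topological index. It assumes the structure arrives in dot-bracket notation (the format printed by folding tools such as ViennaRNA's RNAfold), joins consecutive nucleotides with backbone edges, adds an edge for each base pair, and uses the Wiener index purely as an example; the graph construction, the index, and the function names are assumptions, not the TI2BioP descriptors.

```python
from collections import deque


def dot_bracket_graph(structure):
    """Adjacency sets for an RNA secondary structure in dot-bracket notation:
    backbone edges join consecutive nucleotides; each '(' ... ')' pair adds
    a base-pair edge."""
    n = len(structure)
    adj = [set() for _ in range(n)]
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            adj[i].add(j)
            adj[j].add(i)
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return adj


def wiener_index(adj):
    """Wiener index: sum of shortest-path lengths over all unordered node
    pairs, using breadth-first search from every node."""
    total = 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end
```

Base pairs act as shortcuts in the graph, so a more compactly folded structure of the same length tends to give a smaller Wiener index than an unfolded chain.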
8.
J Theor Biol ; 273(1): 167-78, 2011 Mar 21.
Article in English | MEDLINE | ID: mdl-21192951

ABSTRACT

Alignment-free classifiers are especially useful in the functional classification of protein classes with variable homology and different domain structures. Thus, the Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of amino acids into four classes of hydrophobicity and polarity, capturing sequence-order information beyond the amino acid composition level. The predictive power of such TIs was evaluated for the first time on the RNase III family, owing to the high diversity of its members (in primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: a Decision Tree Model (DTM), an Artificial Neural Network (ANN) model, and a Hidden Markov Model (HMM). The first two are alignment-free approaches that use TIs as input predictors. Their performance was compared with that of a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models showed similar performance on the training and test sets, reaching values above 90% in overall classification. The non-classical HMM showed the highest classification rate, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM offered simplicity for RNase III classification at a low computational cost. This simplicity was evaluated against the HMM and ANN models for the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.


Subjects
Nonlinear Dynamics , Ribonuclease III/chemistry , Amino Acid Sequence , Decision Trees , Enzyme Assays , Escherichia coli/enzymology , Markov Chains , Molecular Sequence Data , Neural Networks, Computer , Protein Conformation , ROC Curve , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Reproducibility of Results , Ribonuclease III/isolation & purification , Sequence Alignment
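The abstract compares tree and neural-network models against an HMM adapted to the four-class amino acid grouping. The sketch below scores a reduced-alphabet sequence with the standard scaled forward algorithm. The grouping, the number of states, and all probabilities are hypothetical placeholders, and this plain HMM is not the authors' non-classical model; `to_classes` and `forward_log_likelihood` are illustrative names.

```python
import math

# Hypothetical four-class reduction (illustrative, not the published grouping).
GROUPS = ("AVLIMFWPG", "STCYNQ", "DE", "KRH")
CLASS_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def to_classes(seq):
    """Map residues to class indices 0-3, skipping symbols outside the grouping."""
    return [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the scaled forward algorithm (obs must be non-empty).
    start[s]: initial probability; trans[r][s]: P(s | r); emit[s][o]: P(o | s)."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
                     for s in states]
        scale = sum(alpha)                  # P(o_t | o_1 .. o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik
```

For classification, one would typically compare this log-likelihood under a family model with the log-likelihood under a background model and assign the sequence to whichever scores higher.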
9.
Amino Acids ; 40(2): 431-42, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20563611

ABSTRACT

Bacteriocins are proteinaceous toxins produced and exported by both gram-negative and gram-positive bacteria as a defense mechanism. The bacteriocin protein family is highly diverse, which complicates the identification of bacteriocin-like sequences using alignment approaches. The use of topological indices (TIs), irrespective of sequence similarity, can be a promising alternative for predicting proteinaceous bacteriocins. Thus, we present Topological Indices to BioPolymers (TI2BioP) as an alignment-free approach inspired by both the Topological Substructural Molecular Design (TOPS-MODE) and Markov Chain Invariants for Network Selection and Design (MARCH-INSIDE) methodologies. TI2BioP allows the calculation of spectral moments as simple TIs to seek quantitative sequence-function relationship (QSFR) models. Since hydrophobicity and basicity are major criteria for the bactericidal activity of bacteriocins, the spectral moments ($^{HP}\mu_k$) were derived for the first time from protein artificial secondary structures based on clustering amino acids into a Cartesian system of hydrophobicity and polarity. Several orders of $^{HP}\mu_k$ numerically characterized 196 bacteriocin-like sequences and a control group of 200 representative CATH domains. They were subsequently used to develop an alignment-free QSFR model that discriminated 76.92% of bacteriocin proteins from other domains, a relevant result considering the high sequence diversity among the members of both groups. The model showed an overall prediction performance of 72.16%, specifically detecting 66.7% of proteinaceous bacteriocins, whereas InterProScan retrieved just 60.2%. As a practical validation, the model also successfully predicted the cryptic bactericidal function of the Cry 1Ab C-terminal domain from the Bacillus thuringiensis endotoxin, which had not been detected by classical alignment methods.


Subjects
Bacteriocins/chemistry , Biopolymers/chemistry , Amino Acid Sequence , Computational Biology , Hydrophobic and Hydrophilic Interactions , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment
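Spectral moments of the kind described here are traces of matrix powers, μ_k = tr(M^k), where M holds graph connectivity off the diagonal and a residue property on the diagonal. The sketch below computes them for a protein represented as a simple linear chain, assuming hypothetical per-class weights for a four-class hydrophobicity/polarity grouping. The chain graph stands in for TI2BioP's artificial secondary structure, and every name and weight is illustrative rather than taken from the paper.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][r] * b[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(weights, edges, max_order):
    """mu_k = trace(M^k) for k = 1..max_order, where M is the adjacency matrix
    of the graph with node weights placed on its diagonal."""
    n = len(weights)
    m = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(weights):
        m[i][i] = w
    for i, j in edges:
        m[i][j] = m[j][i] = 1.0
    moments, power = [], m
    for k in range(max_order):
        moments.append(sum(power[i][i] for i in range(n)))
        if k + 1 < max_order:
            power = matmul(power, m)
    return moments


# Hypothetical class weights: hydrophobic, polar, acidic, basic (illustrative).
CLASS_WEIGHT = {aa: w for group, w in (("AVLIMFWPG", 1.0), ("STCYNQ", -1.0),
                                       ("DE", -2.0), ("KRH", 2.0))
                for aa in group}


def sequence_moments(seq, max_order=4):
    """Spectral moments of a sequence modeled as a chain of weighted residues."""
    w = [CLASS_WEIGHT[aa] for aa in seq.upper() if aa in CLASS_WEIGHT]
    edges = [(i, i + 1) for i in range(len(w) - 1)]
    return spectral_moments(w, edges, max_order)
```

Low orders mostly reflect composition (μ_1 is just the sum of weights), while higher orders mix in contributions from neighboring residues, which is how such moments can carry some sequence-order information.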
10.
Bioorg Med Chem ; 17(2): 537-47, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-19114309

ABSTRACT

Lately, Quantitative Structure-Activity Relationship (QSAR) studies have been widely used to predict anticancer activity, taking into account different molecular descriptors, statistical techniques, cell lines, and data sets of congeneric and non-congeneric compounds. Herein, we report a QSAR study based on the TOPological Sub-structural Molecular Design (TOPS-MODE) approach, aimed at predicting the anticancer activity against leukemia of a diverse data set of indolocarbazole derivatives. Finally, several aspects of the structure-activity relationships are discussed in terms of the contribution of different bonds to the anticancer activity, thereby making the relationship between structure and biological activity more transparent.


Subjects
Antineoplastic Agents/chemical synthesis , Models, Molecular , Quantitative Structure-Activity Relationship , Animals , Antineoplastic Agents/pharmacology , Carbazoles , Cell Line, Tumor , Cell Proliferation/drug effects , Mice
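TOPS-MODE describes a molecule through the spectral moments of its bond adjacency matrix (bonds as nodes, joined when they share an atom, with bond weights on the diagonal), and bond contributions come from plugging each bond's local moments into the fitted linear QSAR. The sketch below shows that bookkeeping, assuming hydrogen-suppressed bonds given as atom-index pairs, arbitrary numeric bond weights, and hypothetical regression coefficients; it is not the descriptor set or the fitted model from this study.

```python
def bond_adjacency(bonds, bond_weights):
    """Bond matrix B: off-diagonal 1 when two bonds share an atom,
    bond weights on the diagonal."""
    n = len(bonds)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        b[i][i] = bond_weights[i]
        for j in range(i + 1, n):
            if set(bonds[i]) & set(bonds[j]):
                b[i][j] = b[j][i] = 1.0
    return b


def local_moments(b, max_order):
    """Row i holds the local spectral moments (B^k)_ii of bond i, k = 1..max_order."""
    n = len(b)
    power = b
    local = [[] for _ in range(n)]
    for k in range(max_order):
        for i in range(n):
            local[i].append(power[i][i])
        if k + 1 < max_order:
            power = [[sum(power[i][r] * b[r][j] for r in range(n))
                      for j in range(n)] for i in range(n)]
    return local


def global_moments(local):
    """mu_k = trace(B^k): the sum of the local moments over all bonds."""
    return [sum(col) for col in zip(*local)]


def bond_contributions(local, coefficients):
    """Each bond's share of a linear QSAR sum_k c_k * mu_k (intercept excluded)."""
    return [sum(c * mu for c, mu in zip(coefficients, row)) for row in local]
```

Because the global moments are sums of the local ones, the bond contributions add up to the model's prediction minus its intercept, which is what lets a TOPS-MODE model point to the bonds driving activity.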
11.
J Chem Inf Comput Sci ; 43(4): 1192-9, 2003.
Article in English | MEDLINE | ID: mdl-12870911

ABSTRACT

A new application of TOPological Sub-structural MOlecular DEsign (TOPS-MODE) to herbicides was carried out using computer-aided molecular design. Two series of compounds, one containing herbicides and the other containing nonherbicide compounds, were processed by k-Means Cluster Analysis in order to design the training and prediction sets. A linear classification function to discriminate herbicides from nonherbicide compounds was developed. The model correctly classified 88% of active and 94% of inactive compounds in the training set, for an overall correct classification of 91% (168 of 185 cases). In the prediction set, it correctly classified 91% of active and 92% of inactive compounds, for an overall correct classification of 92%. To assess the range of applicability of the model, a virtual screening of a structurally heterogeneous series of herbicidal compounds was carried out: 284 of 332 were correctly classified (86%). Furthermore, this paper describes a fragment analysis to determine the contribution of several fragments to herbicidal properties; the presence of halogens in the selected fragments was also analyzed. The present TOPS-MODE-based QSAR appears to be the first general in silico alternative to experimentation in herbicide discovery.


Subjects
Computer-Aided Design , Drug Design , Herbicides/chemistry , Cluster Analysis , Databases, Factual , Models, Chemical , Organic Chemicals/chemistry , Organic Chemicals/classification , Organic Chemicals/pharmacology , Quantitative Structure-Activity Relationship
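The abstract uses k-means cluster analysis to design the training and prediction sets, so that both sets span the structural clusters in the data. The sketch below shows one common way to do this, assuming descriptor vectors stored as lists of floats, Lloyd's algorithm seeded with the first k points, and an every-third-member rule for the prediction set. The seeding, the rule, and the function names are hypothetical, not the authors' protocol.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simple,
    deterministic choice). Returns one cluster label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_split(labels, every=3):
    """Within each cluster, send every `every`-th member to the prediction set
    and the rest to training, so both sets cover all clusters."""
    train, predict = [], []
    seen = {}
    for idx, lab in enumerate(labels):
        seen[lab] = seen.get(lab, 0) + 1
        (predict if seen[lab] % every == 0 else train).append(idx)
    return train, predict
```

Splitting within clusters, rather than at random over the whole dataset, keeps small structural families from ending up entirely in one set.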