Search | BVS (Virtual Health Library) Regional Portal (test)

Ensemble-AHTPpred: A Robust Ensemble Machine Learning Model Integrated With a New Composite Feature for Identifying Antihypertensive Peptides.

Lertampaiporn, Supatcha; Hongsthong, Apiradee; Wattanapornprom, Warin; Thammarongtham, Chinae.

Front Genet ; 13: 883766, 2022.

Article in English | MEDLINE | ID: mdl-35571042

ABSTRACT

Hypertension, or elevated blood pressure, is a serious medical condition that significantly increases the risk of cardiovascular disease, heart disease, diabetes, stroke, kidney disease, and other health problems that affect people worldwide. Thus, hypertension is one of the major global causes of premature death. For the prevention and treatment of hypertension with few or no side effects, antihypertensive peptides (AHTPs) obtained from natural sources might be useful as nutraceuticals. Therefore, the search for alternative or novel AHTPs in food or natural sources has received much attention, as AHTPs may be functional agents for human health. AHTPs have been observed in diverse organisms, although many of them remain underinvestigated. The identification of peptides with antihypertensive activity in the laboratory is time- and resource-consuming. Alternatively, computational methods based on robust machine learning can identify or screen potential AHTP candidates prior to experimental verification. In this paper, we propose Ensemble-AHTPpred, an ensemble machine learning algorithm composed of a random forest (RF), a support vector machine (SVM), and extreme gradient boosting (XGB), with the aim of integrating diverse heterogeneous algorithms to enhance the robustness of the final predictive model. The selected feature set includes various computed features, such as physicochemical properties, amino acid compositions (AACs), transitions, n-grams, and secondary structure-related information; these features provide more information for analyzing or explaining the characteristics of the predicted peptide. In addition, the tool is integrated with a newly proposed composite feature (generated based on a logistic regression function) that combines various feature aspects to enable improved AHTP characterization. Our tool, Ensemble-AHTPpred, achieved an overall accuracy above 90% on independent test data. Additionally, the approach was applied to novel experimentally validated AHTPs, obtained from recent studies, which did not overlap with the training and test datasets, and the tool could precisely predict these AHTPs.
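
As an illustration of two of the feature types named in the abstract (amino acid composition and n-grams), the following is a minimal sketch, not the authors' implementation. The function names and the example sequence are hypothetical; the abstract does not specify how the features are normalized, so relative frequencies are assumed.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Fraction of each standard amino acid in the peptide (AAC)."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide.upper())
    n = len(peptide)
    return {aa: counts[aa] / n for aa in AMINO_ACIDS}


def ngram_composition(peptide, n=2):
    """Relative frequency of each overlapping n-gram observed in the peptide."""
    seq = peptide.upper()
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    if not grams:
        raise ValueError("peptide is shorter than n")
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


# "IPP" is used here only as a short illustrative sequence.
aac = amino_acid_composition("IPP")
dipeptides = ngram_composition("IPP", n=2)
```

The resulting dictionaries can be flattened into a fixed-length numeric vector (20 AAC values plus the chosen n-gram vocabulary) before being passed to any classifier.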

mSRFR: a machine learning model using microalgal signature features for ncRNA classification.

Anuntakarun, Songtham; Lertampaiporn, Supatcha; Laomettachit, Teeraphan; Wattanapornprom, Warin; Ruengjitchatchawalya, Marasri.

BioData Min ; 15(1): 8, 2022 Mar 21.

Article in English | MEDLINE | ID: mdl-35313925

ABSTRACT

This work presents mSRFR (microalgae SMOTE Random Forest Relief model), a classification tool for noncoding RNAs (ncRNAs) in microalgae, including green algae, diatoms, golden algae, and cyanobacteria. First, the SMOTE technique was applied to address the challenge of imbalanced data due to the different numbers of microalgae ncRNAs from different species in the EBI RNA-central database. Then the top 20 significant features from a total of 106 features, including sequence-based, secondary structure, base-pair, and triplet sequence-structure features, were selected using the Relief feature selection method. Next, ten-fold cross-validation was applied to choose a classifier algorithm with the highest performance among Support Vector Machine, Random Forest, Decision Tree, Naïve Bayes, K-nearest Neighbor, and Neural Network, based on the receiver operating characteristic (ROC) area. The results showed that the Random Forest classifier achieved the highest ROC area of 0.992. Then, the Random Forest algorithm was selected and compared with other tools, including RNAcon, CPC, CPC2, CNCI, and CPPred. Our model achieved a high accuracy of about 97% and a low false-positive rate of about 2% in predicting the test dataset of microalgae. Furthermore, the top features from Relief revealed that the %GA dinucleotide is a signature feature of microalgal ncRNAs when compared to Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens.
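
The SMOTE step mentioned in the abstract can be sketched as follows. This is a simplified illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the mSRFR code; the function name, the toy data, and the parameter defaults are assumptions.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class with two hypothetical features per sample.
minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote(minority_class, n_synthetic=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the region already occupied by the minority class.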

Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization.

Wattanapornprom, Warin; Thammarongtham, Chinae; Hongsthong, Apiradee; Lertampaiporn, Supatcha.

Life (Basel) ; 11(4), 2021 Mar 30.

Article in English | MEDLINE | ID: mdl-33808227

ABSTRACT

The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for protein subcellular localization prediction in plants based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only a multiclass but also a multilabel problem. Generally, plant proteins can be found in 10-14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in other compartments (vacuole, peroxisome, Golgi, and cell wall). Consequently, the problem of imbalanced data usually arises. To address this, we propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization to form a total of 479 feature spaces. Then, feature selection methods were used to reduce the dimensions of the features into smaller informative feature subsets. This reduced feature subset was then used to train three different individual models. To combine these three classifiers, we used an average voting approach that returns the final probability prediction. The method could predict subcellular localizations in both single- and multilabel locations, based on the voting probability. Experimental results indicated that the proposed ensemble method could achieve correct classification with an overall accuracy of 84.58% for 11 compartments, on the basis of the testing dataset.
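
The average voting step described in the abstract can be sketched as follows. This is a minimal illustration, not the published implementation: the 0.5 decision threshold, the fallback to the single best label, the function name, and the probabilities are all assumptions.

```python
def average_vote(prob_dicts, threshold=0.5):
    """Average per-label probabilities across models and pick labels above a threshold."""
    n_models = len(prob_dicts)
    labels = list(prob_dicts[0])
    avg = {lab: sum(p[lab] for p in prob_dicts) / n_models for lab in labels}
    predicted = [lab for lab in labels if avg[lab] >= threshold]
    if not predicted:
        # Assumed fallback: always return at least the most probable location.
        predicted = [max(avg, key=avg.get)]
    return avg, predicted


# Hypothetical outputs of three classifiers for one protein.
model_outputs = [
    {"nucleus": 0.7, "cytoplasm": 0.60, "vacuole": 0.1},
    {"nucleus": 0.8, "cytoplasm": 0.40, "vacuole": 0.2},
    {"nucleus": 0.6, "cytoplasm": 0.65, "vacuole": 0.0},
]
avg_probs, locations = average_vote(model_outputs)
```

Here the protein is assigned two locations because both averaged probabilities pass the threshold, which is how a single voting scheme can cover single-label and multilabel cases.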

Ensemble-AMPPred: Robust AMP Prediction and Recognition Using the Ensemble Learning Method with a New Hybrid Feature for Differentiating AMPs.

Lertampaiporn, Supatcha; Vorapreeda, Tayvich; Hongsthong, Apiradee; Thammarongtham, Chinae.

Genes (Basel) ; 12(2), 2021 Jan 21.

Article in English | MEDLINE | ID: mdl-33494403

ABSTRACT

Antimicrobial peptides (AMPs) are natural peptides possessing antimicrobial activities. These peptides are important components of the innate immune system. They are found in various organisms. AMP screening and identification by experimental techniques are laborious and time-consuming tasks. Alternatively, computational methods based on machine learning have been developed to screen potential AMP candidates prior to experimental verification. Although various AMP prediction programs are available, there is still a need for improvement to reduce false positives (FPs) and to increase the predictive accuracy. In this work, several well-known single and ensemble machine learning approaches have been explored and evaluated based on balanced training datasets and two large testing datasets. We have demonstrated that the developed program with various predictive models has high performance in differentiating between AMPs and non-AMPs. Thus, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, which is an ensemble model. Moreover, to increase prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. The ensemble model integrated with the hybrid feature can effectively increase the prediction sensitivity of the developed program called Ensemble-AMPPred, resulting in overall improvements in terms of both sensitivity and specificity compared to those of currently available programs.
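
The abstract names the ensemble rule MaxProbVote but does not define it. The sketch below shows one plausible reading, stated here as an assumption: the final call is taken from whichever model is most confident, that is, the model whose class probability is furthest from 0.5. The function name and the probabilities are hypothetical.

```python
def max_prob_vote(amp_probs):
    """Return the label of the most confident model and that model's confidence.

    amp_probs holds each model's estimated probability that the peptide is an AMP.
    """
    # Confidence of a model = probability of whichever class it favours.
    best = max(amp_probs, key=lambda p: max(p, 1.0 - p))
    label = "AMP" if best >= 0.5 else "non-AMP"
    return label, max(best, 1.0 - best)


# Two models lean weakly towards AMP, one is strongly against it.
label, confidence = max_prob_vote([0.55, 0.10, 0.60])
```

Under this reading the strongly confident model overrides the two weak votes, whereas a simple majority vote on the same inputs would return "AMP".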

Subjects

Antimicrobial Cationic Peptides/pharmacology; Databases, Genetic; Machine Learning; Software; Algorithms; Antimicrobial Cationic Peptides/chemistry; Reproducibility of Results; Sensitivity and Specificity

Spirulina-in Silico-Mutations and Their Comparative Analyses in the Metabolomics Scale by Using Proteome-Based Flux Balance Analysis.

Lertampaiporn, Supatcha; Senachak, Jittisak; Taenkaew, Wassana; Khannapho, Chiraphan; Hongsthong, Apiradee.

Cells ; 9(9), 2020 Sep 15.

Article in English | MEDLINE | ID: mdl-32942547

ABSTRACT

This study used an in silico metabolic engineering strategy for modifying the metabolic capabilities of Spirulina under specific conditions as an approach to modifying culture conditions in order to generate the intended outputs. In metabolic models, the basic metabolic fluxes in steady-state metabolic networks have generally been controlled by stoichiometric reactions; however, this approach does not consider the regulatory mechanism of the proteins responsible for the metabolic reactions. The protein regulatory network plays a critical role in the response to stresses, including environmental stress, encountered by an organism. Thus, the integration of the response mechanism of Spirulina to growth temperature stresses was investigated via simulation of a proteome-based genome-scale metabolic model (GSMM), in which the boundaries were established by using protein expression levels obtained from quantitative proteomic analysis. The proteome-based flux balance analysis (FBA) under an optimal growth temperature (35 °C), a low growth temperature (22 °C) and a high growth temperature (40 °C) showed biomass yields that closely fit the experimental data obtained in previous research. Moreover, the response mechanism was analyzed by the integration of the proteome and protein-protein interaction (PPI) network, and those data were used to support in silico knockout/overexpression of selected proteins involved in the PPI network. The proteome fluxes of wild-type Spirulina under different growth temperatures were compared with those of the mutants, and the proteins/enzymes catalyzing the different flux levels were mapped onto their designated pathways for biological interpretation.
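
For orientation, flux balance analysis is usually stated as a linear program. The formulation below is the generic textbook form, not an equation taken from the paper; all symbols are assumptions introduced here. S is the stoichiometric matrix, v the flux vector, c the objective weights (typically selecting the biomass reaction), and p_i the measured expression level of the protein catalyzing reaction i. How the bounds l_i and u_i depend on p_i is not given in the abstract and is left unspecified.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& l_i(p_i) \le v_i \le u_i(p_i), \qquad i = 1,\dots,n
\end{aligned}
```

The steady-state constraint S v = 0 carries the stoichiometry, while the expression-dependent bounds are where the proteomic data would enter the model.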

Subjects

Computer Simulation; Metabolic Engineering/methods; Metabolome/genetics; Metabolomics/methods; Mutation; Proteome/genetics; Spirulina/genetics; Spirulina/metabolism; Gene Knock-In Techniques; Gene Knockout Techniques; Metabolic Networks and Pathways/genetics; Models, Biological; Protein Interaction Maps/genetics; Proteomics/methods; Spirulina/growth & development; Stress, Physiological/genetics; Temperature

Safety Assessment of a Nham Starter Culture Lactobacillus plantarum BCC9546 via Whole-genome Analysis.

Chokesajjawatee, Nipa; Santiyanont, Pannita; Chantarasakha, Kanittha; Kocharin, Kanokarn; Thammarongtham, Chinae; Lertampaiporn, Supatcha; Vorapreeda, Tayvich; Srisuk, Tanawut; Wongsurawat, Thidathip; Jenjaroenpun, Piroon; Nookaew, Intawat; Visessanguan, Wonnop.

Sci Rep ; 10(1): 10241, 2020 Jun 24.

Article in English | MEDLINE | ID: mdl-32581273

ABSTRACT

The safety of microbial cultures utilized for consumption is vital for public health and should be thoroughly assessed. Although general aspects of the safety assessment of microbial cultures have been suggested, neither methodological details nor procedural guidelines have been published. Herein, we propose a detailed protocol for microbial strain safety assessment via whole-genome sequence analysis. A starter culture employed in the production of nham, a traditional fermented pork product, namely Lactobacillus plantarum BCC9546, was used as an example. The strain's whole genome was sequenced through several next-generation sequencing techniques. Incomplete plasmid information from the PacBio sequencing platform and shorter chromosome size from the hybrid Oxford Nanopore-Illumina platform were noted. The methods for 1) unambiguous species identification using 16S rRNA gene and average nucleotide identity, 2) determination of virulence factors and undesirable genes, 3) determination of antimicrobial resistance properties and their possibility of transfer, and 4) determination of antimicrobial drug production capability of the strain were provided in detail. Applicability of the search tools and limitations of databases were discussed. Finally, a procedural guideline for the safety assessment of microbial strains via whole-genome analysis was proposed.

Subjects

Fermented Foods/microbiology; Lactobacillus plantarum/classification; Lactobacillus plantarum/growth & development; Whole Genome Sequencing/methods; Bacteriological Techniques; Food Safety; Genome Size; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Lactobacillus plantarum/genetics; Plasmids/genetics; RNA, Ribosomal, 16S/genetics

PSO-LocBact: A Consensus Method for Optimizing Multiple Classifier Results for Predicting the Subcellular Localization of Bacterial Proteins.

Lertampaiporn, Supatcha; Nuannimnoi, Sirapop; Vorapreeda, Tayvich; Chokesajjawatee, Nipa; Visessanguan, Wonnop; Thammarongtham, Chinae.

Biomed Res Int ; 2019: 5617153, 2019.

Article in English | MEDLINE | ID: mdl-31886228

ABSTRACT

Several computational approaches for predicting subcellular localization have been developed and proposed. These approaches provide diverse performance because of their different combinations of protein features, training datasets, training strategies, and computational machine learning algorithms. In some cases, these tools may yield inconsistent and conflicting prediction results. It is important to consider such conflicting or contradictory predictions from multiple prediction programs during protein annotation, especially in the case of a multiclass classification problem such as subcellular localization. Hence, to address this issue, this work proposes the use of the particle swarm optimization (PSO) algorithm to combine the prediction outputs from multiple different subcellular localization predictors with the aim of integrating diverse prediction models to enhance the final predictions. Herein, we present PSO-LocBact, a consensus classifier based on PSO that can be used to combine the strengths of several preexisting protein localization predictors specially designed for bacteria. Our experimental results indicate that the proposed method can resolve inconsistency problems in subcellular localization prediction for both Gram-negative and Gram-positive bacterial proteins. The average accuracy achieved on each test dataset is over 98%, higher than that achieved with any individual predictor.
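
The idea of using particle swarm optimization to combine several predictors can be sketched as a search for voting weights. This is a toy illustration under stated assumptions, not PSO-LocBact itself: the weighted-vote consensus, the accuracy fitness, the PSO coefficients (0.7, 1.5, 1.5), and the vote data are all hypothetical.

```python
import random


def consensus(weights, votes, n_classes):
    """Weighted vote over one protein; votes holds one class index per predictor."""
    scores = [0.0] * n_classes
    for w, v in zip(weights, votes):
        scores[v] += w
    return max(range(n_classes), key=scores.__getitem__)


def accuracy(weights, all_votes, labels, n_classes):
    hits = sum(consensus(weights, v, n_classes) == y
               for v, y in zip(all_votes, labels))
    return hits / len(labels)


def pso_weights(all_votes, labels, n_classes, n_particles=12, n_iter=30, seed=0):
    """Search predictor weights in [0, 1] that maximize consensus accuracy."""
    rng = random.Random(seed)
    dim = len(all_votes[0])

    def fit(w):
        return accuracy(w, all_votes, labels, n_classes)

    # The first particle uses uniform weights, so the result is never worse
    # than an unweighted vote on the same data.
    pos = [[1.0] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fit(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # Keep each weight inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fit(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical votes of three predictors (columns) on eight proteins (rows).
VOTES = [[0, 1, 1], [1, 1, 0], [2, 0, 0], [1, 2, 1],
         [0, 0, 2], [2, 2, 1], [1, 0, 0], [0, 0, 0]]
LABELS = [0, 1, 2, 1, 0, 2, 1, 0]
best_w, best_acc = pso_weights(VOTES, LABELS, n_classes=3)
```

Seeding the swarm with the uniform-weight solution is a design choice made here so that the optimized consensus can only match or improve on plain voting for the data it was tuned on; performance on unseen proteins would still need a separate test set.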

Subjects

Bacterial Proteins/classification; Computational Biology/methods; Intracellular Space/chemistry; Machine Learning; Sequence Analysis, Protein/methods; Algorithms; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Consensus

Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm.

Lertampaiporn, Supatcha; Thammarongtham, Chinae; Nukoolkit, Chakarida; Kaewkamnerdpong, Boonserm; Ruengjitchatchawalya, Marasri.

Nucleic Acids Res ; 42(11): e93, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24771344

ABSTRACT

To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
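
A composite feature built from a logistic regression function, as described for SCORE, can be sketched as follows. This is an illustration of the general form only: the feature names are shortened from the abstract, and the weights, intercept, and input values are hypothetical placeholders, not the fitted coefficients of the published model.

```python
import math

FEATURES = ("structure", "sequence", "modularity", "robustness", "coding_potential")


def composite_score(values, weights, intercept=0.0):
    """Logistic function of a weighted sum of the five component features."""
    z = intercept + sum(weights[name] * values[name] for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and one hypothetical (already scaled) feature vector.
example_weights = {"structure": 1.2, "sequence": 0.4, "modularity": 0.8,
                   "robustness": 1.0, "coding_potential": -1.5}
example_values = {"structure": 0.9, "sequence": 0.3, "modularity": 0.5,
                  "robustness": 0.7, "coding_potential": 0.2}
score = composite_score(example_values, example_weights)
```

The single value in (0, 1) can then be appended to the other features as one extra input column for the downstream classifier.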

Subjects

Algorithms; RNA, Long Noncoding/genetics; RNA, Small Untranslated/genetics; Classification/methods; Genome, Bacterial; Genomics; Humans; Logistic Models; RNA, Untranslated/classification; RNA, Untranslated/genetics

Heterogeneous ensemble approach with discriminative features and modified-SMOTEbagging for pre-miRNA classification.

Lertampaiporn, Supatcha; Thammarongtham, Chinae; Nukoolkit, Chakarida; Kaewkamnerdpong, Boonserm; Ruengjitchatchawalya, Marasri.

Nucleic Acids Res ; 41(1): e21, 2013 Jan 07.

Article in English | MEDLINE | ID: mdl-23012261

ABSTRACT

An ensemble classifier approach for microRNA precursor (pre-miRNA) classification was proposed based upon combining a set of heterogeneous algorithms including support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), then aggregating their predictions through a voting system. In addition to the proposed algorithm, the classification performance was improved using discriminative features, self-containment and its derivatives, which have shown unique structural robustness characteristics of pre-miRNAs. These are applicable across different species. By applying preprocessing methods (both a correlation-based feature selection (CFS) with genetic algorithm (GA) search method and a modified-Synthetic Minority Oversampling Technique (SMOTE) bagging rebalancing method), improvement in the performance of this ensemble was observed. The overall prediction accuracy obtained via 10 runs of 5-fold cross-validation (CV) was 96.54%, with a sensitivity of 94.8% and a specificity of 98.3%; this is a better trade-off between sensitivity and specificity than that of other state-of-the-art methods. The ensemble model was applied to animal, plant and virus pre-miRNA and achieved high accuracy, >93%. Exploiting the discriminative set of selected features also suggests that pre-miRNAs possess high intrinsic structural robustness as compared with other stem loops. Our heterogeneous ensemble method gave a relatively more reliable prediction than those using single classifiers. Our program is available at http://ncrna-pred.com/premiRNA.html.
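
The three performance figures reported in the abstract follow from the standard confusion-matrix definitions. The sketch below computes them for a toy set of labels; the function name and the data are hypothetical and unrelated to the study's results.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for labels coded as 1 (positive) and 0."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true positives recovered
        "specificity": tn / (tn + fp),   # fraction of true negatives recovered
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy example: 4 positives (one missed) and 4 negatives (all correct).
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 0])
```

On a balanced dataset the accuracy is the mean of sensitivity and specificity, which is why the two are often reported together when discussing the trade-off between them.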

Subjects

Algorithms; MicroRNAs/classification; RNA Precursors/classification; Base Pairing; Humans; MicroRNAs/chemistry; RNA Precursors/chemistry; RNA, Plant/chemistry; RNA, Plant/classification; RNA, Viral/chemistry; RNA, Viral/classification; Sensitivity and Specificity
