Results 1 - 7 of 7
1.
Comput Struct Biotechnol J ; 23: 1864-1876, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38707536

ABSTRACT

In current genomic research, the widely used methods for predicting antimicrobial resistance (AMR) often rely on prior knowledge of known AMR genes or reference genomes. However, these methods have limitations, potentially resulting in imprecise predictions owing to incomplete coverage of AMR mechanisms and genetic variations. To overcome these limitations, we propose a pan-genome-based machine learning approach to advance our understanding of AMR gene repertoires and uncover possible feature sets for precise AMR classification. By building compacted de Bruijn graphs (cDBGs) from thousands of genomes and collecting the presence/absence patterns of unique sequences (unitigs) for Pseudomonas aeruginosa, we determined that using machine learning models on unitig-centered pan-genomes showed significant promise for accurately predicting the antibiotic resistance or susceptibility of microbial strains. Applying a feature-selection-based machine learning algorithm led to satisfactory predictive performance for the training dataset (with an area under the receiver operating characteristic curve (AUC) of > 0.929) and an independent validation dataset (AUC, approximately 0.77). Furthermore, the selected unitigs revealed previously unidentified resistance genes, allowing for the expansion of the resistance gene repertoire to those that have not previously been described in the literature on antibiotic resistance. These results demonstrate that our proposed unitig-based pan-genome feature set was effective in constructing machine learning predictors that could accurately identify AMR pathogens. Gene sets extracted using this approach may offer valuable insights into expanding known AMR genes and forming new hypotheses to uncover the underlying mechanisms of bacterial AMR.
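
To make the two central quantities in this abstract concrete, the following is a minimal sketch of a unitig presence/absence matrix and of the AUC computed as a pairwise ranking probability. The strain names, unitig IDs, labels, and both helper functions are hypothetical and invented for illustration; the abstract does not describe the authors' implementation, and a single unitig stands in here for a trained model's score.

```python
def presence_absence_matrix(genome_unitigs):
    """Return (sorted unitig list, {genome: 0/1 row}) from {genome: set of unitigs}."""
    unitigs = sorted(set().union(*genome_unitigs.values()))
    rows = {
        genome: [1 if u in present else 0 for u in unitigs]
        for genome, present in genome_unitigs.items()
    }
    return unitigs, rows


def auc(labels, scores):
    """AUC as the chance that a resistant strain outscores a susceptible one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: unitig IDs and resistance labels are made up.
genomes = {
    "strain_A": {"u1", "u2"},
    "strain_B": {"u2", "u3"},
    "strain_C": {"u1", "u3"},
    "strain_D": {"u3"},
}
resistant = {"strain_A": 1, "strain_B": 0, "strain_C": 1, "strain_D": 0}

unitigs, matrix = presence_absence_matrix(genomes)

# Score each strain by the presence of unitig "u1" alone, standing in for a model output.
u1 = unitigs.index("u1")
names = sorted(genomes)
scores = [matrix[name][u1] for name in names]
labels = [resistant[name] for name in names]
print(unitigs, auc(labels, scores))
```

In a real analysis the scores would come from a classifier evaluated on strains held out from training, which is the distinction the abstract draws between the training and independent validation AUCs.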

2.
Diagnostics (Basel) ; 14(7), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38611635

ABSTRACT

Asthma is a diverse disease that affects over 300 million individuals globally. The prevalence of asthma has increased by 50% every decade since the 1960s, making it a serious global health issue. In addition to its associated high mortality, asthma generates large economic losses due to the degradation of patients' quality of life and the impairment of their physical fitness. Asthma research has evolved in recent years to analyze why certain diseases develop, based on a variety of data and observations of patients' performance. The advent of new techniques offers good opportunities and application prospects for the development of asthma diagnosis methods. Over the last few decades, techniques like data mining and machine learning have been utilized to diagnose asthma. Nevertheless, these traditional methods are unable to address all of the difficulties associated with improving a small dataset to increase its quantity, quality, and feature space complexity at the same time. In this study, we propose a sustainable approach to asthma diagnosis using advanced machine learning techniques. Specifically, we use feature selection to find the most important features, data augmentation to improve the dataset's resilience, and the extreme gradient boosting algorithm for classification. Data augmentation in the proposed method generates synthetic samples that are added to the original training data to increase its size. This could reduce the class imbalance in asthma data. Then, to improve diagnosis accuracy and prioritize significant features, the extreme gradient boosting technique is used. The outcomes indicate that the proposed approach performs better in terms of diagnostic accuracy than current techniques. Furthermore, five essential features are extracted to help physicians diagnose asthma.
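
The abstract does not say which augmentation method was used. As one possible illustration of generating synthetic samples for an under-represented class, the sketch below interpolates between pairs of existing minority-class rows, an idea similar in spirit to SMOTE. The function name, feature values, and class sizes are all assumptions made for this example.

```python
import random


def interpolate_samples(minority, n_new, seed=0):
    """Create n_new synthetic rows, each a random point between two distinct minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct existing rows
        t = rng.random()                # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical training rows (values invented): 2 asthma cases versus 5 controls.
cases = [[0.9, 3.1], [1.1, 2.7]]
controls = [[0.2, 1.0], [0.3, 1.2], [0.1, 0.8], [0.4, 1.1], [0.2, 0.9]]

# Add just enough synthetic cases to match the number of controls.
new_cases = interpolate_samples(cases, n_new=len(controls) - len(cases))
balanced_cases = cases + new_cases
print(len(balanced_cases), len(controls))
```

Synthetic rows of this kind belong in the training split only; evaluating on augmented data would overstate diagnostic accuracy.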

3.
Front Genet ; 14: 1054032, 2023.
Article in English | MEDLINE | ID: mdl-37323667

ABSTRACT

Background: Predicting the resistance profiles of antimicrobial resistance (AMR) pathogens is becoming increasingly important in treating infectious diseases. Various attempts have been made to build machine learning models to classify resistant or susceptible pathogens based on either known antimicrobial resistance genes or the entire gene set. However, the phenotypic annotations are translated from the minimum inhibitory concentration (MIC), which is the lowest concentration of an antibiotic drug that inhibits a given pathogenic strain. Since the MIC breakpoints that classify a strain as resistant or susceptible to a specific antibiotic drug may be revised by governing institutes, we refrained from translating these MIC values into the categories "susceptible" or "resistant" but instead attempted to predict the MIC values using machine learning approaches. Results: By applying a machine learning feature selection approach on a Salmonella enterica pan-genome, in which the protein sequences were clustered to identify highly similar gene families, we showed that the selected features (genes) performed better than known AMR genes, and that models built on the selected genes achieved very accurate MIC prediction. Functional analysis revealed that about half of the selected genes were annotated as hypothetical proteins (i.e., with unknown functional roles), and that only a small portion of known AMR genes were among the selected genes, indicating that applying feature selection on the entire gene set has the potential to uncover novel genes that may be associated with and may contribute to antimicrobial resistance in pathogens. Conclusion: The application of the pan-genome-based machine learning approach was indeed capable of predicting MIC values with very high accuracy. The feature selection process may also identify novel AMR genes for inferring bacterial antimicrobial resistance phenotypes.
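
The reasoning for predicting MIC values rather than categories can be illustrated with a small sketch: when a breakpoint is revised, categorical labels change even though the measurements do not. The MIC values, both breakpoints, and the "at or above" rule are invented simplifications rather than real guideline values, and the log2 transform is a common convention assumed here, not something stated in the abstract.

```python
import math


def categorize(mic, resistant_at):
    """Label a strain resistant when its MIC (mg/L) is at or above the breakpoint."""
    return "resistant" if mic >= resistant_at else "susceptible"


# Hypothetical MIC measurements in mg/L.
mics = {"strain_1": 0.5, "strain_2": 2.0, "strain_3": 4.0, "strain_4": 16.0}

# The same measurements labeled under an old and a revised (both hypothetical) breakpoint.
old_labels = {s: categorize(m, resistant_at=4.0) for s, m in mics.items()}
new_labels = {s: categorize(m, resistant_at=2.0) for s, m in mics.items()}
changed = [s for s in mics if old_labels[s] != new_labels[s]]

# A regression target that does not depend on any breakpoint: log2 of the MIC,
# so that each two-fold dilution step corresponds to one unit.
log2_mic = {s: math.log2(m) for s, m in mics.items()}
print(changed, log2_mic)
```

A model trained on the numeric target stays valid after a breakpoint revision, because categories can be re-derived from its predictions at any time.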

4.
Comput Struct Biotechnol J ; 21: 769-779, 2023.
Article in English | MEDLINE | ID: mdl-36698972

ABSTRACT

Understanding genes and their underlying mechanisms is critical in deciphering how antimicrobial-resistant (AMR) bacteria withstand the detrimental effects of antibiotic drugs. At the same time, the genes related to AMR phenotypes may also serve as biomarkers for predicting whether a microbial strain is resistant to certain antibiotic drugs. We developed a Cross-Validated Feature Selection (CVFS) approach for robustly selecting the most parsimonious gene sets for predicting AMR activities from bacterial pan-genomes. The core idea behind the CVFS approach is interrogating features among non-overlapping sub-parts of the datasets to ensure the representativeness of the features. By randomly splitting the dataset into disjoint sub-parts, conducting feature selection within each sub-part, and intersecting the features shared by all sub-parts, the CVFS approach is able to extract the most representative features for yielding satisfactory AMR activity prediction accuracy. By testing this idea on bacterial pan-genome datasets, we showed that this approach was able to extract the most succinct feature sets that predicted AMR activities very well, indicating the potential of these genes as AMR biomarkers. The functional analysis demonstrated that the CVFS approach was able to extract both known AMR genes and novel ones, suggesting the capability of the algorithm to select relevant features and highlighting the potential of the novel genes in expanding the antimicrobial resistance gene databases.
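
The split, select, and intersect steps described in this abstract can be sketched as follows. This is a simplified illustration and not the authors' CVFS code: the per-part selector is a stand-in that ranks genes by the difference in presence rate between classes, the split is stratified by class so that every sub-part contains both labels, and the toy matrix is invented.

```python
import random


def select_features(rows, labels, top_k):
    """Stand-in selector: rank genes by |presence rate in resistant - rate in susceptible|."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def gap(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))

    ranked = sorted(range(len(rows[0])), key=gap, reverse=True)
    return set(ranked[:top_k])


def cvfs_sketch(rows, labels, n_parts, top_k, seed=0):
    """Split strains into disjoint sub-parts, select within each, keep features shared by all."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n_parts)]
    for cls in (0, 1):  # assumption: deal each class out separately so parts stay mixed
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            parts[k % n_parts].append(i)
    selections = [
        select_features([rows[i] for i in part], [labels[i] for i in part], top_k)
        for part in parts
    ]
    return set.intersection(*selections)


# Hypothetical gene presence/absence rows; gene 0 tracks resistance, the rest do not.
rows = [
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0],
    [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 0, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

shared = cvfs_sketch(rows, labels, n_parts=2, top_k=1)
print(shared)
```

Intersecting across disjoint sub-parts discards features that look informative only in one random subset, which is why the resulting gene sets tend to be small.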

5.
Sci Rep ; 12(1): 13412, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35927323

ABSTRACT

O6-Methylguanine-DNA-methyltransferase (MGMT) promoter methylation was shown in many studies to be an important predictive biomarker for temozolomide (TMZ) resistance and poor progression-free survival in glioblastoma multiforme (GBM) patients. However, identifying the MGMT methylation status using molecular techniques remains challenging due to technical limitations, such as the inability to obtain tumor specimens, high prices for detection, and the high complexity of intralesional heterogeneity. To overcome these difficulties, we aimed to test the feasibility of using a novel radiomics-based machine learning (ML) model to preoperatively and noninvasively predict the MGMT methylation status. In this study, radiomics features extracted from multimodal images of GBM patients with annotated MGMT methylation status were downloaded from The Cancer Imaging Archive (TCIA) public database for retrospective analysis. The radiomics features extracted from multimodal magnetic resonance imaging (MRI) images underwent a two-stage feature selection method, including an eXtreme Gradient Boosting (XGBoost) feature selection model followed by a genetic algorithm (GA)-based wrapper model for extracting the most meaningful radiomics features for predictive purposes. The cross-validation results suggested that the GA-based wrapper model achieved high performance, with a sensitivity of 0.894, specificity of 0.966, and accuracy of 0.925 for predicting the MGMT methylation status in GBM. Application of the extracted GBM radiomics features on a low-grade glioma (LGG) dataset also achieved a sensitivity of 0.780, specificity of 0.620, and accuracy of 0.750, indicating the potential of the selected radiomics features to be applied more widely on both low- and high-grade gliomas. The performance indicated that our model may potentially confer significant improvements in prognosis and treatment responses in GBM patients.
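
For readers comparing the sensitivity, specificity, and accuracy figures reported here, the sketch below shows how the three metrics are derived from true and predicted labels. The label vectors are invented and do not reproduce the study's data; the function name is hypothetical.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy for binary labels (1 = methylated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),       # share of methylated cases detected
        "specificity": tn / (tn + fp),       # share of unmethylated cases correctly ruled out
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical labels for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy mixes both classes, it can remain high while one of the other two metrics is low, which is why the abstract reports all three.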


Subjects
Brain Neoplasms , Glioblastoma , Glioma , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Brain Neoplasms/pathology , DNA Methylation , DNA Modification Methylases/genetics , DNA Modification Methylases/metabolism , DNA Repair Enzymes/genetics , DNA Repair Enzymes/metabolism , Glioblastoma/diagnostic imaging , Glioblastoma/genetics , Glioma/genetics , Humans , Machine Learning , O(6)-Methylguanine-DNA Methyltransferase/genetics , Retrospective Studies , Tumor Suppressor Proteins/genetics
6.
BMC Bioinformatics ; 23(Suppl 4): 131, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428201

ABSTRACT

BACKGROUND: Predicting which pathogens might exhibit antimicrobial resistance (AMR) based on genomics data is one of the promising ways to swiftly and precisely identify AMR pathogens. Currently, the most widely used genomics approach is through identifying known AMR genes from genomic information in order to predict whether a pathogen might be resistant to certain antibiotic drugs. The list of known AMR genes, however, is still far from comprehensive and may result in inaccurate AMR pathogen predictions. We thus felt the need to expand the AMR gene set and proposed a pan-genome-based feature selection method to identify potential gene sets for AMR prediction purposes. RESULTS: By building pan-genome datasets and extracting gene presence/absence patterns from four bacterial species, each with more than 2000 strains, we showed that machine learning models built from pan-genome data can be very promising for predicting AMR pathogens. The gene set selected by the eXtreme Gradient Boosting (XGBoost) feature selection approach further improved prediction outcomes, and an incremental approach selecting subsets of XGBoost-selected features brought the machine learning model performance to the next level. Investigating selected gene sets revealed that on average about 50% of genes had no known function and very few of them were known AMR genes, indicating the potential of the selected gene sets to expand resistance gene repertoires. CONCLUSIONS: We demonstrated that a pan-genome-based feature selection approach is suitable for building machine learning models for predicting AMR pathogens. The extracted gene sets may provide future clues to expand our knowledge of known AMR genes and provide novel hypotheses for inferring bacterial AMR mechanisms.
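
The incremental approach mentioned in the results can be pictured as growing a feature set one ranked gene at a time and keeping the best-scoring subset. The sketch below is only an illustration of that idea: the ranking is assumed to come from a prior importance step, the scoring rule (call a strain resistant if it carries any chosen gene) is a stand-in for a trained model, and the data are invented.

```python
def accuracy_any_present(rows, labels, genes):
    """Stand-in model: call a strain resistant if it carries any of the chosen genes."""
    preds = [1 if any(row[g] for g in genes) else 0 for row in rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def best_prefix(rows, labels, ranked_genes):
    """Add ranked genes one at a time and keep the prefix with the highest score."""
    best_score, best_genes = -1.0, []
    for k in range(1, len(ranked_genes) + 1):
        subset = ranked_genes[:k]
        score = accuracy_any_present(rows, labels, subset)
        if score > best_score:  # strict comparison keeps the smaller subset on ties
            best_score, best_genes = score, subset
    return best_genes, best_score


# Hypothetical gene presence/absence rows for six strains and three genes.
rows = [
    [1, 0, 0], [1, 0, 0], [0, 1, 0],   # resistant strains
    [0, 0, 1], [0, 0, 0], [0, 0, 0],   # susceptible strains
]
labels = [1, 1, 1, 0, 0, 0]
ranked_genes = [0, 1, 2]  # assumed output of an earlier importance ranking

chosen, score = best_prefix(rows, labels, ranked_genes)
print(chosen, score)
```

In practice the score for each subset would be measured on held-out strains rather than on the data used for ranking, to avoid favoring overfit gene sets.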


Subjects
Anti-Bacterial Agents , Drug Resistance, Bacterial , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/genetics , Genome, Bacterial , Machine Learning , Whole Genome Sequencing/methods
7.
J Biol Chem ; 293(25): 9801-9811, 2018 06 22.
Article in English | MEDLINE | ID: mdl-29743241

ABSTRACT

Expression of placental growth factor (PGF) is closely associated with placental perfusion in early pregnancy. PGF is primarily expressed in placental trophoblasts, and its expression decreases in preeclampsia, associated with placental hypoxia. The transcription factors glial cells missing 1 (GCM1) and metal-regulatory transcription factor 1 (MTF1) have been implicated in the regulation of PGF gene expression through regulatory elements upstream and downstream of the PGF transcription start site, respectively. Here, we clarified the mechanism underlying placenta-specific PGF expression. We demonstrate that GCM1 up-regulates PGF expression through three downstream GCM1-binding sites (GBSs) but not a previously reported upstream GBS. Interestingly, we found that these downstream GBSs also harbor metal-response elements for MTF1. Surprisingly, however, we observed that MTF1 is unlikely to regulate PGF expression in the placenta because knockdown or overexpression of GCM1, but not MTF1, dramatically decreased PGF expression or reversed the suppression of PGF expression under hypoxia, respectively. We also demonstrate that another transcription factor, Distal-less homeobox 3 (DLX3), interacts with the DNA-binding domain and the first transactivation domain of GCM1 and that this interaction inhibits GCM1-mediated PGF expression. Moreover, the GCM1-DLX3 interaction interfered with CREB-binding protein-mediated GCM1 acetylation and activation. In summary, we have identified several GBSs in the PGF promoter that are highly responsive to GCM1, have demonstrated that MTF1 does not significantly regulate PGF expression in placental cells, and provide evidence that DLX3 inhibits GCM1-mediated PGF expression. Our findings revise the mechanism for GCM1- and DLX3-mediated regulation of PGF gene expression.


Subjects
Gene Expression Regulation , Homeodomain Proteins/metabolism , Nuclear Proteins/metabolism , Placenta Growth Factor/genetics , Placenta/metabolism , Response Elements , Transcription Factors/metabolism , Trophoblasts/metabolism , Acetylation , Base Sequence , Cell Differentiation , DNA-Binding Proteins , Female , Homeodomain Proteins/genetics , Humans , Nuclear Proteins/genetics , Placenta Growth Factor/metabolism , Pregnancy , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics