Search | VHL Regional Portal (test)

Machine learning-driven identification of the gene-expression signature associated with a persistent multiple organ dysfunction trajectory in critical illness.

Atreya, Mihir R; Banerjee, Shayantan; Lautz, Andrew J; Alder, Matthew N; Varisco, Brian M; Wong, Hector R; Muszynski, Jennifer A; Hall, Mark W; Sanchez-Pinto, L Nelson; Kamaleswaran, Rishikesan.

EBioMedicine ; 99: 104938, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38142638

ABSTRACT

BACKGROUND: Multiple organ dysfunction syndrome (MODS) disproportionately drives morbidity and mortality among critically ill patients. However, we lack a comprehensive understanding of its pathobiology. Identification of genes associated with a persistent MODS trajectory may shed light on underlying biology and allow for accurate prediction of those at risk. METHODS: Secondary analyses of publicly available gene-expression datasets. Supervised machine learning (ML) was used to identify a parsimonious set of genes associated with a persistent MODS trajectory in a training set of pediatric septic shock. We optimized model parameters and tested risk-prediction capabilities in independent validation and test datasets, respectively. We compared model performance relative to an established gene-set predictive of sepsis mortality. FINDINGS: Patients with a persistent MODS trajectory had 568 differentially expressed genes and were characterized by a dysregulated innate immune response. Supervised ML identified 111 genes associated with the outcome of interest on repeated cross-validation, with an AUROC of 0.87 (95% CI: 0.85-0.88) in the training set. The optimized model, limited to 20 genes, achieved AUROCs ranging from 0.74 to 0.79 in the validation and test sets to predict those with persistent MODS, regardless of host age and cause of organ dysfunction. Our classifier demonstrated reproducibility in identifying those with persistent MODS in comparison with a published gene-set predictive of sepsis mortality. INTERPRETATION: We demonstrate the utility of supervised ML-driven identification of the genes associated with persistent MODS. Pending validation in enriched cohorts with a high burden of organ dysfunction, such an approach may inform targeted delivery of interventions among at-risk patients. FUNDING: H.R.W.'s NIH R35GM126943 award supported the work detailed in this manuscript. Upon his death, the award was transferred to M.N.A. M.R.A., N.S.P., and R.K. were supported by NIH R21GM151703. R.K. was supported by R01GM139967.
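To make the reported metric concrete, the following is a minimal sketch of how an AUROC and a 95% confidence interval can be computed for a binary classifier. The abstract does not state how its interval was derived; the percentile bootstrap used here is an assumption, and the function names (`auroc`, `bootstrap_ci`) and the toy labels and scores are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method,
    not necessarily the one used in the paper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUROC
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = persistent MODS, 0 = resolving course.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ranked correctly
```

On the toy data, three of the four positive/negative pairs are ordered correctly, so the AUROC is 0.75. With real cohorts the interval narrows as the number of patients grows.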

Subjects

Multiple Organ Failure, Sepsis, Humans, Child, Multiple Organ Failure/genetics, Critical Illness, Reproducibility of Results, Sepsis/genetics, Sepsis/complications, Machine Learning

iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data.

Anilkumar Sithara, Anjana; Maripuri, Devi Priyanka; Moorthy, Keerthika; Amirtha Ganesh, Sai Sruthi; Philip, Philge; Banerjee, Shayantan; Sudhakar, Malvika; Raman, Karthik.

NAR Genom Bioinform ; 4(3): lqac053, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35899080

ABSTRACT

Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes in raw sequencing data (FASTQ format) as input and outputs insightful statistics. Our iCOMIC toolkit pipeline, featuring many independent workflows, is embedded in the popular Snakemake workflow management system. It can analyze whole-genome and transcriptome data and is characterized by a user-friendly GUI that offers several advantages, including minimal execution steps and no need for complex command-line arguments. Notably, we have integrated algorithms developed in-house to predict pathogenicity among cancer-causing mutations and to differentiate between tumor suppressor genes and oncogenes from somatic mutation data. We benchmarked our tool against the Genome in a Bottle benchmark dataset (NA12878) and obtained the highest F1 scores of 0.971 and 0.988 for indels and SNPs, respectively, using the BWA MEM-GATK HC DNA-Seq pipeline. Similarly, we achieved a correlation coefficient of r = 0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, our tool enables easy analyses of omics datasets, substantially simplifying complex data analysis pipelines.
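As an illustration of the benchmarking metric mentioned above, the sketch below computes precision, recall, and F1 by comparing a set of called variants against a truth set. It is not the iCOMIC implementation. Representing a variant as a `(chromosome, position, ref, alt)` tuple, the function name `variant_f1`, and the example variants are all assumptions made for illustration.

```python
def variant_f1(called, truth):
    """Return (precision, recall, F1) for called variants against a truth set.

    Each variant is a hashable key, here a hypothetical
    (chromosome, position, ref, alt) tuple.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # variants called and present in the truth set
    fp = len(called - truth)   # variants called but absent from the truth set
    fn = len(truth - called)   # truth variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: three of four calls match the truth set.
truth_set = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 42, "G", "GA"), ("chr3", 7, "T", "C")}
call_set = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
            ("chr2", 42, "G", "GA"), ("chr5", 9, "A", "T")}
example_metrics = variant_f1(call_set, truth_set)
```

Real benchmarking tools also normalize variant representation before matching (for example, left-aligning indels), which this exact-match sketch does not attempt.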

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes.

Banerjee, Shayantan; Raman, Karthik; Ravindran, Balaraman.

Cancers (Basel) ; 13(10), 2021 May 14.

Article in English | MEDLINE | ID: mdl-34068918

ABSTRACT

Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. "Driver" mutations are primarily responsible for cancer progression, while "passengers" are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5' and 3' from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performances across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.
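The idea of turning the bases flanking a mutation into features can be sketched as follows. The abstract does not specify which string-based representations were used, so the k-mer counting shown here is only one plausible choice, and the helper names (`neighborhood`, `kmer_counts`) and the example sequence are hypothetical rather than taken from NBDriver.

```python
from collections import Counter

def neighborhood(seq, pos, window):
    """Return the (5' flank, 3' flank) of length `window` around the
    0-based mutated position `pos`, excluding the mutated base itself."""
    if pos - window < 0 or pos + window >= len(seq):
        raise ValueError("window extends beyond the sequence")
    upstream = seq[pos - window:pos]
    downstream = seq[pos + 1:pos + 1 + window]
    return upstream, downstream

def kmer_counts(s, k):
    """Count overlapping k-mers in a string (one simple string-based feature)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

# Hypothetical reference sequence with a mutation at index 4.
reference = "ACGTACGTAC"
up, down = neighborhood(reference, pos=4, window=3)
features = {
    "up": kmer_counts(up, 2),      # 2-mers in the 5' flank
    "down": kmer_counts(down, 2),  # 2-mers in the 3' flank
}
```

The two flanks are counted separately so that no k-mer spans the mutated position. Varying `window` corresponds to the different window sizes the abstract compares.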

Machine Learning Identifies Complicated Sepsis Course and Subsequent Mortality Based on 20 Genes in Peripheral Blood Immune Cells at 24 H Post-ICU Admission.

Banerjee, Shayantan; Mohammed, Akram; Wong, Hector R; Palaniyar, Nades; Kamaleswaran, Rishikesan.

Front Immunol ; 12: 592303, 2021.

Article in English | MEDLINE | ID: mdl-33692779

ABSTRACT

A complicated clinical course for critically ill patients admitted to the intensive care unit (ICU) usually includes multiorgan dysfunction and subsequent death. Owing to the heterogeneity, complexity, and unpredictability of the disease progression, ICU patient care is challenging. Identifying the predictors of complicated courses and subsequent mortality at the early stages of the disease and recognizing the trajectory of the disease from the vast array of longitudinal quantitative clinical data is difficult. Therefore, we attempted to perform a meta-analysis of previously published gene expression datasets to identify novel early biomarkers and train artificial intelligence systems to recognize the disease trajectories and subsequent clinical outcomes. Using the gene expression profile of peripheral blood cells obtained within 24 h of pediatric ICU (PICU) admission and numerous clinical data from 228 septic patients in the PICU, we identified 20 differentially expressed genes predictive of complicated course outcomes and developed a new machine learning model. After 5-fold cross-validation with 10 iterations, the overall mean area under the curve reached 0.82. Using a subset of the same set of genes, we further achieved an overall area under the curve of 0.72, 0.96, 0.83, and 0.82, respectively, on four independent external validation sets. This model was highly effective in identifying the clinical trajectories of the patients and mortality. Artificial intelligence systems identified eight out of twenty novel genetic markers (SDC4, CLEC5A, TCN1, MS4A3, HCAR3, OLAH, PLCB1, and NLRP1) that help predict sepsis severity or mortality. While these genes have been previously associated with sepsis mortality, in this work, we show that these genes are also implicated in complex disease courses, even among survivors. The discovery of eight novel genetic biomarkers related to the overactive innate immune system, including neutrophil function, and a new predictive machine learning method provides options to effectively recognize sepsis trajectories, modify real-time treatment options, and improve prognosis and patient survival.
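The evaluation scheme described above, 5-fold cross-validation repeated 10 times, can be sketched as a generator of train/test index splits. Stratifying the folds by outcome class is an assumption (the abstract does not say whether the folds were stratified), and `repeated_stratified_kfold` and the toy labels are hypothetical names and data.

```python
import random

def repeated_stratified_kfold(labels, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold
    cross-validation, keeping class proportions similar in every fold."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)              # new random assignment each repeat
            for j, i in enumerate(idx):
                folds[j % k].append(i)    # deal class members round-robin
        for f in range(k):
            test_idx = sorted(folds[f])
            train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train_idx, test_idx

# Hypothetical outcome labels: 10 uncomplicated (0) and 5 complicated (1) courses.
toy_outcomes = [0] * 10 + [1] * 5
splits = list(repeated_stratified_kfold(toy_outcomes, k=5, repeats=10, seed=42))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`. Averaging the 50 resulting AUC values gives a mean of the kind the abstract reports.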

Subjects

Disease Susceptibility, Leukocytes/immunology, Leukocytes/metabolism, Machine Learning, Sepsis/epidemiology, Sepsis/etiology, Transcriptome, Biomarkers, Chromosome Mapping, Computational Biology/methods, Critical Care, Databases, Genetic, Gene Expression Profiling/methods, Hospital Mortality, Humans, Intensive Care Units, ROC Curve, Reproducibility of Results, Sepsis/mortality, Time Factors
