Search | BVS Regional Portal

Discovery of optimal cell type classification marker genes from single cell RNA sequencing data.

Liu, Angela; Peng, Beverly; Pankajam, Ajith V; Duong, Thu Elizabeth; Pryhuber, Gloria; Scheuermann, Richard H; Zhang, Yun.

bioRxiv; 2024 Jun 26.

Article in English | MEDLINE | ID: mdl-38712147

ABSTRACT

The use of single cell/nucleus RNA sequencing (scRNA-seq) technologies that quantitatively describe cell transcriptional phenotypes is revolutionizing our understanding of cell biology, leading to new insights in cell type identification, disease mechanisms, and drug development. The tremendous growth in scRNA-seq data has posed new challenges in efficiently characterizing data-driven cell types and identifying quantifiable marker genes for cell type classification. The use of machine learning and explainable artificial intelligence has emerged as an effective approach to study large-scale scRNA-seq data. NS-Forest is a random forest machine learning-based algorithm that aims to provide a scalable data-driven solution to identify minimum combinations of necessary and sufficient marker genes that capture cell type identity with maximum classification accuracy. Here, we describe the latest version, NS-Forest version 4.0, and its companion Python package (https://github.com/JCVenterInstitute/NSForest), with several enhancements to select marker gene combinations that exhibit highly selective expression patterns among closely related cell types and more efficiently perform marker gene selection for large-scale scRNA-seq data atlases with millions of cells. By modularizing the final decision tree step, NS-Forest v4.0 can be used to compare the performance of user-defined marker genes with the NS-Forest computationally derived marker genes based on the decision tree classifiers. To quantify how well the identified markers exhibit the desired pattern of being exclusively expressed at high levels within their target cell types, we introduce the On-Target Fraction metric that ranges from 0 to 1, with a metric of 1 assigned to markers that are only expressed within their target cell types and not in cells of any other cell types. NS-Forest v4.0 outperforms previous versions in its ability to identify markers with higher On-Target Fraction values for closely related cell types and outperforms other marker gene selection approaches at classification with significantly higher F-beta scores when applied to datasets from three human organs: brain, kidney, and lung.
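The abstract above describes the On-Target Fraction only by its range and its behavior at the upper bound, not by a formula. As a rough illustration, the sketch below assumes one formula with those properties: the marker's median expression in the target cell type divided by the sum of its median expression over all cell types. The function name `on_target_fraction`, the input layout, and the toy values are hypothetical and are not taken from the NS-Forest package.

```python
from statistics import median


def on_target_fraction(expression_by_cell_type, target):
    """Assumed formula: median expression in the target cell type divided by
    the sum of median expression across all cell types (0 if never expressed)."""
    medians = {
        cell_type: median(values)
        for cell_type, values in expression_by_cell_type.items()
    }
    total = sum(medians.values())
    if total == 0:
        return 0.0
    return medians[target] / total


# Toy expression values for one gene, grouped by cell type (made-up numbers).
exclusive_marker = {
    "astrocyte": [5.0, 6.0, 7.0],
    "neuron": [0.0, 0.0, 1.0],
    "microglia": [0.0, 0.0, 0.0],
}
shared_marker = {
    "astrocyte": [5.0, 6.0, 7.0],
    "neuron": [2.0, 2.0, 2.0],
    "microglia": [0.0, 0.0, 0.0],
}

print(on_target_fraction(exclusive_marker, "astrocyte"))  # 1.0
print(on_target_fraction(shared_marker, "astrocyte"))     # 0.75
```

Under this assumed definition, a gene whose median expression is zero everywhere outside the target cell type scores 1, and the score falls as off-target cell types express it.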

Genome-wide association study reveals multiple loci for nociception and opioid consumption behaviors associated with heroin vulnerability in outbred rats.

Kuhn, Brittany N; Cannella, Nazzareno; Chitre, Apurva S; Nguyen, Khai-Minh H; Cohen, Katarina; Chen, Denghui; Peng, Beverly; Ziegler, Kendra S; Lin, Bonnie; Johnson, Benjamin B; Missfeldt Sanches, Thiago; Crow, Ayteria D; Lunerti, Veronica; Gupta, Arkobrato; Dereschewitz, Eric; Soverchia, Laura; Hopkins, Jordan L; Roberts, Analyse T; Ubaldi, Massimo; Abdulmalek, Sarah; Kinen, Analia; Hardiman, Gary; Chung, Dongjun; Polesskaya, Oksana; Solberg Woods, Leah C; Ciccocioppo, Roberto; Kalivas, Peter W; Palmer, Abraham A.

bioRxiv; 2024 Jun 25.

Article in English | MEDLINE | ID: mdl-38712202

ABSTRACT

The increased prevalence of opioid use disorder (OUD) makes it imperative to disentangle the biological mechanisms contributing to individual differences in OUD vulnerability. OUD shows strong heritability; however, genetic variants contributing toward vulnerability remain poorly defined. We performed a genome-wide association study using over 850 male and female heterogeneous stock (HS) rats to identify genes underlying behaviors associated with OUD such as nociception, as well as heroin-taking, extinction and seeking behaviors. By using an animal model of OUD, we were able to identify genetic variants associated with distinct OUD behaviors while maintaining a uniform environment, an experimental design not easily achieved in humans. Furthermore, we used a novel non-linear network-based clustering approach to characterize rats based on OUD vulnerability to assess genetic variants associated with OUD susceptibility. Our findings confirm the heritability of several OUD-like behaviors, including OUD susceptibility. Additionally, several genetic variants associated with nociceptive threshold prior to heroin experience, heroin consumption, escalation of intake, and motivation to obtain heroin were identified. Tom1, a microglial component, was implicated for nociception. Several genes involved in dopaminergic signaling, neuroplasticity and substance use disorders, including Brwd1, Pcp4, Phb1l2 and Mmp15, were implicated for the heroin traits. Additionally, an OUD vulnerable phenotype was associated with genetic variants for consumption and break point, suggesting a specific genetic contribution for OUD-like traits contributing to vulnerability. Together, these findings identify novel genetic markers related to the susceptibility to OUD-relevant behaviors in HS rats.

Machine learning for cell type classification from single nucleus RNA sequencing data.

Le, Huy; Peng, Beverly; Uy, Janelle; Carrillo, Daniel; Zhang, Yun; Aevermann, Brian D; Scheuermann, Richard H.

PLoS One; 17(9): e0275070, 2022.

Article in English | MEDLINE | ID: mdl-36149937

ABSTRACT

With the advent of single cell/nucleus RNA sequencing (sc/snRNA-seq), the field of cell phenotyping is now a data-driven exercise providing statistical evidence to support cell type/state categorization. However, the task of classifying cells into specific, well-defined categories with the empirical data provided by sc/snRNA-seq remains nontrivial due to the difficulty in determining specific differences between related cell types with close transcriptional similarities, resulting in challenges with matching cell types identified in separate experiments. To investigate possible approaches to overcome these obstacles, we explored the use of supervised machine learning methods, namely logistic regression, support vector machines, random forests, neural networks, and light gradient boosting machine (LightGBM), as approaches to classify cell types using snRNA-seq datasets from human brain middle temporal gyrus (MTG) and human kidney. Classification accuracy was evaluated using an F-beta score weighted in favor of precision to account for technical artifacts of gene expression dropout. We examined the impact of hyperparameter optimization and feature selection methods on F-beta score performance. We found that the best performing model for granular cell type classification in both datasets is a multinomial logistic regression classifier and that an effective feature selection step was the most influential factor in optimizing the performance of the machine learning pipelines.

Subjects

Machine Learning; RNA; Humans; Logistic Models; RNA, Small Nuclear; Sequence Analysis, RNA/methods; Support Vector Machine
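The abstract of this record mentions an F-beta score weighted in favor of precision. A minimal sketch of the standard F-beta formula follows; the value beta = 0.5 and the counts are assumptions chosen for illustration, since the abstract does not state the beta used. Any beta below 1 weights precision more heavily than recall.

```python
def f_beta(true_pos, false_pos, false_neg, beta=0.5):
    """Standard F-beta score computed from confusion counts.

    beta < 1 favors precision, beta > 1 favors recall.
    The default beta=0.5 is an illustrative assumption.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Made-up counts: precise but incomplete calls (precision 0.8, recall 0.5),
# the kind of pattern dropout can produce when true cells are missed.
precision_weighted = f_beta(true_pos=8, false_pos=2, false_neg=8, beta=0.5)
balanced_f1 = f_beta(true_pos=8, false_pos=2, false_neg=8, beta=1.0)

print(round(precision_weighted, 3))  # 0.714
print(round(balanced_f1, 3))         # 0.615
```

With these counts the precision-weighted score is higher than the balanced F1, which shows how a beta below 1 penalizes false negatives less than false positives.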