Results 1 - 20 of 23
1.
Genome Biol ; 25(1): 159, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886757

ABSTRACT

BACKGROUND: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? RESULTS: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. CONCLUSIONS: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.


Subjects
Benchmarking; RNA-Seq; Single-Cell Analysis; Single-Cell Analysis/methods; RNA-Seq/methods; Humans; Supervised Machine Learning; Sequence Analysis, RNA/methods; Cluster Analysis; Computational Biology/methods; Machine Learning; Animals; Single-Cell Gene Expression Analysis
2.
Nat Biotechnol ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429430

ABSTRACT

Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.

3.
Nat Commun ; 15(1): 1014, 2024 Feb 03.
Article in English | MEDLINE | ID: mdl-38307875

ABSTRACT

A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader .


Subjects
Algorithms; Machine Learning; Technology; Awareness; Supervised Machine Learning; Single-Cell Analysis
4.
Thorax ; 79(4): 307-315, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38195644

ABSTRACT

BACKGROUND: Low-dose CT screening can reduce lung cancer-related mortality. However, most screen-detected pulmonary abnormalities do not develop into cancer and it often remains challenging to identify malignant nodules, particularly among indeterminate nodules. We aimed to develop and assess prediction models based on radiological features to discriminate between benign and malignant pulmonary lesions detected on a baseline screen. METHODS: Using four international lung cancer screening studies, we extracted 2060 radiomic features for each of 16 797 nodules (513 malignant) among 6865 participants. After filtering out low-quality radiomic features, 642 radiomic and 9 epidemiological features remained for model development. We used cross-validation and grid search to assess three machine learning (ML) models (eXtreme Gradient Boosted Trees, random forest, least absolute shrinkage and selection operator (LASSO)) for their ability to accurately predict risk of malignancy for pulmonary nodules. We report model performance based on the area under the curve (AUC) and calibration metrics in the held-out test set. RESULTS: The LASSO model yielded the best predictive performance in cross-validation and was fit in the full training set based on optimised hyperparameters. Our radiomics model had a test-set AUC of 0.93 (95% CI 0.90 to 0.96) and outperformed the established Pan-Canadian Early Detection of Lung Cancer model (AUC 0.87, 95% CI 0.85 to 0.89) for nodule assessment. Our model performed well among both solid (AUC 0.93, 95% CI 0.89 to 0.97) and subsolid nodules (AUC 0.91, 95% CI 0.85 to 0.95). CONCLUSIONS: We developed highly accurate ML models based on radiomic and epidemiological features from four international lung cancer screening studies that may be suitable for assessing indeterminate screen-detected pulmonary nodules for risk of malignancy.
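The AUC reported in this abstract can be read as the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign one. A minimal sketch of that rank-based computation follows; the function name `auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are predicted risks.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based version would be preferable for tens of thousands of nodules.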


Subjects
Lung Neoplasms; Multiple Pulmonary Nodules; Humans; Lung Neoplasms/diagnosis; Early Detection of Cancer; Radiomics; Tomography, X-Ray Computed; Canada; Multiple Pulmonary Nodules/pathology; Machine Learning; Retrospective Studies
5.
FASEB J ; 36(10): e22560, 2022 10.
Article in English | MEDLINE | ID: mdl-36165236

ABSTRACT

Angiogenesis inhibitor drugs targeting vascular endothelial growth factor (VEGF) signaling to the endothelial cell (EC) are used to treat various cancer types. However, primary or secondary resistance to therapy is common. Clinical and pre-clinical studies suggest that alternative pro-angiogenic factors are upregulated after VEGF pathway inhibition. Therefore, identification of alternative pro-angiogenic pathway(s) is critical for the development of more effective anti-angiogenic therapy. Here we study the role of apelin as a pro-angiogenic G-protein-coupled receptor ligand in tumor growth and angiogenesis. We found that loss of apelin in mice delayed the primary tumor growth of Lewis lung carcinoma 1 and B16F10 melanoma when combined with the VEGF receptor tyrosine kinase inhibitor, sunitinib. Targeting apelin in combination with sunitinib markedly reduced the tumor vessel density, and decreased microvessel remodeling. Apelin loss reduced angiogenic sprouting and tip cell marker gene expression in comparison to the sunitinib-alone-treated mice. Single-cell RNA sequencing of tumor EC demonstrated that the loss of apelin prevented EC tip cell differentiation. Thus, apelin is a potent pro-angiogenic cue that supports initiation of tumor neovascularization. Together, our data suggest that targeting apelin may be useful as adjuvant therapy in combination with VEGF signaling inhibition to inhibit the growth of advanced tumors.


Subjects
Neoplasms, Experimental; Neoplasms; Angiogenesis Inhibitors/pharmacology; Animals; Apelin; Ligands; Mice; Neoplasms/drug therapy; Neoplasms, Experimental/drug therapy; Neovascularization, Pathologic/drug therapy; Protein Kinase Inhibitors/pharmacology; Receptors, G-Protein-Coupled/physiology; Receptors, Vascular Endothelial Growth Factor; Sunitinib/pharmacology; Vascular Endothelial Growth Factor A/metabolism; Vascular Endothelial Growth Factors/therapeutic use
6.
J Gen Intern Med ; 37(1): 154-161, 2022 01.
Article in English | MEDLINE | ID: mdl-34755268

ABSTRACT

IMPORTANCE: SARS-CoV-2 has infected over 200 million people worldwide, resulting in more than 4 million deaths. Randomized controlled trials are the single best tool to identify effective treatments against this novel pathogen. OBJECTIVE: To describe the characteristics of randomized controlled trials of treatments for COVID-19 in the United States launched in the first 9 months of the pandemic. DESIGN, SETTING, AND PARTICIPANTS: We conducted a cross-sectional study of all completed or actively enrolling randomized, interventional, clinical trials for the treatment of COVID-19 in the United States registered on www.clinicaltrials.gov as of August 10, 2020. We excluded trials of vaccines and other interventions intended to prevent COVID-19. MAIN OUTCOMES AND MEASURES: We used descriptive statistics to characterize the clinical trials and the statistical power for the available studies. For the late-phase trials (i.e., phase 3 and 2/3 studies), we compared the geographic distribution of the clinical trials with the geographic distribution of people diagnosed with COVID-19. RESULTS: We identified 200 randomized controlled trials of treatments for people with COVID-19. Across all trials, 87 (43.5%) were single-center, 64 (32.0%) were unblinded, and 80 (40.0%) were sponsored by industry. The most common treatments included monoclonal antibodies (N=46 trials), small molecule immunomodulators (N=28), antiviral medications (N=24 trials), and hydroxychloroquine (N=20 trials). Of the 9 trials completed by August 2020, the median sample size was 450 (IQR 67-1113); of the 191 ongoing trials, the median planned sample size was 150 (IQR 60-400). Of the late-phase trials (N=54), the most common primary outcome was a severity scale (N=23, 42.6%), followed by a composite of mortality and ventilation (N=10, 18.5%), and mortality alone (N=6, 11.1%).
Among these late-phase trials, all trials of antivirals, monoclonal antibodies, or chloroquine/hydroxychloroquine had a power of less than 25% to detect a 20% relative risk reduction in mortality. Had the individual trials for a given class of treatments instead formed a single trial, the power to detect that same reduction in mortality would have been greater than 98%. There was large variability in access to trials with the highest number of trials per capita in the Northeast and the lowest in the Midwest. CONCLUSIONS AND RELEVANCE: A large number of randomized trials were launched early in the pandemic to evaluate treatments for COVID-19. However, many trials were underpowered for important clinical endpoints and substantial geographic disparities were observed, highlighting the importance of improving national clinical trial infrastructure.
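The contrast this abstract draws between underpowered individual trials and a single pooled trial can be illustrated with a standard normal-approximation power calculation for comparing two proportions. The sketch below is not the authors' method: the control-arm mortality (20%) and the per-arm sample sizes are hypothetical values chosen only to show how power grows with enrollment, and `power_two_proportions` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, relative_risk_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions,
    using the unpooled standard error and equal allocation to both arms."""
    p_treat = p_control * (1.0 - relative_risk_reduction)
    diff = p_control - p_treat
    se = sqrt(p_control * (1.0 - p_control) / n_per_arm
              + p_treat * (1.0 - p_treat) / n_per_arm)
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(diff / se - z_crit) + z.cdf(-diff / se - z_crit)


# Hypothetical inputs: 20% control-arm mortality, 20% relative risk reduction.
small_trial_power = power_two_proportions(0.20, 0.20, n_per_arm=100)
pooled_trial_power = power_two_proportions(0.20, 0.20, n_per_arm=2000)
print(f"small trial:  {small_trial_power:.2f}")
print(f"pooled trial: {pooled_trial_power:.2f}")
```

Because the standard error shrinks with the square root of the sample size, combining many small trials of the same treatment class into one larger trial raises power substantially under these assumptions, which is the qualitative point the abstract makes.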


Subjects
COVID-19; Cross-Sectional Studies; Humans; Pandemics; Randomized Controlled Trials as Topic; SARS-CoV-2; Treatment Outcome; United States/epidemiology
7.
NEJM Evid ; 1(5): EVIDe2200062, 2022 May.
Article in English | MEDLINE | ID: mdl-38319201

ABSTRACT

The Basics of Machine Learning. When a person is pregnant, a key question is how to establish the "date" of the pregnancy. Classically, the date was based on the last menstrual period (LMP). For the past 3 decades or more, in high-resource countries, this has been done using "hospital-grade" ultrasound machines, with testing performed by trained sonographers. In many parts of the world, neither the machines nor the trained sonographers are accessible. In an article published in NEJM Evidence, Pokaprakarn et al. [1] asked whether a low-cost handheld ultrasound device combined with artificial intelligence (AI) could substitute for the expensive machines and trained sonographers.

8.
Cell Syst ; 12(12): 1173-1186.e5, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34536381

ABSTRACT

A major challenge in the analysis of highly multiplexed imaging data is the assignment of cells to a priori known cell types. Existing approaches typically solve this by clustering cells followed by manual annotation. However, these often require several subjective choices and cannot explicitly assign cells to an uncharacterized type. To help address these issues we present Astir, a probabilistic model to assign cells to cell types by integrating prior knowledge of marker proteins. Astir uses deep recognition neural networks for fast inference, allowing for annotations at the million-cell scale in the absence of a previously annotated reference. We apply Astir to over 2.4 million cells from suspension and imaging datasets and demonstrate its scalability, robustness to sample composition, and interpretable uncertainty estimates. We envision deployment of Astir either for a first broad cell type assignment or to accurately annotate cells that may serve as biomarkers in multiple disease contexts. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Neural Networks, Computer; Proteomics; Cluster Analysis
9.
Nature ; 595(7868): 585-590, 2021 07.
Article in English | MEDLINE | ID: mdl-34163070

ABSTRACT

Progress in defining genomic fitness landscapes in cancer, especially those defined by copy number alterations (CNAs), has been impeded by lack of time-series single-cell sampling of polyclonal populations and temporal statistical models [1-7]. Here we generated 42,000 genomes from multi-year time-series single-cell whole-genome sequencing of breast epithelium and primary triple-negative breast cancer (TNBC) patient-derived xenografts (PDXs), revealing the nature of CNA-defined clonal fitness dynamics induced by TP53 mutation and cisplatin chemotherapy. Using a new Wright-Fisher population genetics model [8,9] to infer clonal fitness, we found that TP53 mutation alters the fitness landscape, reproducibly distributing fitness over a larger number of clones associated with distinct CNAs. Furthermore, in TNBC PDX models with mutated TP53, inferred fitness coefficients from CNA-based genotypes accurately forecast experimentally enforced clonal competition dynamics. Drug treatment in three long-term serially passaged TNBC PDXs resulted in cisplatin-resistant clones emerging from low-fitness phylogenetic lineages in the untreated setting. Conversely, high-fitness clones from treatment-naive controls were eradicated, signalling an inversion of the fitness landscape. Finally, upon release of drug, selection pressure dynamics were reversed, indicating a fitness cost of treatment resistance. Together, our findings define clonal fitness linked to both CNA and therapeutic resistance in polyclonal tumours.


Subjects
DNA Copy Number Variations; Drug Resistance, Neoplasm; Triple Negative Breast Neoplasms/genetics; Animals; Cell Line, Tumor; Cisplatin/pharmacology; Clone Cells/pathology; Female; Genetic Fitness; Humans; Mice; Models, Statistical; Neoplasm Transplantation; Tumor Suppressor Protein p53/genetics; Whole Genome Sequencing
10.
J Pathol ; 254(3): 254-264, 2021 07.
Article in English | MEDLINE | ID: mdl-33797756

ABSTRACT

Hereditary diffuse gastric cancer (HDGC) is a cancer syndrome caused by germline variants in CDH1, the gene encoding the cell-cell adhesion molecule E-cadherin. Loss of E-cadherin in cancer is associated with cellular dedifferentiation and poor prognosis, but the mechanisms through which CDH1 loss initiates HDGC are not known. Using single-cell RNA sequencing, we explored the transcriptional landscape of a murine organoid model of HDGC to characterize the impact of CDH1 loss in early tumourigenesis. Progenitor populations of stratified squamous and simple columnar epithelium, characteristic of the mouse stomach, showed lineage-specific transcriptional programs. Cdh1 inactivation resulted in shifts along the squamous differentiation trajectory associated with aberrant expression of genes central to gastrointestinal epithelial differentiation. Cytokeratin 7 (CK7), encoded by the differentiation-dependent gene Krt7, was a specific marker for early neoplastic lesions in CDH1 carriers. Our findings suggest that deregulation of developmental transcriptional programs may precede malignancy in HDGC. © 2021 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.


Subjects
Cadherins/genetics; Cell Transformation, Neoplastic/genetics; Gene Expression Regulation, Neoplastic/genetics; Genetic Predisposition to Disease/genetics; Stomach Neoplasms/genetics; Animals; Cell Transformation, Neoplastic/pathology; Disease Models, Animal; Mice; Mice, Transgenic; Organoids; Single-Cell Analysis; Stomach Neoplasms/pathology; Transcriptome
11.
Phys Biol ; 17(6): 061001, 2020 09 19.
Article in English | MEDLINE | ID: mdl-32759485

ABSTRACT

Single-cell technologies have revolutionized biomedical research by enabling scalable measurement of the genome, transcriptome, proteome, and epigenome of multiple systems at single-cell resolution. Now widely applied to cancer models, these assays offer new insights into tumour heterogeneity, which underlies cancer initiation, progression, and relapse. However, the large quantities of high-dimensional, noisy data produced by single-cell assays can complicate data analysis, obscuring biological signals with technical artifacts. In this review article, we outline the major challenges in analyzing single-cell cancer genomics data and survey the current computational tools available to tackle these. We further outline unsolved problems that we consider major opportunities for future methods development to help interpret the vast quantities of data being generated.


Subjects
Computational Biology/methods; Genome; Genomics/methods; Neoplasms/genetics; Single-Cell Analysis/methods; Computer Simulation; Humans
12.
J Pathol ; 252(2): 201-214, 2020 10.
Article in English | MEDLINE | ID: mdl-32686114

ABSTRACT

Endometrial carcinoma, the most common gynaecological cancer, develops from endometrial epithelium which is composed of secretory and ciliated cells. Pathologic classification is unreliable and there is a need for prognostic tools. We used single cell sequencing to study organoid model systems derived from normal endometrium to discover novel markers specific for endometrial ciliated or secretory cells. A marker of secretory cells (MPST) and several markers of ciliated cells (FAM92B, WDR16, and DYDC2) were validated by immunohistochemistry on organoids and tissue sections. We performed single cell sequencing on endometrial and ovarian tumours and found both secretory-like and ciliated-like tumour cells. We found that ciliated cell markers (DYDC2, CTH, FOXJ1, and p73) and the secretory cell marker MPST were expressed in endometrial tumours and positively correlated with disease-specific and overall survival of endometrial cancer patients. These findings suggest that expression of differentiation markers in tumours correlates with less aggressive disease, as would be expected for tumours that retain differentiation capacity, albeit cryptic in the case of ciliated cells. These markers could be used to improve the risk stratification of endometrial cancer patients, thereby improving their management. We further assessed whether consideration of MPST expression could refine the ProMiSE molecular classification system for endometrial tumours. We found that higher expression levels of MPST could be used to refine stratification of three of the four ProMiSE molecular subgroups, and that any level of MPST expression was able to significantly refine risk stratification of the copy number high subgroup which has the worst prognosis. Taken together, this shows that single cell sequencing of putative cells of origin has the potential to uncover novel biomarkers that could be used to guide management of cancers.
© 2020 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.


Subjects
Biomarkers, Tumor/analysis; Carcinoma, Endometrioid/pathology; Endometrial Neoplasms/pathology; Sequence Analysis, RNA/methods; Cell Differentiation; Female; Humans; Organoids; Transcriptome
13.
Genome Biol ; 21(1): 31, 2020 02 07.
Article in English | MEDLINE | ID: mdl-32033589

ABSTRACT

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.


Subjects
Data Science/methods; Genomics/methods; RNA-Seq/methods; Single-Cell Analysis/methods; Animals; Humans
14.
Genome Biol ; 20(1): 210, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31623682

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying complex biological systems, such as tumor heterogeneity and tissue microenvironments. However, the sources of technical and biological variation in primary solid tumor tissues and patient-derived mouse xenografts for scRNA-seq are not well understood. RESULTS: We use low temperature (6 °C) protease and collagenase (37 °C) to identify the transcriptional signatures associated with tissue dissociation across a diverse scRNA-seq dataset comprising 155,165 cells from patient cancer tissues, patient-derived breast cancer xenografts, and cancer cell lines. We observe substantial variation in standard quality control metrics of cell viability across conditions and tissues. From the contrast between tissue protease dissociation at 37 °C or 6 °C, we observe that collagenase digestion results in a stress response. We derive a core gene set of 512 heat shock and stress response genes, including FOS and JUN, induced by collagenase (37 °C), which are minimized by dissociation with a cold active protease (6 °C). While induction of these genes was highly conserved across all cell types, cell type-specific responses to collagenase digestion were observed in patient tissues. CONCLUSIONS: The method and conditions of tumor dissociation influence cell yield and transcriptome state and are both tissue- and cell-type dependent. Interpretation of stress pathway expression differences in cancer single-cell studies, including components of surface immune recognition such as MHC class I, may be especially confounded. We define a core set of 512 genes that can assist with the identification of such effects in dissociated scRNA-seq experiments.


Subjects
Genomics/methods; Neoplasms/metabolism; Sequence Analysis, RNA; Single-Cell Analysis; Animals; Cold Temperature; Collagenases; Humans; Mice; Peptide Hydrolases; Stress, Physiological; Transcriptome
15.
Nat Methods ; 16(10): 1007-1015, 2019 10.
Article in English | MEDLINE | ID: mdl-31501550

ABSTRACT

Single-cell RNA sequencing has enabled the decomposition of complex tissues into functionally distinct cell types. Often, investigators wish to assign cells to cell types through unsupervised clustering followed by manual annotation or via 'mapping' to existing data. However, manual interpretation scales poorly to large datasets, mapping approaches require purified or pre-annotated data and both are prone to batch effects. To overcome these issues, we present CellAssign, a probabilistic model that leverages prior knowledge of cell-type marker genes to annotate single-cell RNA sequencing data into predefined or de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while controlling for batch and sample effects. We demonstrate the advantages of CellAssign through extensive simulations and analysis of tumor microenvironment composition in high-grade serous ovarian cancer and follicular lymphoma.


Subjects
Gene Expression Profiling; Lymphoma, Follicular/pathology; Probability; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Tumor Microenvironment; Humans; Lymphoma, Follicular/immunology
16.
Genome Biol ; 20(1): 54, 2019 03 12.
Article in English | MEDLINE | ID: mdl-30866997

ABSTRACT

Measuring gene expression of tumor clones at single-cell resolution links functional consequences to somatic alterations. Without scalable methods to simultaneously assay DNA and RNA from the same single cell, parallel single-cell DNA and RNA measurements from independent cell populations must be mapped for genome-transcriptome association. We present clonealign, which assigns gene expression states to cancer clones using single-cell RNA and DNA sequencing independently sampled from a heterogeneous population. We apply clonealign to triple-negative breast cancer patient-derived xenografts and high-grade serous ovarian cancer cell lines and discover clone-specific dysregulated biological pathways not visible using either sequencing method alone.


Subjects
Biomarkers, Tumor/genetics; Cystadenocarcinoma, Serous/genetics; High-Throughput Nucleotide Sequencing/methods; Models, Statistical; Ovarian Neoplasms/genetics; Single-Cell Analysis/methods; Software; Triple Negative Breast Neoplasms/genetics; Animals; Clone Cells; Cystadenocarcinoma, Serous/pathology; Female; Humans; Mice, Inbred NOD; Mice, SCID; Ovarian Neoplasms/pathology; Triple Negative Breast Neoplasms/pathology; Tumor Cells, Cultured; Xenograft Model Antitumor Assays
17.
Bioinformatics ; 35(1): 28-35, 2019 01 01.
Article in English | MEDLINE | ID: mdl-29939207

ABSTRACT

Motivation: Pseudotime estimation from single-cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results: Here we introduce an orthogonal Bayesian approach termed 'Ouija' that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify 'metastable' states (discrete cell types along the continuous trajectories) that recapitulate known cell types. Availability and implementation: An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information: Supplementary data are available at Bioinformatics online.
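The idea of switch-like and transient gene behaviour along a trajectory can be sketched with two simple mean-expression curves: a sigmoid that turns on around a switch time, and a Gaussian-shaped bump centred on a peak time. The functional forms, parameter names and gene names below are illustrative assumptions and are not necessarily the exact parametrization used by Ouija.

```python
from math import exp


def switch_mean(t, half_max, strength, switch_time):
    """Sigmoidal expression along pseudotime t. The curve rises from 0 towards
    2 * half_max and equals half_max exactly at t == switch_time."""
    return 2.0 * half_max / (1.0 + exp(-strength * (t - switch_time)))


def transient_mean(t, peak_height, width, peak_time):
    """Bump-shaped expression along pseudotime t: maximal (peak_height) at
    t == peak_time and decaying on either side; larger width = narrower bump."""
    return peak_height * exp(-width * (t - peak_time) ** 2)


# Hypothetical marker genes with assumed switch times on a [0, 1] pseudotime axis.
switch_times = {"gene_a": 0.70, "gene_b": 0.20, "gene_c": 0.45}

# Because each gene carries its own switch time, the genes themselves can be
# ordered along the trajectory, as the abstract describes.
gene_order = sorted(switch_times, key=switch_times.get)
print(gene_order)
```

Fitting such curves in a Bayesian model would also require priors and a likelihood over cells, which this sketch deliberately leaves out.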


Subjects
Gene Expression Profiling/methods; Single-Cell Analysis; Software; Algorithms; Bayes Theorem; Computational Biology
18.
Cell Stem Cell ; 24(1): 93-106.e6, 2019 01 03.
Article in English | MEDLINE | ID: mdl-30503143

ABSTRACT

Induced pluripotent stem cell (iPSC)-derived dopamine neurons provide an opportunity to model Parkinson's disease (PD), but neuronal cultures are confounded by asynchronous and heterogeneous appearance of disease phenotypes in vitro. Using high-resolution, single-cell transcriptomic analyses of iPSC-derived dopamine neurons carrying the GBA-N370S PD risk variant, we identified a progressive axis of gene expression variation leading to endoplasmic reticulum stress. Pseudotime analysis of genes differentially expressed (DE) along this axis identified the transcriptional repressor histone deacetylase 4 (HDAC4) as an upstream regulator of disease progression. HDAC4 was mislocalized to the nucleus in PD iPSC-derived dopamine neurons and repressed genes early in the disease axis, leading to late deficits in protein homeostasis. Treatment of iPSC-derived dopamine neurons with HDAC4-modulating compounds upregulated genes early in the DE axis and corrected PD-related cellular phenotypes. Our study demonstrates how single-cell transcriptomics can exploit cellular heterogeneity to reveal disease mechanisms and identify therapeutic targets.


Subjects
Dopaminergic Neurons/pathology; Gene Expression Regulation; Histone Deacetylases/metabolism; Induced Pluripotent Stem Cells/pathology; Parkinson Disease/pathology; Repressor Proteins/metabolism; Single-Cell Analysis/methods; Disease Progression; Dopamine/metabolism; Dopaminergic Neurons/metabolism; Endoplasmic Reticulum Stress; Gene Expression Profiling; Glucosylceramidase/genetics; Histone Deacetylases/genetics; Humans; Induced Pluripotent Stem Cells/metabolism; Mutation; Parkinson Disease/genetics; Parkinson Disease/metabolism; Phenotype; Repressor Proteins/genetics; Transcriptome
19.
Nat Commun ; 9(1): 2442, 2018 06 22.
Article in English | MEDLINE | ID: mdl-29934517

ABSTRACT

Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell 'omics and cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single cell genomics to wider applications.


Subjects
Gene Expression Profiling/methods; Genomics/methods; Models, Genetic; Algorithms; Datasets as Topic; Humans; Single-Cell Analysis; Time Factors
20.
Wellcome Open Res ; 2: 19, 2017 Mar 15.
Article in English | MEDLINE | ID: mdl-28503665

ABSTRACT

Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
