Search | VHL Regional Portal

A deep profile of gene expression across 18 human cancers.

Qiu, Wei; Dincer, Ayse B; Janizek, Joseph D; Celik, Safiye; Pittet, Mikael; Naxerova, Kamila; Lee, Su-In.

bioRxiv ; 2024 Mar 17.

Article in English | MEDLINE | ID: mdl-38559197

ABSTRACT

Clinically and biologically valuable information may reside untapped in large cancer gene expression data sets. Deep unsupervised learning has the potential to extract this information with unprecedented efficacy but has thus far been hampered by a lack of biological interpretability and robustness. Here, we present DeepProfile, a comprehensive framework that addresses current challenges in applying unsupervised deep learning to gene expression profiles. We use DeepProfile to learn low-dimensional latent spaces for 18 human cancers from 50,211 transcriptomes. DeepProfile outperforms existing dimensionality reduction methods with respect to biological interpretability. Using DeepProfile interpretability methods, we show that genes that are universally important in defining the latent spaces across all cancer types control immune cell activation, while cancer type-specific genes and pathways define molecular disease subtypes. By linking DeepProfile latent variables to secondary tumor characteristics, we discover that tumor mutation burden is closely associated with the expression of cell cycle-related genes. DNA mismatch repair and MHC class II antigen presentation pathway expression, on the other hand, are consistently associated with patient survival. We validate these results through Kaplan-Meier analyses and nominate tumor-associated macrophages as an important source of survival-correlated MHC class II transcripts. Our results illustrate the power of unsupervised deep learning for discovery of novel cancer biology from existing gene expression data.

Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians.

DeGrave, Alex J; Cai, Zhuo Ran; Janizek, Joseph D; Daneshjou, Roxana; Lee, Su-In.

Nat Biomed Eng ; 2023 Dec 28.

Article in English | MEDLINE | ID: mdl-38155295

ABSTRACT

The inferences of most machine-learning models powering medical artificial intelligence are difficult to interpret. Here we report a general framework for model auditing that combines insights from medical experts with a highly expressive form of explainable artificial intelligence. Specifically, we leveraged the expertise of dermatologists for the clinical task of differentiating melanomas from melanoma 'lookalikes' on the basis of dermoscopic and clinical images of the skin, and the power of generative models to render 'counterfactual' images to understand the 'reasoning' processes of five medical-image classifiers. By altering image attributes to produce analogous images that elicit a different prediction by the classifiers, and by asking physicians to identify medically meaningful features in the images, the counterfactual images revealed that the classifiers rely both on features used by human dermatologists, such as lesional pigmentation patterns, and on undesirable features, such as background skin texture and colour balance. The framework can be applied to any specialized medical domain to make the powerful inference processes of machine-learning models medically understandable.

Dissection of medical AI reasoning processes via physician and generative-AI collaboration.

DeGrave, Alex J; Cai, Zhuo Ran; Janizek, Joseph D; Daneshjou, Roxana; Lee, Su-In.

medRxiv ; 2023 May 16.

Article in English | MEDLINE | ID: mdl-37292705

ABSTRACT

Despite the proliferation and clinical deployment of artificial intelligence (AI)-based medical software devices, most remain black boxes that are uninterpretable to key stakeholders including patients, physicians, and even the developers of the devices. Here, we present a general model auditing framework that combines insights from medical experts with a highly expressive form of explainable AI that leverages generative models, to understand the reasoning processes of AI devices. We then apply this framework to generate the first thorough, medically interpretable picture of the reasoning processes of machine-learning-based medical image AI. In our synergistic framework, a generative model first renders "counterfactual" medical images, which in essence visually represent the reasoning process of a medical AI device, and then physicians translate these counterfactual images to medically meaningful features. As our use case, we audit five high-profile AI devices in dermatology, an area of particular interest since dermatology AI devices are beginning to achieve deployment globally. We reveal how dermatology AI devices rely both on features used by human dermatologists, such as lesional pigmentation patterns, as well as multiple, previously unreported, potentially undesirable features, such as background skin texture and image color balance. Our study also sets a precedent for the rigorous application of explainable AI to understand AI in any specialized domain and provides a means for practitioners, clinicians, and regulators to uncloak AI's powerful but previously enigmatic reasoning processes in a medically understandable way.

Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models.

Janizek, Joseph D; Dincer, Ayse B; Celik, Safiye; Chen, Hugh; Chen, William; Naxerova, Kamila; Lee, Su-In.

Nat Biomed Eng ; 7(6): 811-829, 2023 06.

Article in English | MEDLINE | ID: mdl-37127711

ABSTRACT

Machine learning may aid the choice of optimal combinations of anticancer drugs by explaining the molecular basis of their synergy. By combining accurate models with interpretable insights, explainable machine learning promises to accelerate data-driven cancer pharmacology. However, owing to the highly correlated and high-dimensional nature of transcriptomic data, naively applying current explainable machine-learning strategies to large transcriptomic datasets leads to suboptimal outcomes. Here by using feature attribution methods, we show that the quality of the explanations can be increased by leveraging ensembles of explainable machine-learning models. We applied the approach to a dataset of 133 combinations of 46 anticancer drugs tested in ex vivo tumour samples from 285 patients with acute myeloid leukaemia and uncovered a haematopoietic-differentiation signature underlying drug combinations with therapeutic synergy. Ensembles of machine-learning models trained to predict drug combination synergies on the basis of gene-expression data may improve the feature attribution quality of complex machine-learning models.

Subject(s)

Gene Expression Profiling , Machine Learning , Humans , Transcriptome

PAUSE: principled feature attribution for unsupervised gene expression analysis.

Janizek, Joseph D; Spiro, Anna; Celik, Safiye; Blue, Ben W; Russell, John C; Lee, Ting-I; Kaeberlin, Matt; Lee, Su-In.

Genome Biol ; 24(1): 81, 2023 04 19.

Article in English | MEDLINE | ID: mdl-37076856

ABSTRACT

As interest in using unsupervised deep learning models to analyze gene expression data has grown, an increasing number of methods have been developed to make these models more interpretable. These methods can be separated into two groups: post hoc analyses of black box models through feature attribution methods and approaches to build inherently interpretable models through biologically-constrained architectures. We argue that these approaches are not mutually exclusive, but can in fact be usefully combined. We propose PAUSE ( https://github.com/suinleelab/PAUSE ), an unsupervised pathway attribution method that identifies major sources of transcriptomic variation when combined with biologically-constrained neural network models.

Subject(s)

Gene Expression Profiling , Transcriptome , Neural Networks, Computer

A cost-aware framework for the development of AI models for healthcare applications.

Erion, Gabriel; Janizek, Joseph D; Hudelson, Carly; Utarnachitt, Richard B; McCoy, Andrew M; Sayre, Michael R; White, Nathan J; Lee, Su-In.

Nat Biomed Eng ; 6(12): 1384-1398, 2022 12.

Article in English | MEDLINE | ID: mdl-35393566

ABSTRACT

Accurate artificial intelligence (AI) for disease diagnosis could lower healthcare workloads. However, when time or financial resources for gathering input data are limited, as in emergency and critical-care medicine, developing accurate AI models, which typically require inputs for many clinical variables, may be impractical. Here we report a model-agnostic cost-aware AI (CoAI) framework for the development of predictive models that optimize the trade-off between prediction performance and feature cost. By using three datasets, each including thousands of patients, we show that relative to clinical risk scores, CoAI substantially reduces the cost and improves the accuracy of predicting acute traumatic coagulopathy in a pre-hospital setting, mortality in intensive-care patients and mortality in outpatient settings. We also show that CoAI outperforms state-of-the-art cost-aware prediction strategies in terms of predictive performance, model cost, training time and robustness to feature-cost perturbations. CoAI uses axiomatic feature-attribution methods for the estimation of feature importance and decouples feature selection from model training, thus allowing for a faster and more flexible adaptation of AI models to new feature costs and prediction budgets.

Subject(s)

Artificial Intelligence , Humans , Risk Factors

Adversarial deconfounding autoencoder for learning robust gene expression embeddings.

Dincer, Ayse B; Janizek, Joseph D; Lee, Su-In.

Bioinformatics ; 36(Suppl_2): i573-i582, 2020 12 30.

Article in English | MEDLINE | ID: mdl-33381842

ABSTRACT

MOTIVATION: Increasing number of gene expression profiles has enabled the use of complex models, such as deep unsupervised neural networks, to extract a latent space from these profiles. However, expression profiles, especially when collected in large numbers, inherently contain variations introduced by technical artifacts (e.g. batch effects) and uninteresting biological variables (e.g. age) in addition to the true signals of interest. These sources of variations, called confounders, produce embeddings that fail to transfer to different domains, i.e. an embedding learned from one dataset with a specific confounder distribution does not generalize to different distributions. To remedy this problem, we attempt to disentangle confounders from true signals to generate biologically informative embeddings. RESULTS: In this article, we introduce the Adversarial Deconfounding AutoEncoder (AD-AE) approach to deconfounding gene expression latent spaces. The AD-AE model consists of two neural networks: (i) an autoencoder to generate an embedding that can reconstruct original measurements, and (ii) an adversary trained to predict the confounder from that embedding. We jointly train the networks to generate embeddings that can encode as much information as possible without encoding any confounding signal. By applying AD-AE to two distinct gene expression datasets, we show that our model can (i) generate embeddings that do not encode confounder information, (ii) conserve the biological signals present in the original space and (iii) generalize successfully across different confounder domains. We demonstrate that AD-AE outperforms standard autoencoder and other deconfounding approaches. AVAILABILITY AND IMPLEMENTATION: Our code and data are available at https://gitlab.cs.washington.edu/abdincer/ad-ae. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Neural Networks, Computer , Gene Expression

AI for radiographic COVID-19 detection selects shortcuts over signal.

DeGrave, Alex J; Janizek, Joseph D; Lee, Su-In.

medRxiv ; 2020 Oct 07.

Article in English | MEDLINE | ID: mdl-32995822

ABSTRACT

Artificial intelligence (AI) researchers and radiologists have recently reported AI systems that accurately detect COVID-19 in chest radiographs. However, the robustness of these systems remains unclear. Using state-of-the-art techniques in explainable AI, we demonstrate that recent deep learning systems to detect COVID-19 from chest radiographs rely on confounding factors rather than medical pathology, creating an alarming situation in which the systems appear accurate, but fail when tested in new hospitals. We observe that the approach to obtain training data for these AI systems introduces a nearly ideal scenario for AI to learn these spurious "shortcuts." Because this approach to data collection has also been used to obtain training data for detection of COVID-19 in computed tomography scans and for medical imaging tasks related to other diseases, our study reveals a far-reaching problem in medical imaging AI. In addition, we show that evaluation of a model on external data is insufficient to ensure AI systems rely on medically relevant pathology, since the undesired "shortcuts" learned by AI systems may not impair performance in new hospitals. These findings demonstrate that explainable AI should be seen as a prerequisite to clinical deployment of ML healthcare models.

10.

Accurate classification of BRCA1 variants with saturation genome editing.

Findlay, Gregory M; Daza, Riza M; Martin, Beth; Zhang, Melissa D; Leith, Anh P; Gasperini, Molly; Janizek, Joseph D; Huang, Xingfan; Starita, Lea M; Shendure, Jay.

Nature ; 562(7726): 217-222, 2018 10.

Article in English | MEDLINE | ID: mdl-30209399

ABSTRACT

Variants of uncertain significance fundamentally limit the clinical utility of genetic information. The challenge they pose is epitomized by BRCA1, a tumour suppressor gene in which germline loss-of-function variants predispose women to breast and ovarian cancer. Although BRCA1 has been sequenced in millions of women, the risk associated with most newly observed variants cannot be definitively assigned. Here we use saturation genome editing to assay 96.5% of all possible single-nucleotide variants (SNVs) in 13 exons that encode functionally critical domains of BRCA1. Functional effects for nearly 4,000 SNVs are bimodally distributed and almost perfectly concordant with established assessments of pathogenicity. Over 400 non-functional missense SNVs are identified, as well as around 300 SNVs that disrupt expression. We predict that these results will be immediately useful for the clinical interpretation of BRCA1 variants, and that this approach can be extended to overcome the challenge of variants of uncertain significance in additional clinically actionable genes.

Subject(s)

BRCA1 Protein/genetics , Gene Editing , Genetic Predisposition to Disease/classification , Genetic Variation/genetics , Genome, Human/genetics , Hereditary Breast and Ovarian Cancer Syndrome/genetics , Cell Line , Exons/genetics , Female , Genes, Essential/genetics , Humans , Loss of Function Mutation/genetics , Models, Molecular , Prognosis , RNA, Messenger/genetics , RNA, Messenger/metabolism , Recombinational DNA Repair/genetics

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL