1.
EBioMedicine ; 99: 104908, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38101298

ABSTRACT

BACKGROUND: Deep learning has revolutionized digital pathology, allowing automatic analysis of hematoxylin and eosin (H&E) stained whole slide images (WSIs) for diverse tasks. WSIs are broken into smaller images called tiles, and a neural network encodes each tile. Many recent works use supervised attention-based models to aggregate tile-level features into a slide-level representation, which is then used for downstream analysis. Training supervised attention-based models is computationally intensive, architecture optimization of the attention module is non-trivial, and labeled data are not always available. Therefore, we developed an unsupervised and fast approach called SAMPLER to generate slide-level representations. METHODS: Slide-level representations of SAMPLER are generated by encoding the cumulative distribution functions of multiscale tile-level features. To assess the effectiveness of SAMPLER, slide-level representations of breast carcinoma (BRCA), non-small cell lung carcinoma (NSCLC), and renal cell carcinoma (RCC) WSIs of The Cancer Genome Atlas (TCGA) were used to train separate classifiers distinguishing tumor subtypes in FFPE and frozen WSIs. In addition, BRCA and NSCLC classifiers were externally validated on frozen WSIs. Moreover, SAMPLER's attention maps identify regions of interest, which were evaluated by a pathologist. To determine the time efficiency of SAMPLER, we compared its runtime with that of two attention-based models. SAMPLER concepts were used to improve the design of a context-aware multi-head attention model (context-MHA). FINDINGS: SAMPLER-based classifiers were comparable to state-of-the-art attention deep learning models in distinguishing subtypes of BRCA (AUC = 0.911 ± 0.029), NSCLC (AUC = 0.940 ± 0.018), and RCC (AUC = 0.987 ± 0.006) on FFPE WSIs (internal test sets). However, training SAMPLER-based classifiers was >100 times faster.
SAMPLER models successfully distinguished tumor subtypes on both internal and external test sets of frozen WSIs. Histopathological review confirmed that SAMPLER-identified high attention tiles contained subtype-specific morphological features. The improved context-MHA distinguished subtypes of BRCA and RCC (BRCA-AUC = 0.921 ± 0.027, RCC-AUC = 0.988 ± 0.010) with increased accuracy on internal test FFPE WSIs. INTERPRETATION: Our unsupervised statistical approach is fast and effective for analyzing WSIs, with greatly improved scalability over attention-based deep learning methods. The high accuracy of SAMPLER-based classifiers and interpretable attention maps suggest that SAMPLER successfully encodes the distinct morphologies within WSIs and will be applicable to general histology image analysis problems. FUNDING: This study was supported by the National Cancer Institute (Grant No. R01CA230031 and P30CA034196).
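
The idea of encoding the cumulative distribution functions of tile-level features can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cdf_slide_representation`, the quantile grid, and the toy data are assumptions made for illustration, and the multiscale aspect mentioned in the abstract is left out. Each feature's empirical distribution over a slide's tiles is summarized by a fixed set of quantiles, so slides with different tile counts map to vectors of the same length.

```python
import numpy as np

def cdf_slide_representation(tile_features, n_quantiles=10):
    """Summarize one slide by per-feature quantiles of its tile features.

    tile_features: array of shape (n_tiles, n_features), one row per tile.
    Returns a 1-D vector of length n_features * n_quantiles.
    """
    x = np.asarray(tile_features, dtype=float)
    if x.ndim != 2 or x.shape[0] == 0:
        raise ValueError("expected a non-empty (n_tiles, n_features) array")
    # Evenly spaced probabilities; this grid is an illustrative assumption.
    probs = np.linspace(0.05, 0.95, n_quantiles)
    # quantiles has shape (n_quantiles, n_features).
    quantiles = np.quantile(x, probs, axis=0)
    # Group the quantiles of each feature together, then flatten.
    return quantiles.T.reshape(-1)


rng = np.random.default_rng(0)
slide_a = rng.normal(size=(500, 4))  # toy slide: 500 tiles, 4 features
slide_b = rng.normal(size=(120, 4))  # a slide with a different tile count
rep_a = cdf_slide_representation(slide_a)
rep_b = cdf_slide_representation(slide_b)
```

Because the representation needs no training, any standard classifier can then be fitted on the resulting fixed-length vectors.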


Subjects
Breast Neoplasms , Non-Small-Cell Lung Carcinoma , Renal Cell Carcinoma , Kidney Neoplasms , Lung Neoplasms , Humans , Female
2.
bioRxiv ; 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37577691

ABSTRACT

Deep learning has revolutionized digital pathology, allowing for automatic analysis of hematoxylin and eosin (H&E) stained whole slide images (WSIs) for diverse tasks. In such analyses, WSIs are typically broken into smaller images called tiles, and a neural network backbone encodes each tile in a feature space. Many recent works have applied attention-based deep learning models to aggregate tile-level features into a slide-level representation, which is then used for slide-level prediction tasks. However, training attention models is computationally intensive, necessitating hyperparameter optimization and specialized training procedures. Here, we propose SAMPLER, a fully statistical approach to generate efficient and informative WSI representations by encoding the empirical cumulative distribution functions (CDFs) of multiscale tile features. We demonstrate that SAMPLER-based classifiers are as accurate as or better than state-of-the-art fully deep learning attention models for classification tasks including distinction of: subtypes of breast carcinoma (BRCA: AUC = 0.911 ± 0.029); subtypes of non-small cell lung carcinoma (NSCLC: AUC = 0.940 ± 0.018); and subtypes of renal cell carcinoma (RCC: AUC = 0.987 ± 0.006). A major advantage of the SAMPLER representation is that predictive models are >100X faster compared to attention models. Histopathological review confirms that SAMPLER-identified high attention tiles contain tumor morphological features specific to the tumor type, while low attention tiles contain fibrous stroma, blood, or tissue folding artifacts. We further apply SAMPLER concepts to improve the design of attention-based neural networks, yielding a context-aware multi-head attention model with increased accuracy for subtype classification within BRCA and RCC (BRCA: AUC = 0.921 ± 0.027, and RCC: AUC = 0.988 ± 0.010). Finally, we provide theoretical results identifying sufficient conditions for which SAMPLER is optimal.
SAMPLER is a fast and effective approach for analyzing WSIs, with greatly improved scalability over attention methods to benefit digital pathology analysis.

3.
EBioMedicine ; 94: 104726, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37499603

ABSTRACT

BACKGROUND: Colorectal cancer is the fourth most commonly diagnosed cancer and the second leading cancer in number of deaths. Many clinical variables, pathological features, and genomic signatures are associated with patient risk, but reliable patient stratification in the clinic remains a challenging task. Here we assess how image, clinical, and genomic features can be combined to predict risk. METHODS: We developed and evaluated integrative deep learning models combining formalin-fixed, paraffin-embedded (FFPE) whole slide images (WSIs), clinical variables, and mutation signatures to stratify colon adenocarcinoma (COAD) patients based on their risk of mortality. Our models were trained using a dataset of 108 patients from The Cancer Genome Atlas (TCGA), and were externally validated on a newly generated dataset of 123 COAD patients from Wayne State University (WSU) and on rectal adenocarcinoma (READ) patients in TCGA (N = 52). FINDINGS: We first observe that deep learning models trained on FFPE WSIs of TCGA-COAD separate high-risk (OS < 3 years, N = 38) and low-risk (OS > 5 years, N = 25) patients (AUC = 0.81 ± 0.08, 5 year survival p < 0.0001, 5 year relative risk = 1.83 ± 0.04), though such models are less effective at predicting overall survival (OS) for moderate-risk (3 years < OS < 5 years, N = 45) patients (5 year survival p-value = 0.5, 5 year relative risk = 1.05 ± 0.09). We find that our integrative models combining WSIs, clinical variables, and mutation signatures can improve patient stratification for moderate-risk patients (5 year survival p < 0.0001, 5 year relative risk = 1.87 ± 0.07). Our integrative model combining image and clinical variables is also effective on an independent pathology dataset (WSU-COAD, N = 123) generated by our team (5 year survival p < 0.0001, 5 year relative risk = 1.52 ± 0.08), and the TCGA-READ data (5 year survival p < 0.0001, 5 year relative risk = 1.18 ± 0.17).
Our multicenter integrative image and clinical model trained on combined TCGA-COAD and WSU-COAD is effective in predicting risk on TCGA-READ (5 year survival p < 0.0001, 5 year relative risk = 1.82 ± 0.13) data. Pathologist review of image-based heatmaps suggests that nuclear size pleomorphism, intense cellularity, and abnormal structures are associated with high risk, while low-risk regions have more regular and small cells. Quantitative analysis shows high cellularity, high ratios of tumor cells, large tumor nuclei, and low immune infiltration are indicators of high-risk tiles. INTERPRETATION: The improved stratification of colorectal cancer patients from our computational methods can be beneficial for treatment plans and enrollment of patients in clinical trials. FUNDING: This study was supported by the National Cancer Institute (Grant No. R01CA230031 and P30CA034196). The funders had no role in study design, data collection and analysis, or preparation of the manuscript.


Subjects
Adenocarcinoma , Colonic Neoplasms , Deep Learning , Humans , Colonic Neoplasms/diagnosis , Colonic Neoplasms/genetics , Adenocarcinoma/genetics , Cell Nucleus , Genomics
4.
J Surg Oncol ; 127(3): 426-433, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36251352

ABSTRACT

BACKGROUND AND OBJECTIVES: Deep learning utilizing convolutional neural networks (CNNs) applied to hematoxylin & eosin (H&E)-stained slides numerically encodes histomorphological tumor features. Tumor heterogeneity is an emerging biomarker in colon cancer that is captured by these features, whereas microsatellite instability (MSI) is an established biomarker traditionally assessed by immunohistochemistry or polymerase chain reaction. METHODS: H&E-stained slides from The Cancer Genome Atlas (TCGA) colon cohort are passed through the CNN. Resulting imaging features are used to cluster morphologically similar slide regions. Tile-level pairwise similarities are calculated and used to generate a tumor heterogeneity score (THS). Patient-level THS is then correlated with TCGA-reported biomarkers, including MSI status. RESULTS: H&E-stained images from 313 patients generated 534,771 tiles. Deep learning automatically identified and annotated cells by type and clustered morphologically similar slide regions. MSI-high tumors demonstrated significantly higher THS than MSS/MSI-low tumors (p < 0.001). THS was higher in MLH1-silent versus non-silent tumors (p < 0.001). The sequencing-derived MSIsensor score also correlated with THS (r = 0.51, p < 0.0001). CONCLUSIONS: Deep learning provides spatially resolved visualization of imaging-derived biomarkers and automated quantification of tumor heterogeneity. Our novel THS correlates with MSI status, indicating that with expanded training sets, translational tools could be developed that predict MSI status using H&E-stained images alone.
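
The abstract does not give the formula behind the tumor heterogeneity score, so the following is only one plausible sketch of a score built from tile-level pairwise similarities. The choice of cosine similarity, the "one minus mean similarity" definition, and the name `heterogeneity_score` are all assumptions made for illustration.

```python
import numpy as np

def heterogeneity_score(tile_features):
    """Hypothetical score: 1 minus the mean pairwise cosine similarity.

    tile_features: array of shape (n_tiles, n_features) for one patient.
    Identical tiles give 0; mutually orthogonal tiles give 1.
    """
    x = np.asarray(tile_features, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("need at least two tiles")
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("zero-norm tile feature vector")
    unit = x / norms
    # Cosine similarity between every pair of tiles.
    sim = unit @ unit.T
    # Use each unordered pair once by taking the strict upper triangle.
    upper = np.triu_indices(x.shape[0], k=1)
    return 1.0 - sim[upper].mean()


homogeneous = heterogeneity_score(np.ones((5, 3)))  # all tiles identical
heterogeneous = heterogeneity_score(np.eye(3))      # all tiles orthogonal
```

For cohorts with hundreds of thousands of tiles, the full similarity matrix would not fit in memory, so a real pipeline would need to compute it per patient or on a subsample.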


Subjects
Colonic Neoplasms , Colorectal Neoplasms , Deep Learning , Humans , Microsatellite Instability , Microsatellite Repeats , Colonic Neoplasms/diagnostic imaging , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Colorectal Neoplasms/pathology
5.
Sci Rep ; 12(1): 9428, 2022 Jun 08.
Article in English | MEDLINE | ID: mdl-35676395

ABSTRACT

Convolutional neural networks (CNNs) are revolutionizing digital pathology by enabling machine learning-based classification of a variety of phenotypes from hematoxylin and eosin (H&E) whole slide images (WSIs), but the interpretation of CNNs remains difficult. Most studies have considered interpretability in a post hoc fashion, e.g. by presenting example regions with strongly predicted class labels. However, such an approach does not explain the biological features that contribute to correct predictions. To address this problem, here we investigate the interpretability of H&E-derived CNN features (the feature weights in the final layer of a transfer-learning-based architecture). While many studies have incorporated CNN features into predictive models, there has been little empirical study of their properties. We show such features can be construed as abstract morphological genes ("mones") with strong independent associations to biological phenotypes. Many mones are specific to individual cancer types, while others are found in multiple cancers especially from related tissue types. We also observe that mone-mone correlations are strong and robustly preserved across related cancers. Importantly, linear mone-based classifiers can very accurately separate 38 distinct classes (19 tumor types and their adjacent normals, AUC = [Formula: see text] for each class prediction), and linear classifiers are also highly effective for universal tumor detection (AUC = [Formula: see text]). This linearity provides evidence that individual mones or correlated mone clusters may be associated with interpretable histopathological features or other patient characteristics. In particular, the statistical similarity of mones to gene expression values allows integrative mone analysis via expression-based bioinformatics approaches. 
We observe strong correlations between individual mones and individual gene expression values, notably mones associated with collagen gene expression in ovarian cancer. Mone-expression comparisons also indicate that immunoglobulin expression can be identified using mones in colon adenocarcinoma and that immune activity can be identified across multiple cancer types, and we verify these findings by expert histopathological review. Our work demonstrates that mones provide a morphological H&E decomposition that can be effectively associated with diverse phenotypes, analogous to the interpretability of transcription via gene expression values. Our work also demonstrates mones can be interpreted without using a classifier as a proxy.
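
Correlating individual mones with individual gene expression values amounts to a cross-correlation between two patient-indexed matrices. A minimal sketch follows, assuming one row per patient in both matrices; the function name `cross_correlation` and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def cross_correlation(mones, expression):
    """Pearson correlation between every mone and every gene.

    mones:      array of shape (n_patients, n_mones)
    expression: array of shape (n_patients, n_genes)
    Returns an (n_mones, n_genes) matrix of correlation coefficients.
    Constant columns have zero variance and produce NaN entries.
    """
    m = np.asarray(mones, dtype=float)
    g = np.asarray(expression, dtype=float)
    if m.shape[0] != g.shape[0]:
        raise ValueError("both matrices need one row per patient")
    # Standardize each column using the population standard deviation.
    zm = (m - m.mean(axis=0)) / m.std(axis=0)
    zg = (g - g.mean(axis=0)) / g.std(axis=0)
    # Averaging products of z-scores gives Pearson's r.
    return zm.T @ zg / m.shape[0]


rng = np.random.default_rng(1)
toy_mones = rng.normal(size=(50, 3))
# Toy expression: gene 0 follows mone 1, gene 1 mirrors mone 0.
toy_expr = np.column_stack([2.0 * toy_mones[:, 1] + 1.0, -toy_mones[:, 0]])
r = cross_correlation(toy_mones, toy_expr)
```

With thousands of genes, the resulting p-values would need multiple-testing correction before any mone-gene pair is interpreted.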


Subjects
Adenocarcinoma , Colonic Neoplasms , Deep Learning , Colonic Neoplasms/genetics , Humans , Machine Learning , Computer Neural Networks
6.
Methods Mol Biol ; 2194: 77-105, 2021.
Article in English | MEDLINE | ID: mdl-32926363

ABSTRACT

Survival analysis is a powerful and popular methodology for analyzing time-to-event models in bioinformatics. Furthermore, several of its extensions can simultaneously perform variable selection in conjunction with model estimation. While this flexibility is extremely desirable, under certain scenarios, binary class variable selection and classification methods might provide more reliable risk estimates. Synthetic simulations and real data case studies suggest that when (1) randomly censored points comprise only a small portion of data, (2) biological markers are weak, (3) it is desired to compute risk across predetermined time intervals, and (4) the assumptions of the competing time-to-event models are violated, binary class models tend to perform better. In practice, it might be prudent to test both model families to ensure adequate analysis. Here we describe the pipeline of binary class feature selection and classification for time-to-event risk assessment.
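
The first step of any binary class treatment of time-to-event data is turning follow-up times and censoring flags into class labels at a chosen horizon. The sketch below shows one common convention, not necessarily the one used in the chapter: patients censored before the horizon are excluded because their class is unknown. The function name `binarize_survival` and the toy values are assumptions.

```python
import numpy as np

def binarize_survival(times, events, horizon):
    """Convert survival data to class labels at a fixed time horizon.

    times:   follow-up time for each patient
    events:  1 if the event was observed, 0 if the patient was censored
    horizon: cut-off time in the same units as `times`
    Returns 1 (event by the horizon), 0 (event-free past the horizon),
    or -1 (censored at or before the horizon, so the class is unknown).
    """
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=bool)
    labels = np.full(t.shape, -1, dtype=int)
    labels[e & (t <= horizon)] = 1  # event observed in time
    labels[t > horizon] = 0         # known to be event-free at the horizon
    return labels


# Toy follow-up in years, with a 5-year horizon.
labels = binarize_survival([1, 2, 6, 7, 3], [1, 0, 1, 0, 1], horizon=5)
usable = labels >= 0  # mask of patients kept for classifier training
```

This makes the abstract's condition (1) concrete: when few patients are censored before the horizon, few rows are discarded and the binary formulation loses little information.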


Subjects
Biostatistics/methods , Computational Biology/methods , Neoplasms/mortality , Algorithms , Analysis of Variance , Computer Simulation , Statistical Data Interpretation , Discriminant Analysis , Humans , Linear Models , Prognosis , Risk Assessment/methods , Support Vector Machine , Survival Analysis
7.
Nat Commun ; 11(1): 6367, 2020 Dec 11.
Article in English | MEDLINE | ID: mdl-33311458

ABSTRACT

Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time-consuming, making it challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin scanned images from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify TCGA pathologist-annotated tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995 ± 0.008), as well as subtypes with lower but significant accuracy (AUC 0.87 ± 0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88 ± 0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45 ± 0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65 to 0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial behaviors across tumors.
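
The AUC values reported here can be read as the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one. The following sketch computes AUC directly from that definition; it is a generic illustration with made-up scores, not the evaluation code used in the study.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher, with ties counted as one half.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    # diff[i, j] compares positive i against negative j.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)


# Toy slide scores: tumor slides (label 1) against normal slides (label 0).
example_auc = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

In the toy example three of the four tumor/normal pairs are ordered correctly, so `example_auc` is 0.75. The pairwise matrix costs memory proportional to the product of the class sizes, so a rank-based formula is preferable for large test sets.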


Subjects
Computational Biology/methods , Deep Learning , Neoplasms/diagnostic imaging , Neoplasms/pathology , Spatial Behavior , Area Under Curve , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Colonic Neoplasms/diagnostic imaging , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Female , p53 Genes , Genotype , Humans , Computer-Assisted Image Processing/methods , Mutation , Neoplasms/genetics
8.
BMC Bioinformatics ; 21(1): 156, 2020 Apr 25.
Article in English | MEDLINE | ID: mdl-32334509

ABSTRACT

BACKGROUND: Binary classification rules based on a small sample of high-dimensional data (for instance, gene expression data) are ubiquitous in modern bioinformatics. Constructing such classifiers is challenging due to (a) the complex nature of underlying biological traits, such as gene interactions, and (b) the need for highly interpretable glass-box models. We use the theory of high-dimensional model representation (HDMR) to build interpretable low-dimensional approximations of the log-likelihood ratio accounting for the effects of each individual gene as well as gene-gene interactions. We propose two algorithms approximating the second-order HDMR expansion, and a hypothesis test based on the HDMR formulation to identify significantly dysregulated pairwise interactions. The theory is flexible and requires only a mild set of assumptions. RESULTS: We apply our approach to gene expression data from both synthetic and real (breast and lung cancer) datasets, also comparing it against several popular state-of-the-art methods. The analyses suggest the proposed algorithms can be used to obtain interpretable prediction rules with high prediction accuracies and to successfully extract significantly dysregulated gene-gene interactions from the data. They also compare favorably against their competitors across multiple synthetic data scenarios. CONCLUSION: The proposed HDMR-based approach appears to produce a reliable classifier that additionally allows one to describe how individual genes or gene-gene interactions affect classification decisions. Both real and synthetic data analyses suggest that our methods can be used to identify gene networks with dysregulated pairwise interactions, and are therefore appropriate for differential network analysis.
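
For orientation, the general form of a second-order HDMR expansion applied to a log-likelihood ratio can be written as below. The notation is an assumption chosen for illustration: x_i denotes the expression of gene i, d the number of genes, y the class label, and f_0, f_i, f_ij the constant, single-gene, and pairwise component functions. The abstract does not state how the paper estimates these components.

```latex
\log \frac{p(x \mid y = 1)}{p(x \mid y = 0)}
  \;\approx\; f_0
  \;+\; \sum_{i=1}^{d} f_i(x_i)
  \;+\; \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j)
```

The single-gene terms f_i capture the effect of each gene on its own, while a non-zero pairwise term f_ij indicates that genes i and j jointly shift the classification decision, which is what a test for dysregulated pairwise interactions would look for.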


Subjects
Theoretical Models , Algorithms , Area Under Curve , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Genetic Databases , Female , Neoplastic Gene Expression Regulation , Humans , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , ROC Curve
9.
Article in English | MEDLINE | ID: mdl-30040658

ABSTRACT

Optimal Bayesian feature filtering (OBF) is a fast and memory-efficient algorithm that optimally identifies markers with distributional differences between treatment groups under Gaussian models. Here, we study the performance and robustness of OBF for biomarker discovery. Our contributions are twofold: (1) we examine how OBF performs on data that violate modeling assumptions, and (2) we provide guidelines on how to set input parameters for robust performance. Contribution (1) addresses an important, relevant, and commonplace problem in computational biology, where it is often impossible to validate an algorithm's core assumptions. To accomplish both tasks, we present a battery of simulations that implement OBF with different inputs and challenge each assumption made by OBF. In particular, we examine the robustness of OBF with respect to incorrect input parameters, false independence, and imbalanced sample size, and we address the Gaussianity assumption by considering performance on an extensive family of non-Gaussian distributions. We address the advantages and disadvantages of different priors and optimization criteria throughout. Finally, we evaluate the utility of OBF in biomarker discovery using acute myeloid leukemia (AML) and colon cancer microarray datasets, and show that OBF is successful at identifying well-known biomarkers for these diseases that rank low under the moderated t-test.


Subjects
Bayes Theorem , Biomarkers , Computational Biology/methods , Algorithms , Factual Databases , Humans , Neoplasms/diagnosis , Neoplasms/metabolism
10.
BMC Bioinformatics ; 19(Suppl 3): 70, 2018 Mar 21.
Article in English | MEDLINE | ID: mdl-29589558

ABSTRACT

BACKGROUND: Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. RESULTS: The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. CONCLUSIONS: Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data, these algorithms output many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, which is particularly useful for studying gene interactions and gene networks.


Subjects
Algorithms , Heuristics , Bayes Theorem , Humans , Neoplasms/genetics , Normal Distribution