Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 36
Filter
Add more filters










Publication year range
1.
bioRxiv ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38979226

ABSTRACT

Cancer cells show remarkable plasticity and can switch lineages in response to the tumor microenvironment. Cellular plasticity drives invasiveness and metastasis and helps cancer cells to evade therapy by developing resistance to radiation and cytotoxic chemotherapy. Increased understanding of cell fate determination through epigenetic reprogramming is critical to discover how cancer cells achieve transcriptomic and phenotypic plasticity. Glioblastoma is a perfect example of cancer evolution where cells retain an inherent level of plasticity through activation or maintenance of progenitor developmental programs. However, the principles governing epigenetic drivers of cellular plasticity in glioblastoma remain poorly understood. Here, using machine learning (ML) we employ cross-patient prediction of transcript expression using a combination of epigenetic features (ATAC-seq, CTCF ChIP-seq, RNAPII ChIP-seq, H3K27Ac ChIP-seq, and RNA-seq) of glioblastoma stem cells (GSCs). We investigate different ML and deep learning (DL) models for this task and build our final pipeline using XGBoost. The model trained on one patient generalizes to another one suggesting that the epigenetic signals governing gene transcription are consistent across patients even if GSCs can be very different. We demonstrate that H3K27Ac is the epigenetic feature providing the most significant contribution to cross-patient prediction of gene expression. In addition, using H3K27Ac signals from patients-derived GSCs, we can predict gene expression of human neural crest stem cells suggesting a shared developmental epigenetic trajectory between subpopulations of these malignant and benign stem cells. Our cross-patient ML/DL models determine weighted patterns of influence of epigenetic marks on gene expression across patients with glioblastoma and between GSCs and neural crest stem cells. We propose that broader application of this analysis could reshape our view of glioblastoma tumor evolution and inform the design of new epigenetic targeting therapies.

2.
Bioinformatics ; 40(Supplement_1): i490-i500, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940151

ABSTRACT

SUMMARY: Single-cell Hi-C (scHi-C) protocol helps identify cell-type-specific chromatin interactions and sheds light on cell differentiation and disease progression. Despite providing crucial insights, scHi-C data is often underutilized due to the high cost and the complexity of the experimental protocol. We present a deep learning framework, scGrapHiC, that predicts pseudo-bulk scHi-C contact maps using pseudo-bulk scRNA-seq data. Specifically, scGrapHiC performs graph deconvolution to extract genome-wide single-cell interactions from a bulk Hi-C contact map using scRNA-seq as a guiding signal. Our evaluations show that scGrapHiC, trained on seven cell-type co-assay datasets, outperforms typical sequence encoder approaches. For example, scGrapHiC achieves a substantial improvement of 23.2% in recovering cell-type-specific Topologically Associating Domains over the baselines. It also generalizes to unseen embryo and brain tissue samples. scGrapHiC is a novel method to generate cell-type-specific scHi-C contact maps using widely available genomic signals that enables the study of cell-type-specific chromatin interactions. AVAILABILITY AND IMPLEMENTATION: The GitHub link: https://github.com/rsinghlab/scGrapHiC contains the source code of scGrapHiC and associated scripts to preprocess publicly available datasets to produce the results and visualizations we have discuss in this manuscript.


Subject(s)
Chromatin , Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Chromatin/metabolism , Chromatin/chemistry , Humans
3.
JMIR Med Inform ; 12: e50209, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38896468

ABSTRACT

BACKGROUND: Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. OBJECTIVE: This study aims to develop an information retrieval (IR)-based framework that accommodates data sparsity to facilitate broader diagnostic decision support. METHODS: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR's performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. RESULTS: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. CONCLUSIONS: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases.

4.
iScience ; 27(5): 109570, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38646172

ABSTRACT

The three-dimensional organization of genomes plays a crucial role in essential biological processes. The segregation of chromatin into A and B compartments highlights regions of activity and inactivity, providing a window into the genomic activities specific to each cell type. Yet, the steep costs associated with acquiring Hi-C data, necessary for studying this compartmentalization across various cell types, pose a significant barrier in studying cell type specific genome organization. To address this, we present a prediction tool called compartment prediction using recurrent neural networks (CoRNN), which predicts compartmentalization of 3D genome using histone modification enrichment. CoRNN demonstrates robust cross-cell-type prediction of A/B compartments with an average AuROC of 90.9%. Cell-type-specific predictions align well with known functional elements, with H3K27ac and H3K36me3 identified as highly predictive histone marks. We further investigate our mispredictions and found that they are located in regions with ambiguous compartmental status. Furthermore, our model's generalizability is validated by predicting compartments in independent tissue samples, which underscores its broad applicability.

5.
JMIR Med Educ ; 10: e51391, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38349725

ABSTRACT

BACKGROUND: Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. OBJECTIVE: This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. METHODS: We conducted experiments on publicly available complex and rare cases to achieve these objectives. We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. RESULTS: Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5% and 13%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28% and 11%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93%, whereas ChatGPT-3.5 and MedAlpaca scored 73% and 47%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. CONCLUSIONS: Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model's characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training. Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes.


Subject(s)
Learning , Problem Solving , Humans , Reproducibility of Results , Educational Status , Language
6.
Cell Rep ; 42(12): 113500, 2023 12 26.
Article in English | MEDLINE | ID: mdl-38032797

ABSTRACT

Aging is a major risk factor for many diseases. Accurate methods for predicting age in specific cell types are essential to understand the heterogeneity of aging and to assess rejuvenation strategies. However, classifying organismal age at single-cell resolution using transcriptomics is challenging due to sparsity and noise. Here, we developed CellBiAge, a robust and easy-to-implement machine learning pipeline, to classify the age of single cells in the mouse brain using single-cell transcriptomics. We show that binarization of gene expression values for the top highly variable genes significantly improved test performance across different models, techniques, sexes, and brain regions, with potential age-related genes identified for model prediction. Additionally, we demonstrate CellBiAge's ability to capture exercise-induced rejuvenation in neural stem cells. This study provides a broadly applicable approach for robust classification of organismal age of single cells in the mouse brain, which may aid in understanding the aging process and evaluating rejuvenation methods.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Single-Cell Analysis/methods , Machine Learning , Cellular Senescence , Aging
7.
Microscopy (Oxf) ; 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37864808

ABSTRACT

We present a graph neural network (GNN)-based framework applied to large-scale microscopy image segmentation tasks. While deep learning models, like convolutional neural networks (CNNs), have become common for automating image segmentation tasks, they are limited by the image size that can fit in the memory of computational hardware. In a GNN framework, large-scale images are converted into graphs using superpixels (regions of pixels with similar color/intensity values), allowing us to input information from the entire image into the model. By converting images with hundreds of millions of pixels to graphs with thousands of nodes, we can segment large images using memory-limited computational resources. We compare the performance of GNN- and CNN-based segmentation in terms of accuracy, training time and required graphics processing unit memory. Based on our experiments with microscopy images of biological cells and cell colonies, GNN-based segmentation used one to three orders-of-magnitude fewer computational resources with only a change in accuracy of $-2\;%$ to $+0.3\;%$. Furthermore, errors due to superpixel generation can be reduced by either using better superpixel generation algorithms or increasing the number of superpixels, thereby allowing for improvement in the GNN framework's accuracy. This trade-off between accuracy and computational cost over CNN models makes the GNN framework attractive for many large-scale microscopy image segmentation tasks in biology.

8.
Cancer Res ; 83(12): 1984-1999, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37101376

ABSTRACT

Chitinase 3-like 1 (Chi3l1) is a secreted protein that is highly expressed in glioblastoma. Here, we show that Chi3l1 alters the state of glioma stem cells (GSC) to support tumor growth. Exposure of patient-derived GSCs to Chi3l1 reduced the frequency of CD133+SOX2+ cells and increased the CD44+Chi3l1+ cells. Chi3l1 bound to CD44 and induced phosphorylation and nuclear translocation of ß-catenin, Akt, and STAT3. Single-cell RNA sequencing and RNA velocity following incubation of GSCs with Chi3l1 showed significant changes in GSC state dynamics driving GSCs towards a mesenchymal expression profile and reducing transition probabilities towards terminal cellular states. ATAC-seq revealed that Chi3l1 increases accessibility of promoters containing a Myc-associated zinc finger protein (MAZ) transcription factor footprint. Inhibition of MAZ downregulated a set of genes with high expression in cellular clusters that exhibit significant cell state transitions after treatment with Chi3l1, and MAZ deficiency rescued the Chi3L-induced increase of GSC self-renewal. Finally, targeting Chi3l1 in vivo with a blocking antibody inhibited tumor growth and increased the probability of survival. Overall, this work suggests that Chi3l1 interacts with CD44 on the surface of GSCs to induce Akt/ß-catenin signaling and MAZ transcriptional activity, which in turn upregulates CD44 expression in a pro-mesenchymal feed-forward loop. The role of Chi3l1 in regulating cellular plasticity confers a targetable vulnerability to glioblastoma. SIGNIFICANCE: Chi3l1 is a modulator of glioma stem cell states that can be targeted to promote differentiation and suppress growth of glioblastoma.


Subject(s)
Brain Neoplasms , Glioblastoma , Glioma , Humans , Glioblastoma/pathology , beta Catenin/metabolism , Proto-Oncogene Proteins c-akt/metabolism , Neoplastic Stem Cells/pathology , Glioma/metabolism , Brain Neoplasms/pathology , Cell Line, Tumor , Cell Proliferation
9.
bioRxiv ; 2023 Jan 25.
Article in English | MEDLINE | ID: mdl-36747724

ABSTRACT

With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses.

10.
Med Phys ; 50(8): 4943-4959, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36847185

ABSTRACT

PURPOSE: State-of-the-art automated segmentation methods achieve exceptionally high performance on the Brain Tumor Segmentation (BraTS) challenge, a dataset of uniformly processed and standardized magnetic resonance generated images (MRIs) of gliomas. However, a reasonable concern is that these models may not fare well on clinical MRIs that do not belong to the specially curated BraTS dataset. Research using the previous generation of deep learning models indicates significant performance loss on cross-institutional predictions. Here, we evaluate the cross-institutional applicability and generalzsability of state-of-the-art deep learning models on new clinical data. METHODS: We train a state-of-the-art 3D U-Net model on the conventional BraTS dataset comprising low- and high-grade gliomas. We then evaluate the performance of this model for automatic tumor segmentation of brain tumors on in-house clinical data. This dataset contains MRIs of different tumor types, resolutions, and standardization than those found in the BraTS dataset. Ground truth segmentations to validate the automated segmentation for in-house clinical data were obtained from expert radiation oncologists. RESULTS: We report average Dice scores of 0.764, 0.648, and 0.61 for the whole tumor, tumor core, and enhancing tumor, respectively, in the clinical MRIs. These means are higher than numbers reported previously on same institution and cross-institution datasets of different origin using different methods. There is no statistically significant difference when comparing the dice scores to the inter-annotation variability between two expert clinical radiation oncologists. Although performance on the clinical data is lower than on the BraTS data, these numbers indicate that models trained on the BraTS dataset have impressive segmentation performance on previously unseen images obtained at a separate clinical institution. These images differ in the imaging resolutions, standardization pipelines, and tumor types from the BraTS data. CONCLUSIONS: State-of-the-art deep learning models demonstrate promising performance on cross-institutional predictions. They considerably improve on previous models and can transfer knowledge to new types of brain tumors without additional modeling.


Subject(s)
Brain Neoplasms , Glioma , Humans , Brain Neoplasms/diagnostic imaging , Glioma/diagnostic imaging , Health Facilities
11.
Genes (Basel) ; 15(1)2023 12 29.
Article in English | MEDLINE | ID: mdl-38254945

ABSTRACT

Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets, or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework-Hi-CY-that compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets belonging to various levels of read sparsities originating from three cell lines on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loops recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.


Subject(s)
Deep Learning , Benchmarking , Cell Line , Chromatin/genetics
13.
J Comput Biol ; 29(11): 1213-1228, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36251763

ABSTRACT

Multiomic single-cell data allow us to perform integrated analysis to understand genomic regulation of biological processes. However, most single-cell sequencing assays are performed on separately sampled cell populations, as applying them to the same single-cell is challenging. Existing unsupervised single-cell alignment algorithms have been primarily benchmarked on coassay experiments. Our investigation revealed that these methods do not perform well for noncoassay single-cell experiments when there is disproportionate cell-type representation across measurement domains. Therefore, we extend our previous work-Single Cell alignment using Optimal Transport (SCOT)-by using unbalanced Gromov-Wasserstein optimal transport to handle disproportionate cell-type representation and differing sample sizes across single-cell measurements. Our method, SCOTv2, gives state-of-the-art alignment performance across five non-coassay data sets (simulated and real world). It can also integrate multiple (M≥2) single-cell measurements while preserving the self-tuning capabilities and computational tractability of its original version.


Subject(s)
Algorithms , Genomics
14.
J Am Med Inform Assoc ; 29(12): 2014-2022, 2022 11 14.
Article in English | MEDLINE | ID: mdl-36149257

ABSTRACT

OBJECTIVE: Alzheimer's disease (AD) is the most common neurodegenerative disorder with one of the most complex pathogeneses, making effective and clinically actionable decision support difficult. The objective of this study was to develop a novel multimodal deep learning framework to aid medical professionals in AD diagnosis. MATERIALS AND METHODS: We present a Multimodal Alzheimer's Disease Diagnosis framework (MADDi) to accurately detect the presence of AD and mild cognitive impairment (MCI) from imaging, genetic, and clinical data. MADDi is novel in that we use cross-modal attention, which captures interactions between modalities-a method not previously explored in this domain. We perform multi-class classification, a challenging task considering the strong similarities between MCI and AD. We compare with previous state-of-the-art models, evaluate the importance of attention, and examine the contribution of each modality to the model's performance. RESULTS: MADDi classifies MCI, AD, and controls with 96.88% accuracy on a held-out test set. When examining the contribution of different attention schemes, we found that the combination of cross-modal attention with self-attention performed the best, and no attention layers in the model performed the worst, with a 7.9% difference in F1-scores. DISCUSSION: Our experiments underlined the importance of structured clinical data to help machine learning models contextualize and interpret the remaining modalities. Extensive ablation studies showed that any multimodal mixture of input features without access to structured clinical information suffered marked performance losses. CONCLUSION: This study demonstrates the merit of combining multiple input modalities via cross-modal attention to deliver highly accurate AD diagnostic decision support.


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Deep Learning , Humans , Alzheimer Disease/diagnosis , Magnetic Resonance Imaging/methods , Cognitive Dysfunction/diagnosis , Machine Learning
15.
J Comput Biol ; 29(5): 409-424, 2022 05.
Article in English | MEDLINE | ID: mdl-35325548

ABSTRACT

Long-range regulatory interactions among genomic regions are critical for controlling gene expression, and their disruption has been associated with a host of diseases. However, when modeling the effects of regulatory factors, most deep learning models either neglect long-range interactions or fail to capture the inherent 3D structure of the underlying genomic organization. To address these limitations, we present a Graph Convolutional Model for Epigenetic Regulation of Gene Expression (GC-MERGE). Using a graph-based framework, the model incorporates important information about long-range interactions via a natural encoding of genomic spatial interactions into the graph representation. It integrates measurements of both the global genomic organization and the local regulatory factors, specifically histone modifications, to not only predict the expression of a given gene of interest but also quantify the importance of its regulatory factors. We apply GC-MERGE to data sets for three cell lines-GM12878 (lymphoblastoid), K562 (myelogenous leukemia), and HUVEC (human umbilical vein endothelial)-and demonstrate its state-of-the-art predictive performance. Crucially, we show that our model is interpretable in terms of the observed biological regulatory factors, highlighting both the histone modifications and the interacting genomic regions contributing to a gene's predicted expression. We provide model explanations for multiple exemplar genes and validate them with evidence from the literature. Our model presents a novel setup for predicting gene expression by integrating multimodal data sets in a graph convolutional framework. More importantly, it enables interpretation of the biological mechanisms driving the model's predictions.


Subject(s)
Epigenesis, Genetic , Neural Networks, Computer , Gene Expression , Genome , Genomics , Humans
16.
J Comput Biol ; 29(1): 3-18, 2022 01.
Article in English | MEDLINE | ID: mdl-35050714

ABSTRACT

Recent advances in sequencing technologies have allowed us to capture various aspects of the genome at single-cell resolution. However, with the exception of a few of co-assaying technologies, it is not possible to simultaneously apply different sequencing assays on the same single cell. In this scenario, computational integration of multi-omic measurements is crucial to enable joint analyses. This integration task is particularly challenging due to the lack of sample-wise or feature-wise correspondences. We present single-cell alignment with optimal transport (SCOT), an unsupervised algorithm that uses the Gromov-Wasserstein optimal transport to align single-cell multi-omics data sets. SCOT performs on par with the current state-of-the-art unsupervised alignment methods, is faster, and requires tuning of fewer hyperparameters. More importantly, SCOT uses a self-tuning heuristic to guide hyperparameter selection based on the Gromov-Wasserstein distance. Thus, in the fully unsupervised setting, SCOT aligns single-cell data sets better than the existing methods without requiring any orthogonal correspondence information.


Subject(s)
Algorithms , Genomics/statistics & numerical data , Sequence Alignment/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Humans , Models, Statistical , Unsupervised Machine Learning
17.
NPJ Digit Med ; 5(1): 5, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35031687

ABSTRACT

While COVID-19 diagnosis and prognosis artificial intelligence models exist, very few can be implemented for practical use given their high risk of bias. We aimed to develop a diagnosis model that addresses notable shortcomings of prior studies, integrating it into a fully automated triage pipeline that examines chest radiographs for the presence, severity, and progression of COVID-19 pneumonia. Scans were collected using the DICOM Image Analysis and Archive, a system that communicates with a hospital's image repository. The authors collected over 6,500 non-public chest X-rays comprising diverse COVID-19 severities, along with radiology reports and RT-PCR data. The authors provisioned one internally held-out and two external test sets to assess model generalizability and compare performance to traditional radiologist interpretation. The pipeline was evaluated on a prospective cohort of 80 radiographs, reporting a 95% diagnostic accuracy. The study mitigates bias in AI model development and demonstrates the value of an end-to-end COVID-19 triage platform.

18.
J Comput Biol ; 29(1): 19-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34985990

ABSTRACT

Although the availability of various sequencing technologies allows us to capture different genome properties at single-cell resolution, with the exception of a few co-assaying technologies, applying different sequencing assays on the same single cell is impossible. Single-cell alignment using optimal transport (SCOT) is an unsupervised algorithm that addresses this limitation by using optimal transport to align single-cell multiomics data. First, it preserves the local geometry by constructing a k-nearest neighbor (k-NN) graph for each data set (or domain) to capture the intra-domain distances. SCOT then finds a probabilistic coupling matrix that minimizes the discrepancy between the intra-domain distance matrices. Finally, it uses the coupling matrix to project one single-cell data set onto another through barycentric projection, thus aligning them. SCOT requires tuning only two hyperparameters and is robust to the choice of one. Furthermore, the Gromov-Wasserstein distance in the algorithm can guide SCOT's hyperparameter tuning in a fully unsupervised setting when no orthogonal alignment information is available. Thus, SCOT is a fast and accurate alignment method that provides a heuristic for hyperparameter selection in a real-world unsupervised single-cell data alignment scenario. We provide a tutorial for SCOT and make its source code publicly available on GitHub.


Subject(s)
Algorithms , Sequence Alignment/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Computational Biology , Databases, Genetic/statistics & numerical data , Genomics/statistics & numerical data , Heuristics , Humans , Neural Networks, Computer , Sequence Analysis/statistics & numerical data , Software , Unsupervised Machine Learning
19.
Genome Biol ; 22(1): 279, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34579774

ABSTRACT

BACKGROUND: Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. RESULTS: Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a "bookmark" mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. CONCLUSIONS: Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.


Subject(s)
Cell Differentiation/genetics , Gene Expression , Mouse Embryonic Stem Cells/metabolism , X Chromosome Inactivation , Alleles , Animals , Cell Cycle , Cell Line , Cell Nucleus/genetics , Female , Genome , Male , Mice , RNA-Seq , Single-Cell Analysis , X Chromosome/chemistry
20.
Nucleic Acids Res ; 49(W1): W641-W653, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34125906

ABSTRACT

Uncovering how transcription factors regulate their targets at DNA, RNA and protein levels over time is critical to define gene regulatory networks (GRNs) and assign mechanisms in normal and diseased states. RNA-seq is a standard method measuring gene regulation using an established set of analysis stages. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time-series models to assign cause and effect relationships within GRNs, are adaptive to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods integrating ordered RNA-seq data with protein-DNA binding data to distinguish direct from indirect interactions are urgently needed. We present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time-series multi-omics pipeline method which infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to determine causal regulatory mechanism networks by leveraging time-series RNA-seq, motif analysis, protein-DNA binding data, and protein-protein interaction networks. TIMEOR's user-catered approach helps non-coders generate new hypotheses and validate known mechanisms. We used TIMEOR to identify a novel link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git and http://timeor.brown.edu.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , RNA-Seq , Software , Circadian Rhythm/genetics , Genomics , Humans , Insulin/physiology , Internet , Protein Interaction Mapping , Transcription Factors/metabolism
SELECTION OF CITATIONS
SEARCH DETAIL
...