Results 1 - 20 of 53
1.
Plants (Basel) ; 13(9)2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38732474

ABSTRACT

Genomic selection (GS) is a marker-based selection method used to improve the genetic gain of quantitative traits in plant breeding. A large number of breeding datasets are available in the soybean database, and the application of these public datasets in GS will improve breeding efficiency and reduce time and cost. However, the most important problem to be solved is how to improve the ability of across-population prediction. The objectives of this study were to perform genomic prediction (GP) and estimate the prediction ability (PA) for seed oil and protein contents in soybean using available public datasets to predict breeding populations in current, ongoing breeding programs. In this study, six public datasets of USDA GRIN soybean germplasm accessions with available phenotypic data of seed oil and protein contents from different experimental populations and their genotypic data of single-nucleotide polymorphisms (SNPs) were used to perform GP and to predict a bi-parent-derived breeding population in our experiment. The average PA was 0.55 for seed oil and 0.50 for protein content within the bi-parent population according to the within-population prediction, and 0.45 for oil and 0.39 for protein content when the six USDA populations were combined and employed as training sets to predict the bi-parent-derived population. The results showed that four USDA-cultivated populations can be used as a training set individually or combined to predict oil and protein contents in GS when using 800 or more USDA germplasm accessions as a training set. The smaller the genetic distance between the training population and the testing population, the higher the PA. The PA increased as the population size increased. In across-population prediction, no significant difference was observed in PA for oil and protein content among different models. The PA increased as the SNP number increased until the marker set reached 10,000 SNPs. This study provides reasonable suggestions and methods for breeders to utilize public datasets for GS. It will aid breeders in developing GS-assisted breeding strategies to develop elite soybean cultivars with high oil and protein contents.
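
To make the notion of prediction ability (PA) concrete, the following is a minimal sketch, assuming PA is measured as the Pearson correlation between predicted and observed phenotypes in the testing population. Ridge regression on SNP genotypes is used here only as a stand-in model; the abstract does not name the models compared. The function names, the penalty `lam`, and the simulated genotypes are all hypothetical.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for the ridge estimates of marker effects."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)

def prediction_ability(X_train, y_train, X_test, y_test, lam=1.0):
    """Pearson correlation between predicted and observed test phenotypes."""
    # Center with training-set means only, so no test information leaks in.
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    effects = ridge_marker_effects(X_train - x_mean, y_train - y_mean, lam)
    y_pred = (X_test - x_mean) @ effects + y_mean
    return float(np.corrcoef(y_pred, y_test)[0, 1])

# Simulated data (an assumption for illustration): SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n_train, n_test, n_snps = 200, 50, 100
X = rng.integers(0, 3, size=(n_train + n_test, n_snps)).astype(float)
true_effects = rng.normal(0.0, 1.0, n_snps)
y = X @ true_effects + rng.normal(0.0, 1.0, n_train + n_test)

pa = prediction_ability(X[:n_train], y[:n_train], X[n_train:], y[n_train:])
print(f"prediction ability: {pa:.2f}")
```

In an across-population setting, the training rows would come from one population (for example, public germplasm accessions) and the test rows from another (the breeding population), rather than from a split of one simulated set.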

2.
BMC Genom Data ; 25(1): 25, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38438864

ABSTRACT

OBJECTIVES: Soybean is an important feed and oil crop in the world due to its high protein and oil content. China has a collection of more than 43,000 soybean germplasm resources, which provides a rich genetic diversity for soybean breeding. However, the rich genetic diversity poses great challenges to the genetic improvement of soybean. This study reports on the de novo genome assembly of HJ117, a soybean variety with high protein content of 52.99%. These data will prove to be valuable resources for further soybean quality improvement research, and will aid in the elucidation of regulatory mechanisms underlying soybean protein content. DATA DESCRIPTION: We generated a contiguous reference genome of 1041.94 Mb for HJ117 using a combination of Illumina short reads (23.38 Gb) and PacBio long reads (25.58 Gb), with high-quality sequence coverage of approximately 22.44× and 24.55×, respectively. HJ117 was developed through backcross breeding, using Jidou 12 as the recurrent parent and Chamoshidou as the donor parent. The assembly was further assisted by 114.5 Gb Hi-C data (109.9×), resulting in a contig N50 of 19.32 Mb and scaffold N50 of 51.43 Mb. Notably, Core Eukaryotic Genes Mapping Approach (CEGMA) assessment and Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment results indicated that most core eukaryotic genes (97.18%) and genes in the BUSCO dataset (99.4%) were identified, and 96.44% of the genomic sequences were anchored onto twenty pseudochromosomes.
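
The contig and scaffold N50 values quoted above follow a standard definition: the length L such that sequences of length L or longer cover at least half of the total assembly. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative and are not taken from the HJ117 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

toy_contigs = [2, 3, 4, 5, 6]  # hypothetical lengths, total = 20
print(n50(toy_contigs))
```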


Subject(s)
Glycine max , Plant Breeding , Glycine max/genetics , Soybean Proteins/genetics , Benchmarking , China
3.
Patterns (N Y) ; 5(3): 100927, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487805

ABSTRACT

In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.

4.
iScience ; 27(3): 109294, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38450156

ABSTRACT

The noninvasive detection of pancreatic ductal adenocarcinoma (PDAC) remains an immense challenge. In this study, we proposed a robust, accurate, and noninvasive classifier, namely Multi-Omics Co-training Graph Convolutional Networks (MOCO-GCN). It achieved high accuracy (0.9 ± 0.06), F1 score (0.9 ± 0.07), and AUROC (0.89 ± 0.08), surpassing contemporary approaches. The performance of the model was validated on an external cohort of German PDAC patients. Additionally, mediation analysis indicated that the exposome may impact PDAC development through its complex interplay with the gut microbiome. For example, Fusobacterium hwasookii nucleatum, known for its ability to induce inflammatory responses, may serve as a mediator for the impact of rheumatoid arthritis on PDAC. Overall, our study sheds light on how the exposome and microbiome in concert could contribute to PDAC development, and enables PDAC diagnosis with high fidelity and interpretability.

5.
Res Sq ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38410424

ABSTRACT

Spatial omics technologies are capable of deciphering detailed components of complex organs or tissue at cellular and subcellular resolution. A robust, interpretable, and unbiased representation method for spatial omics is necessary to illuminate novel investigations into biological functions, yet the underlying mathematical theory is still lacking. We present SpaGFT (Spatial Graph Fourier Transform), which provides a unique analytical feature representation of spatial omics data and elucidates molecular signatures linked to critical biological processes within tissues and cells. It outperformed existing tools in spatially variable gene prediction and gene expression imputation across human/mouse Visium data. Integrating the SpaGFT representation into existing machine learning frameworks can improve the accuracy of spatial domain identification, cell type annotation, cell-to-spot alignment, and subcellular hallmark inference by up to 40%. SpaGFT identified immunological regions for B cell maturation in human lymph node Visium data, characterized secondary follicle variations from in-house human tonsil CODEX data, and detected extremely rare subcellular organelles such as Cajal body and Set1/COMPASS. This new method lays the groundwork for a new theoretical model in explainable AI, advancing our understanding of tissue organization and function.
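
For readers unfamiliar with the graph Fourier transform that the method is named after, here is a minimal textbook sketch. It assumes an unnormalized graph Laplacian L = D - A and a toy path graph standing in for a spot neighborhood graph; it is not the SpaGFT implementation, and all names are hypothetical.

```python
import numpy as np

def graph_fourier_transform(adjacency, signal):
    """Project a node signal onto the eigenvectors of the graph Laplacian."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency  # unnormalized Laplacian L = D - A
    # eigh returns eigenvalues in ascending order; small ones are low frequencies.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    coefficients = eigenvectors.T @ signal
    return eigenvalues, eigenvectors, coefficients

# Toy path graph of 6 "spots" (an assumption for illustration).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

smooth_signal = np.linspace(0.0, 1.0, n)  # e.g. expression varying along the path
freqs, U, coeffs = graph_fourier_transform(A, smooth_signal)
reconstructed = U @ coeffs  # the inverse transform recovers the signal
```

A spatially smooth signal concentrates its energy in the coefficients paired with small eigenvalues, which is the general intuition behind using such a transform to flag spatially structured features.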

6.
Nat Commun ; 15(1): 338, 2024 Jan 06.
Article in English | MEDLINE | ID: mdl-38184630

ABSTRACT

Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduce MarsGT: Multi-omics Analysis for Rare population inference using a Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperforms existing tools in identifying rare cells across 550 simulated and four real human datasets. In mouse retina data, it reveals unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detects an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identifies a rare MAIT-like population impacted by a high IFN-I response and reveals the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.


Subject(s)
B-Lymphocytes , Multiomics , Humans , Animals , Mice , Electric Power Supplies , Ependymoglial Cells , Immunotherapy
7.
Comput Biol Med ; 165: 107458, 2023 10.
Article in English | MEDLINE | ID: mdl-37703713

ABSTRACT

The identification of microbial characteristics associated with diseases is crucial for disease diagnosis and therapy. However, the presence of heterogeneity, high dimensionality, and large amounts of microbial data presents tremendous challenges in discovering key microbial features. In this paper, we present IDAM, a novel computational method for inferring disease-associated gene modules from metagenomic and metatranscriptomic data. This method integrates gene context conservation (uber-operons) and regulatory mechanisms (gene co-expression patterns) within a mathematical graph model to explore gene modules associated with specific diseases. It alleviates reliance on prior meta-data. We applied IDAM to publicly available datasets from inflammatory bowel disease, melanoma, type 1 diabetes mellitus, and irritable bowel syndrome. The results demonstrated the superior performance of IDAM in inferring disease-associated characteristics compared to existing popular tools. Furthermore, we showcased the high reproducibility of the gene modules inferred by IDAM using independent cohorts with inflammatory bowel disease. We believe that IDAM can be a highly advantageous method for exploring disease-associated microbial characteristics. The source code of IDAM is freely available at https://github.com/OSU-BMBL/IDAM, and the web server can be accessed at https://bmblx.bmi.osumc.edu/idam/.


Subject(s)
Diabetes Mellitus, Type 1 , Inflammatory Bowel Diseases , Humans , Gene Regulatory Networks , Reproducibility of Results , Diabetes Mellitus, Type 1/genetics , Inflammatory Bowel Diseases/genetics , Genes, Microbial
8.
bioRxiv ; 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37645917

ABSTRACT

Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduced MarsGT: Multi-omics Analysis for Rare population inference using Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperformed existing tools in identifying rare cells across 400 simulated and four real human datasets. In mouse retina data, it revealed unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detected an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identified a rare MAIT-like population impacted by a high IFN-I response and revealed the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.

9.
Front Plant Sci ; 14: 1190503, 2023.
Article in English | MEDLINE | ID: mdl-37384360

ABSTRACT

Seed coat color is a typical morphological trait that can be used to reveal the evolution of soybean. The study of seed coat color-related traits in soybeans is of great significance for both evolutionary theory and breeding practices. In this study, 180 F10 recombinant inbred lines (RILs) derived from the cross between the yellow-seed coat cultivar Jidou12 (ZDD23040, JD12) and the wild black-seed coat accession Y9 (ZYD02739) were used as materials. Three methods, single-marker analysis (SMA), interval mapping (IM), and inclusive composite interval mapping (ICIM), were used to identify quantitative trait loci (QTLs) controlling seed coat color and seed hilum color. Simultaneously, two genome-wide association study (GWAS) models, the generalized linear model (GLM) and mixed linear model (MLM), were used to jointly identify seed coat color and seed hilum color QTLs in 250 natural populations. By integrating the results from QTL mapping and GWAS analysis, we identified two stable QTLs (qSCC02 and qSCC08) associated with seed coat color and one stable QTL (qSHC08) related to seed hilum color. Upon further investigation using Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, we validated the previous findings that two candidate genes (CHS3C and CHS4A) reside within the qSCC08 region and identified a new QTL, qSCC02. There were a total of 28 candidate genes in this interval, among which Glyma.02G024600, Glyma.02G024700, and Glyma.02G024800 were mapped to the glutathione metabolic pathway, which is related to the transport or accumulation of anthocyanin. We considered the three genes as potential candidate genes for soybean seed coat-related traits. The QTLs and candidate genes detected in this study provide a foundation for further understanding the genetic mechanisms underlying soybean seed coat color and seed hilum color and are of significant value in marker-assisted breeding.

10.
Plant Sci ; 331: 111677, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36931563

ABSTRACT

Vacuolar Protein Sorting 8 (Vps8) protein is a specific subunit of the class C core vacuole/endosome tethering (CORVET) complex that plays a key role in endosomal trafficking in yeast (Saccharomyces cerevisiae). However, its functions in plant vegetative growth remain largely unclear. Here, we identified a soybean (Glycine max) T4219 mutant characterized by compact plant architecture. Map-based cloning identified a candidate gene, GmVPS8a (Glyma.07g049700), and further revealed that a two-nucleotide deletion in the first exon of GmVPS8a causes premature termination of the encoded protein in the T4219 mutant. Its function was validated by a CRISPR/Cas9-engineered mutation in the GmVPS8a gene that recapitulated the T4219 mutant phenotypes. Furthermore, NbVPS8a-silenced tobacco (Nicotiana benthamiana) plants exhibited phenotypes similar to those of the T4219 mutant, suggesting conserved roles in plant growth. GmVPS8a is widely expressed in multiple organs, and its protein interacts with GmAra6a and GmRab5a. Combined analysis of transcriptomic and proteomic data revealed that dysfunction of GmVPS8a mainly affects pathways involved in auxin signal transduction, sugar transport and metabolism, and lipid metabolism. Collectively, our work reveals the function of GmVPS8a in plant architecture, which may offer a new route for the genetic improvement of ideal plant architecture in soybean and other crops.


Subject(s)
Glycine max , Proteomics , Glycine max/genetics , Plant Breeding , Endosomes/metabolism , Vacuoles/metabolism , Saccharomyces cerevisiae/metabolism
11.
Nat Commun ; 14(1): 964, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36810839

ABSTRACT

Single-cell multi-omics (scMulti-omics) allows the quantification of multiple modalities simultaneously to capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Existing tools cannot effectively infer the active biological networks in diverse cell types and the response of these networks to external stimuli. Here we present DeepMAPS for biological network inference from scMulti-omics. It models scMulti-omics in a heterogeneous graph and learns relations among cells and genes within both local and global contexts in a robust manner using a multi-head graph transformer. Benchmarking results indicate DeepMAPS performs better than existing tools in cell clustering and biological network construction. It also showcases competitive capability in deriving cell-type-specific biological networks in lung tumor leukocyte CITE-seq data and matched diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data. In addition, we deploy a DeepMAPS webserver equipped with multiple functionalities and visualizations to improve the usability and reproducibility of scMulti-omics data analysis.


Subject(s)
Benchmarking , Data Analysis , Reproducibility of Results , Cluster Analysis , Electric Power Supplies , Single-Cell Analysis
12.
Trends Microbiol ; 31(7): 707-722, 2023 07.
Article in English | MEDLINE | ID: mdl-36841736

ABSTRACT

The human microbiome is intimately related to cancer biology and plays a vital role in the efficacy of cancer treatments, including immunotherapy. Extraordinary evidence has revealed that several microbes influence tumor development through interaction with the host immune system, that is, immuno-oncology-microbiome (IOM). This review focuses on the intratumoral microbiome in IOM and describes the available data and computational methods for discovering biological insights of microbial profiling from host bulk, single-cell, and spatial sequencing data. Critical challenges in data analysis and integration are discussed. Specifically, the microorganisms associated with cancer and cancer treatment in the context of IOM are collected and integrated from the literature. Lastly, we provide our perspectives for future directions in IOM research.


Subject(s)
Microbiota , Neoplasms , Humans , Neoplasms/therapy , Immunotherapy/methods , Computational Biology/methods , Forecasting
13.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189539

ABSTRACT

Sequence motif discovery algorithms enhance the identification of novel deoxyribonucleic acid sequences with pivotal biological significance, especially transcription factor (TF)-binding motifs. The advent of assay for transposase-accessible chromatin using sequencing (ATAC-seq) has broadened the toolkit for motif characterization. Nonetheless, prevailing computational approaches have focused on delineating TF-binding footprints, with motif discovery receiving less attention. Herein, we present Cis rEgulatory Motif Influence using de Bruijn Graph (CEMIG), an algorithm leveraging de Bruijn and Hamming distance graph paradigms to predict and map motif sites. Assessment on 129 ATAC-seq datasets from the Cistrome Data Browser demonstrates CEMIG's exceptional performance, surpassing three established methodologies on four evaluative metrics. CEMIG accurately identifies both cell-type-specific and common TF motifs within GM12878 and K562 cell lines, demonstrating its comparative genomic capabilities in the identification of evolutionary conservation and cell-type specificity. In-depth transcriptional and functional genomic studies have validated the functional relevance of CEMIG-identified motifs across various cell types. CEMIG is available at https://github.com/OSU-BMBL/CEMIG, developed in C++ to ensure cross-platform compatibility with Linux, macOS and Windows operating systems.
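
The two graph ideas named in the abstract can be illustrated generically. The sketch below builds a de Bruijn graph over k-mers (edges join the (k-1)-mer prefix of each k-mer to its (k-1)-mer suffix) and computes the Hamming distance between equal-length k-mers. It is a textbook illustration under assumed toy sequences, not the CEMIG algorithm, and every name in it is hypothetical.

```python
from collections import defaultdict

def kmers(sequence, k):
    """Return all overlapping k-mers of a sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def de_bruijn_edges(sequences, k):
    """Map each (k-1)-mer prefix to its (k-1)-mer suffixes with edge counts."""
    edges = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for kmer in kmers(seq, k):
            edges[kmer[:-1]][kmer[1:]] += 1
    return {prefix: dict(suffixes) for prefix, suffixes in edges.items()}

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

toy_reads = ["ACGTAC", "ACGTTC"]  # hypothetical sequences
graph = de_bruijn_edges(toy_reads, k=3)
# The two reads share the path AC -> CG -> GT and then branch at GT.
branch = graph["GT"]
```

The branch at `GT` leads to the k-mers `GTA` and `GTT`, which differ at one position; grouping k-mers that lie within a small Hamming distance of each other is one common way to tolerate the degeneracy of binding sites.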


Subject(s)
Algorithms , Chromatin Immunoprecipitation Sequencing , Benchmarking , Biological Evolution , Cell Line
14.
Sensors (Basel) ; 22(23)2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36501862

ABSTRACT

Achieving low-cost and high-performance network security communication is necessary for Internet of Things (IoT) devices, including intelligent sensors and mobile robots. Designing hardware accelerators to accelerate multiple computationally intensive cryptographic primitives in various network security protocols is challenging. Different from existing unified reconfigurable cryptographic accelerators with relatively low efficiency and high latency, this paper presents the design and analysis of a reconfigurable cryptographic accelerator consisting of a reconfigurable cipher unit and a reconfigurable hash unit to support widely used cryptographic algorithms for IoT devices, which require block ciphers and hash functions simultaneously. Based on a detailed and comprehensive algorithmic analysis of both the block ciphers and hash functions in terms of basic algorithm structures and common cryptographic operators, the proposed reconfigurable cryptographic accelerator is designed by reusing key register files and operators to build unified data paths. Both the reconfigurable cipher unit and the reconfigurable hash unit contain a unified data path to implement Data Encryption Standard (DES)/Advanced Encryption Standard (AES)/ShangMi 4 (SM4) and Secure Hash Algorithm-1 (SHA-1)/SHA-256/SM3 algorithms, respectively. A reconfigurable S-Box for AES and SM4 is designed based on the composite Galois field (GF) GF(((2²)²)²), which significantly reduces hardware overhead and power consumption compared with the conventional implementation by look-up tables. The experimental results based on a 65-nm application-specific integrated circuit (ASIC) implementation show that the achieved energy efficiency and area efficiency of the proposed design are 441 Gbps/W and 37.55 Gbps/mm², respectively, which is suitable for IoT devices with limited battery and form factor. The result of delay analysis also shows that the number of delay cycles of our design can be reduced by 83% compared with the state-of-the-art design, which shows that the proposed design is more suitable for applications including 5G/Wi-Fi/ZigBee/Ethernet network standards to accelerate block ciphers and hash functions simultaneously.

15.
Sensors (Basel) ; 22(22)2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36433543

ABSTRACT

Simultaneous localization and mapping (SLAM) is the major solution for constructing or updating a map of an unknown environment while simultaneously keeping track of a mobile robot's location. Correlative Scan Matching (CSM) is a scan matching algorithm for obtaining the posterior distribution probability for the robot's pose in SLAM. This paper combines the non-linear optimization algorithm and CSM algorithm into an NLO-CSM (Non-linear Optimization CSM) algorithm for reducing the computation resources and the amount of computation while ensuring high calculation accuracy, and it presents an efficient hardware accelerator design of the NLO-CSM algorithm for the scan matching in 2D LiDAR SLAM. The proposed NLO-CSM hardware accelerator utilizes pipeline processing and module reusing techniques to achieve low hardware overhead, fast matching, and high energy efficiency. FPGA implementation results show that, at 100 MHz clock, the power consumption of the proposed hardware accelerator is as low as 0.79 W, while it performs a scan match at 8.98 ms and 7.15 mJ per frame. The proposed design outperforms the ARM-A9 dual-core CPU implementation with a 92.74% increase and 90.71% saving in computing speed and energy consumption, respectively. It has also achieved 80.3% LUTs, 84.13% FFs, and 20.83% DSPs saving, as well as an 8.17× increase in frame rate and 96.22% improvement in energy efficiency over a state-of-the-art hardware accelerator design in the literature. ASIC implementation in 65 nm can further reduce the computing time and energy consumption per scan to 5.94 ms and 0.06 mJ, respectively, which shows that the proposed NLO-CSM hardware accelerator design is suitable for resource-limited and energy-constrained mobile and micro robot applications.

16.
Entropy (Basel) ; 24(11)2022 Oct 30.
Article in English | MEDLINE | ID: mdl-36359655

ABSTRACT

Entropy is a measure of uncertainty or randomness. It is the foundation for almost all cryptographic systems. True random number generators (TRNGs) and physical unclonable functions (PUFs) are the silicon primitives that harvest dynamic and static entropy, respectively, to generate random bit streams. In this survey paper, we present a systematic and comprehensive review of different state-of-the-art methods to harvest entropy from silicon-based devices, including the implementations, applications, and the security of the designs. Furthermore, we summarize the trends in entropy source design to point out the current focal points of entropy harvesting.
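
As a worked illustration of entropy as a measure of randomness in a bit stream, the sketch below estimates Shannon entropy and min-entropy per bit from observed symbol frequencies. This is a naive frequency-based estimate under the assumption of independent, identically distributed bits; it is not a substitute for the statistical test suites used to qualify real entropy sources, and the function names are hypothetical.

```python
import math
from collections import Counter

def symbol_probabilities(bits):
    """Empirical probability of each symbol in a non-empty sequence."""
    if not bits:
        raise ValueError("bits must not be empty")
    counts = Counter(bits)
    return [count / len(bits) for count in counts.values()]

def shannon_entropy(bits):
    """Shannon entropy H = -sum(p * log2(p)) in bits per symbol."""
    return -sum(p * math.log2(p) for p in symbol_probabilities(bits))

def min_entropy(bits):
    """Min-entropy H_min = -log2(max p), a worst-case guessing measure."""
    return -math.log2(max(symbol_probabilities(bits)))

balanced = [0, 1] * 8        # equal numbers of zeros and ones
biased = [0, 0, 0, 1] * 4    # three zeros for every one
print(shannon_entropy(balanced), min_entropy(balanced))
print(shannon_entropy(biased), min_entropy(biased))
```

Min-entropy is never larger than Shannon entropy, which is why it is the more conservative figure when sizing the output of a random number generator.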

17.
Nat Commun ; 13(1): 6494, 2022 10 30.
Article in English | MEDLINE | ID: mdl-36310235

ABSTRACT

Drug screening data from massive bulk gene expression databases can be analyzed to determine the optimal clinical application of cancer drugs. The growing amount of single-cell RNA sequencing (scRNA-seq) data also provides insights into improving therapeutic effectiveness by helping to study the heterogeneity of drug responses for cancer cell subpopulations. Developing computational approaches to predict and interpret cancer drug response in single-cell data collected from clinical samples can be very useful. We propose scDEAL, a deep transfer learning framework for cancer drug response prediction at the single-cell level by integrating large-scale bulk cell-line data. The highlight in scDEAL involves harmonizing drug-related bulk RNA-seq data with scRNA-seq data and transferring the model trained on bulk RNA-seq data to predict drug responses in scRNA-seq. Another feature of scDEAL is the integrated gradient feature interpretation to infer the signature genes of drug resistance mechanisms. We benchmark scDEAL on six scRNA-seq datasets and demonstrate its model interpretability via three case studies focusing on drug response label prediction, gene signature identification, and pseudotime analysis. We believe that scDEAL could help study cell reprogramming, drug selection, and repurposing for improving therapeutic efficacy.


Subject(s)
Antineoplastic Agents , Neoplasms , Sequence Analysis, RNA , Single-Cell Analysis , Gene Expression Profiling , RNA-Seq , Machine Learning , Antineoplastic Agents/pharmacology , Neoplasms/drug therapy , Neoplasms/genetics
18.
Front Plant Sci ; 13: 961619, 2022.
Article in English | MEDLINE | ID: mdl-36051289

ABSTRACT

Heterophylly, the existence of different leaf shapes and sizes on the same plant, has been observed in many flowering plant species. Yet, the genetic characteristics and genetic basis of heterophylly in soybean remain unknown. Here, two populations of recombinant inbred lines (RILs) with distinctly different leaf shapes were used to identify loci controlling heterophylly in two environments. The ratio of apical leaf shape (LSUP) to basal leaf shape (LSDOWN) at the reproductive growth stage (RLS) was used as a parameter for classifying heterophylly. A total of eight QTL were detected for RLS between the two populations and four of them were stably identified in both environments. Among them, qRLS20 had the largest effect in the JS population, with a maximum LOD value of 46.9 explaining up to 47.2% of phenotypic variance. This locus was located in the same genomic region as the basal leaf shape QTL qLSDOWN20 on chromosome 20. The locus qRLS19 had the largest effect in the JJ population, with a maximum LOD value of 15.2 explaining up to 27.0% of phenotypic variance. This locus was located in the same genomic region as the apical leaf shape QTL qLSUP19 on chromosome 19. Four candidate genes for heterophylly were identified based on sequence differences among the three parents of the two mapping populations, RT-qPCR analysis, and gene functional annotation analysis. The QTL and candidate genes detected in this study lay a foundation for further understanding the genetic mechanism of heterophylly and are invaluable in marker-assisted breeding.

19.
Comput Struct Biotechnol J ; 20: 4600-4617, 2022.
Article in English | MEDLINE | ID: mdl-36090815

ABSTRACT

Spatially resolved transcriptomics provides a new way to define spatial contexts and understand the pathogenesis of complex human diseases. Although some computational frameworks can characterize spatial context via various clustering methods, the detailed spatial architectures and functional zonation often cannot be revealed and localized due to the limited capacities of associating spatial information. We present RESEPT, a deep-learning framework for characterizing and visualizing tissue architecture from spatially resolved transcriptomics. Given inputs such as gene expression or RNA velocity, RESEPT learns a three-dimensional embedding with a spatial retained graph neural network from spatial transcriptomics. The embedding is then visualized by mapping into color channels in an RGB image and segmented with a supervised convolutional neural network model. Based on a benchmark of 10x Genomics Visium spatial transcriptomics datasets on the human and mouse cortex, RESEPT infers and visualizes the tissue architecture accurately. It is noteworthy that, for the in-house AD samples, RESEPT can localize cortex layers and cell types based on pre-defined region- or cell-type-enriched genes and furthermore provide critical insights into the identification of amyloid-beta plaques in Alzheimer's disease. Interestingly, in a glioblastoma sample analysis, RESEPT distinguishes tumor-enriched, non-tumor, and regions of neuropil with infiltrating tumor cells in support of clinical and prognostic cancer applications.
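
The step of visualizing a three-dimensional embedding as an RGB image can be sketched as a per-channel rescaling. The example below assumes simple min-max scaling of each embedding dimension to the 0 to 255 range; the abstract does not specify the scaling RESEPT actually uses, and the function name and toy embedding are hypothetical.

```python
import numpy as np

def embedding_to_rgb(embedding):
    """Min-max scale each of three embedding dimensions to 8-bit color values."""
    embedding = np.asarray(embedding, dtype=float)
    if embedding.ndim != 2 or embedding.shape[1] != 3:
        raise ValueError("embedding must have shape (n_spots, 3)")
    low = embedding.min(axis=0)
    span = embedding.max(axis=0) - low
    span[span == 0] = 1.0  # a constant channel maps to 0 instead of dividing by zero
    scaled = (embedding - low) / span
    return np.round(scaled * 255).astype(np.uint8)

# Toy embedding for three spots; the third dimension is constant.
toy_embedding = np.array([[0.0, -1.0, 5.0],
                          [1.0, 0.0, 5.0],
                          [2.0, 1.0, 5.0]])
rgb = embedding_to_rgb(toy_embedding)
```

Placing each spot's color at its array coordinates then yields an image in which spots with similar embeddings share similar colors, which is what makes a subsequent image segmentation step possible.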

20.
Comput Struct Biotechnol J ; 20: 3053-3058, 2022.
Article in English | MEDLINE | ID: mdl-35782725

ABSTRACT

Cis-regulatory motif (motif for short) identification and analyses are essential steps in detecting gene regulatory mechanisms. Deep learning (DL) models have shown substantial advances in motif prediction. In parallel, intuitive and integrative web databases are needed to make effective use of DL models and ensure easy access to the identified motifs. Here, we present DESSO-DB, a web database developed to allow efficient access to the identified motifs and diverse motif analyses. DESSO-DB provides motif prediction results and visualizations of 690 ENCODE human Chromatin Immunoprecipitation sequencing (ChIP-seq) datasets (including 161 transcription factors (TFs) in 91 cell lines) and 1,677 human ChIP-seq datasets (including 547 TFs in 359 cell lines) from Cistrome DB using DESSO, which is an in-house developed DL tool for motif prediction. It also provides online motif finding and scanning functions for new ChIP-seq/ATAC-seq datasets and downloadable motif results of the above 690 ENCODE datasets, 126 cancer ChIP-seq datasets, and 55 RNA Crosslinking-Immunoprecipitation and high-throughput sequencing (CLIP-seq) datasets. DESSO-DB is deployed on the Google Cloud Platform, providing stable and efficient resources freely to the public. DESSO-DB is free and available at http://cloud.osubmi.com/DESSO/.
