1.

STAB2: an updated spatio-temporal cell atlas of the human and mouse brain.

Yang, Yucheng T; Gan, Ziquan; Zhang, Jinglong; Zhao, Xingzhong; Yang, Yifan; Han, Shuwen; Wu, Wei; Zhao, Xing-Ming.

Nucleic Acids Res ; 52(D1): D1033-D1041, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37904591

ABSTRACT

The brain is constituted of heterogeneous types of neuronal and non-neuronal cells, which are organized into distinct anatomical regions, and show precise regulation of gene expression during development, aging and function. In the current database release, STAB2 provides a systematic cellular map of the human and mouse brain by integrating recently published large-scale single-cell and single-nucleus RNA-sequencing datasets from diverse regions and across lifespan. We applied a hierarchical strategy of unsupervised clustering on the integrated single-cell transcriptomic datasets to precisely annotate the cell types and subtypes in the human and mouse brain. Currently, STAB2 includes 71 and 61 different cell subtypes defined in the human and mouse brain, respectively. It covers 63 subregions and 15 developmental stages of human brain, and 38 subregions and 30 developmental stages of mouse brain, generating a comprehensive atlas for exploring spatiotemporal transcriptomic dynamics in the mammalian brain. We also augmented web interfaces for querying and visualizing the gene expression in specific cell types. STAB2 is freely available at https://mai.fudan.edu.cn/stab2.

Subject(s)

Brain , Databases, Genetic , Neurons , Single-Cell Gene Expression Analysis , Animals , Humans , Mice , Atlases as Topic , Brain/cytology , Brain/growth & development , Brain/metabolism , Neurons/metabolism , Transcriptome , Datasets as Topic

2.

An augmented Mendelian randomization approach provides causality of brain imaging features on complex traits in a single biobank-scale dataset.

Yang, Anyi; Yang, Yucheng T; Zhao, Xing-Ming.

PLoS Genet ; 19(12): e1011112, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38150468

ABSTRACT

Mendelian randomization (MR) is an effective approach for revealing causal risk factors that underpin complex traits and diseases. While MR has been more widely applied under two-sample settings, it holds greater promise in a single large cohort, given the rise of biobank-scale datasets that simultaneously contain genotype data, brain imaging data, and matched complex traits from the same individual. However, most existing multivariable MR methods have been developed for the two-sample setting or a small number of exposures. In this study, we introduce a one-sample multivariable MR method based on partial least squares and Lasso regression (MR-PL). MR-PL is capable of considering the correlation among exposures (e.g., brain imaging features) when the number of exposures is extremely upscaled, while also correcting for winner's curse bias. We performed extensive and systematic simulations, and demonstrated the robustness and reliability of our method. Comprehensive simulations confirmed that MR-PL can generate more precise causal estimates with lower false positive rates than alternative approaches. Finally, we applied MR-PL to the datasets from UK Biobank to reveal the causal effects of 36 white matter tracts on 180 complex traits, and showed putative white matter tracts that are implicated in smoking, blood vascular function-related traits, and eating behaviors.

Subject(s)

Biological Specimen Banks , Mendelian Randomization Analysis , Humans , Mendelian Randomization Analysis/methods , Multifactorial Inheritance , Reproducibility of Results , Neuroimaging , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide

3.

Constructing a full, multiple-layer interactome for SARS-CoV-2 in the context of lung disease: Linking the virus with human genes and microbes.

Lou, Shaoke; Yang, Mingjun; Li, Tianxiao; Zhao, Weihao; Cevasco, Hannah; Yang, Yucheng T; Gerstein, Mark.

PLoS Comput Biol ; 19(7): e1011222, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37410793

ABSTRACT

The COVID-19 pandemic caused by the SARS-CoV-2 virus has resulted in millions of deaths worldwide. The disease presents with various manifestations that can vary in severity and long-term outcomes. Previous efforts have contributed to the development of effective strategies for treatment and prevention by uncovering the mechanism of viral infection. We now know all the direct protein-protein interactions that occur during the lifecycle of SARS-CoV-2 infection, but it is critical to move beyond these known interactions to a comprehensive understanding of the "full interactome" of SARS-CoV-2 infection, which incorporates human microRNAs (miRNAs), additional human protein-coding genes, and exogenous microbes. Potentially, this will help in developing new drugs to treat COVID-19, differentiating the nuances of long COVID, and identifying histopathological signatures in SARS-CoV-2-infected organs. To construct the full interactome, we developed a statistical modeling approach called MLCrosstalk (multiple-layer crosstalk) based on latent Dirichlet allocation. MLCrosstalk integrates data from multiple sources, including microbes, human protein-coding genes, miRNAs, and human protein-protein interactions. It constructs "topics" that group SARS-CoV-2 with genes and microbes based on similar patterns of co-occurrence across patient samples. We use these topics to infer linkages between SARS-CoV-2 and protein-coding genes, miRNAs, and microbes. We then refine these initial linkages using network propagation to contextualize them within a larger framework of network and pathway structures. Using MLCrosstalk, we identified genes in the IL1-processing and VEGFA-VEGFR2 pathways that are linked to SARS-CoV-2. We also found that Rothia mucilaginosa and Prevotella melaninogenica are positively and negatively correlated with SARS-CoV-2 abundance, a finding corroborated by analysis of single-cell sequencing data.

Subject(s)

COVID-19 , MicroRNAs , Humans , SARS-CoV-2/genetics , Post-Acute COVID-19 Syndrome , Pandemics/prevention & control , MicroRNAs/genetics

4.

Prioritizing genes associated with brain disorders by leveraging enhancer-promoter interactions in diverse neural cells and tissues.

Zhao, Xingzhong; Song, Liting; Yang, Anyi; Zhang, Zichao; Zhang, Jinglong; Yang, Yucheng T; Zhao, Xing-Ming.

Genome Med ; 15(1): 56, 2023 07 24.

Article in English | MEDLINE | ID: mdl-37488639

ABSTRACT

BACKGROUND: Prioritizing genes that underlie complex brain disorders poses a considerable challenge. Although previous studies have found that they share symptoms and heterogeneity, it has remained difficult to systematically identify the risk genes associated with them. METHODS: By using the CAGE (Cap Analysis of Gene Expression) read alignment files for 439 human cell and tissue types (including primary cells, tissues and cell lines) from the FANTOM5 project, we predicted enhancer-promoter interactions (EPIs) of 439 cell and tissue types in human, and examined their reliability. Then we evaluated the genetic heritability of 17 diverse brain disorders and behavioral-cognitive phenotypes in each neural cell type, brain region, and developmental stage. Furthermore, we prioritized genes associated with brain disorders and phenotypes by leveraging the EPIs in each neural cell and tissue type, and analyzed their pleiotropy and functionality for different categories of disorders and phenotypes. Finally, we characterized the spatiotemporal expression dynamics of these associated genes in cells and tissues. RESULTS: We found that identified EPIs showed activity specificity and network aggregation in cell and tissue types, and enriched TF binding in neural cells played key roles in synaptic plasticity and nerve cell development, i.e., EGR1 and SOX family. We also discovered that most neurological disorders exhibit heritability enrichment in neural stem cells and astrocytes, while psychiatric disorders and behavioral-cognitive phenotypes exhibit enrichment in neurons. Furthermore, our identified genes recapitulated well-known risk genes, which exhibited widespread pleiotropy between psychiatric disorders and behavioral-cognitive phenotypes (i.e., FOXP2), and indicated expression specificity in neural cell types, brain regions, and developmental stages associated with disorders and phenotypes. Importantly, we showed the potential associations of brain disorders with brain regions and developmental stages that have not been well studied. CONCLUSIONS: Overall, our study characterized the gene-enhancer regulatory networks and genetic mechanisms in the human neural cells and tissues, and illustrated the value of reanalysis of publicly available genomic datasets.

Subject(s)

Brain Diseases , Humans , Reproducibility of Results , Promoter Regions, Genetic , Neurons , Gene Regulatory Networks

5.

The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models.

Rozowsky, Joel; Gao, Jiahao; Borsari, Beatrice; Yang, Yucheng T; Galeev, Timur; Gürsoy, Gamze; Epstein, Charles B; Xiong, Kun; Xu, Jinrui; Li, Tianxiao; Liu, Jason; Yu, Keyang; Berthel, Ana; Chen, Zhanlin; Navarro, Fabio; Sun, Maxwell S; Wright, James; Chang, Justin; Cameron, Christopher J F; Shoresh, Noam; Gaskell, Elizabeth; Drenkow, Jorg; Adrian, Jessika; Aganezov, Sergey; Aguet, François; Balderrama-Gutierrez, Gabriela; Banskota, Samridhi; Corona, Guillermo Barreto; Chee, Sora; Chhetri, Surya B; Cortez Martins, Gabriel Conte; Danyko, Cassidy; Davis, Carrie A; Farid, Daniel; Farrell, Nina P; Gabdank, Idan; Gofin, Yoel; Gorkin, David U; Gu, Mengting; Hecht, Vivian; Hitz, Benjamin C; Issner, Robbyn; Jiang, Yunzhe; Kirsche, Melanie; Kong, Xiangmeng; Lam, Bonita R; Li, Shantao; Li, Bian; Li, Xiqi; Lin, Khine Zin.

Cell ; 186(7): 1493-1511.e40, 2023 03 30.

Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (~30 tissues × ~15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.

Subject(s)

Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Polymorphism, Single Nucleotide

6.

Deciphering the genetic architecture of human brain structure and function: a brief survey on recent advances of neuroimaging genomics.

Zhao, Xingzhong; Yang, Anyi; Zhang, Zi-Chao; Yang, Yucheng T; Zhao, Xing-Ming.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36847697

ABSTRACT

Brain imaging genomics is an emerging interdisciplinary field in which integrated analysis of multimodal medical image-derived phenotypes (IDPs) and multi-omics data bridges the gap between macroscopic brain phenotypes and their cellular and molecular characteristics. This approach aims to better interpret the genetic architecture and molecular mechanisms associated with brain structure, function and clinical outcomes. More recently, the availability of large-scale imaging and multi-omics datasets from the human brain has afforded the opportunity to discover common genetic variants contributing to the structural and functional IDPs of the human brain. By integrative analyses with functional multi-omics data from the human brain, a set of critical genes, functional genomic regions and neuronal cell types have been identified as significantly associated with brain IDPs. Here, we review the recent advances in the methods and applications of multi-omics integration in brain imaging analysis. We highlight the importance of functional genomic datasets in understanding the biological functions of the identified genes and cell types that are associated with brain IDPs. Moreover, we summarize well-known neuroimaging genetics datasets and discuss challenges and future directions in this field.

Subject(s)

Brain , Genomics , Humans , Genomics/methods , Brain/diagnostic imaging , Brain/metabolism , Phenotype , Neuroimaging/methods

7.

Cellular transcriptional alterations of peripheral blood in Alzheimer's disease.

Song, Liting; Yang, Yucheng T; Guo, Qihao; Zhao, Xing-Ming.

BMC Med ; 20(1): 266, 2022 08 29.

Article in English | MEDLINE | ID: mdl-36031604

ABSTRACT

BACKGROUND: Alzheimer's disease (AD), a progressive neurodegenerative disease, is the most common cause of dementia worldwide. Accumulating data support the contributions of the peripheral immune system in AD pathogenesis. However, there is a lack of comprehensive understanding about the molecular characteristics of peripheral immune cells in AD. METHODS: To explore the alterations of cellular composition and the alterations of intrinsic expression of individual cell types in peripheral blood, we performed cellular deconvolution in a large-scale bulk blood expression cohort and identified cell-intrinsic differentially expressed genes in individual cell types while adjusting for cellular proportion. RESULTS: We detected a significant increase and decrease in the proportion of neutrophils and B lymphocytes in AD blood, respectively, which replicated robustly across three other AD cohorts, as well as with alternative algorithms. The differentially expressed genes in AD neutrophils were enriched for some AD-associated pathways, such as ATP metabolic process and mitochondrion organization. We also found a significant enrichment of protein-protein interaction network modules of leukocyte cell-cell activation, mitochondrion organization, and cytokine-mediated signaling pathway in neutrophils for AD risk genes including CD33 and IL1B. Both changes in cellular composition and expression levels of specific genes were significantly associated with the clinical and pathological alterations. A similar pattern of perturbations on the cellular proportion and gene expression levels of neutrophils could also be observed in mild cognitive impairment (MCI). Moreover, we noticed an elevation of neutrophil abundance in the AD brains. CONCLUSIONS: We revealed the landscape of molecular perturbations at the cellular level for AD. These alterations highlight the putative roles of neutrophils in AD pathobiology.

Subject(s)

Alzheimer Disease , Cognitive Dysfunction , Neurodegenerative Diseases , Brain , Cohort Studies , Humans

8.

Standardized annotation of translated open reading frames.

Mudge, Jonathan M; Ruiz-Orera, Jorge; Prensner, John R; Brunet, Marie A; Calvet, Ferriol; Jungreis, Irwin; Gonzalez, Jose Manuel; Magrane, Michele; Martinez, Thomas F; Schulz, Jana Felicitas; Yang, Yucheng T; Albà, M Mar; Aspden, Julie L; Baranov, Pavel V; Bazzini, Ariel A; Bruford, Elspeth; Martin, Maria Jesus; Calviello, Lorenzo; Carvunis, Anne-Ruxandra; Chen, Jin; Couso, Juan Pablo; Deutsch, Eric W; Flicek, Paul; Frankish, Adam; Gerstein, Mark; Hubner, Norbert; Ingolia, Nicholas T; Kellis, Manolis; Menschaert, Gerben; Moritz, Robert L; Ohler, Uwe; Roucou, Xavier; Saghatelian, Alan; Weissman, Jonathan S; van Heesch, Sebastiaan.

Nat Biotechnol ; 40(7): 994-999, 2022 07.

Article in English | MEDLINE | ID: mdl-35831657

Subject(s)

Protein Biosynthesis , Ribosomes , Molecular Sequence Annotation , Open Reading Frames , Ribosomes/metabolism

9.

POSTAR3: an updated platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins.

Zhao, Weihao; Zhang, Shang; Zhu, Yumin; Xi, Xiaochen; Bao, Pengfei; Ma, Ziyuan; Kapral, Thomas H; Chen, Shuyuan; Zagrovic, Bojan; Yang, Yucheng T; Lu, Zhi John.

Nucleic Acids Res ; 50(D1): D287-D294, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34403477

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation. Accurate identification of RBP binding sites in multiple cell lines and tissue types from diverse species is a fundamental endeavor towards understanding the regulatory mechanisms of RBPs under both physiological and pathological conditions. Our POSTAR annotation processes make use of publicly available large-scale CLIP-seq datasets and external functional genomic annotations to generate a comprehensive map of RBP binding sites and their association with other regulatory events as well as functional variants. Here, we present POSTAR3, an updated database with improvements in data collection, annotation infrastructure, and analysis that support the annotation of post-transcriptional regulation in multiple species including: we made a comprehensive update on the CLIP-seq and Ribo-seq datasets which cover more biological conditions, technologies, and species; we added RNA secondary structure profiling for RBP binding sites; we provided miRNA-mediated degradation events validated by degradome-seq; we included RBP binding sites at circRNA junction regions; we expanded the annotation of RBP binding sites, particularly using updated genomic variants and mutations associated with diseases. POSTAR3 is freely available at http://postar.ncrnalab.org.

Subject(s)

Databases, Genetic , MicroRNAs/genetics , RNA Processing, Post-Transcriptional , RNA, Circular/genetics , RNA-Binding Proteins/genetics , Software , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Binding Sites , Cell Line , Datasets as Topic , Humans , Internet , MicroRNAs/classification , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA, Circular/classification , RNA, Circular/metabolism , RNA-Binding Proteins/classification , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA

10.

Do Students With Different Majors Have Different Personality Traits? Evidence From Two Chinese Agricultural Universities.

Wen, Xicheng; Zhao, Yuhui; Yang, Yucheng T; Wang, Shiwei; Cao, Xinyu.

Front Psychol ; 12: 641333, 2021.

Article in English | MEDLINE | ID: mdl-33995194

ABSTRACT

This paper explores whether a student's choice of major leads to certain personality traits and the reasons for this phenomenon. Specifically, we look at evidence from two Chinese universities, both of which specialize in agricultural studies. Using the Sixteen Personality Factor (16PF) questionnaire and the Neuroticism Extraversion Openness Five-Factor Inventory (NEO-FFI) questionnaire, we collected data from two groups of students: those who study agriculture-related majors (ARM), and those who study non-agriculture-related majors (NARM). The surveys all showed no significant change in personality traits during students' freshman year. However, after 3 years of university study, significant personality trait changes were noted between seniors in the ARM and NARM groups. Whereas ARM seniors tended to be socially shy and lower in communicative competence, NARM seniors were better at expressing themselves and communicating with others. Although a student's choice of profession has an influence on their personality traits, it is not the only factor. The differences between ARM and NARM training models and curricula are also undoubtedly significant. Moreover, the bias against ARM in Chinese society further magnifies the differences in personality traits among students with different majors.

11.

Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures.

Sun, Lei; Xu, Kui; Huang, Wenze; Yang, Yucheng T; Li, Pan; Tang, Lei; Xiong, Tuanlin; Zhang, Qiangfeng Cliff.

Cell Res ; 31(5): 495-516, 2021 05.

Article in English | MEDLINE | ID: mdl-33623109

ABSTRACT

Interactions with RNA-binding proteins (RBPs) are integral to RNA function and cellular regulation, and dynamically reflect specific cellular conditions. However, presently available tools for predicting RBP-RNA interactions employ RNA sequence and/or predicted RNA structures, and therefore do not capture their condition-dependent nature. Here, after profiling transcriptome-wide in vivo RNA secondary structures in seven cell types, we developed PrismNet, a deep learning tool that integrates experimental in vivo RNA structure data and RBP binding data for matched cells to accurately predict dynamic RBP binding in various cellular conditions. PrismNet results for 168 RBPs support its utility for both understanding CLIP-seq results and largely extending such interaction data to accurately analyze additional cell types. Further, PrismNet employs an "attention" strategy to computationally identify exact RBP-binding nucleotides, and we discovered enrichment among dynamic RBP-binding sites for structure-changing variants (riboSNitches), which can link genetic diseases with dysregulated RBP bindings. Our rich profiling data and deep learning-based prediction tool provide access to a previously inaccessible layer of cell-type-specific RBP-RNA interactions, with clear utility for understanding and treating human diseases.

Subject(s)

Deep Learning , RNA , Binding Sites , Humans , Protein Binding , RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Transcriptome

12.

GENCODE 2021.

Frankish, Adam; Diekhans, Mark; Jungreis, Irwin; Lagarde, Julien; Loveland, Jane E; Mudge, Jonathan M; Sisu, Cristina; Wright, James C; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Boix, Carles; Carbonell Sala, Silvia; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Howe, Kevin L; Hunt, Toby; Izuogu, Osagie G; Johnson, Rory; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Riera, Ferriol Calvet; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Wolf, Maxim Y; Xu, Jinuri; Yang, Yucheng T; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Choudhary, Jyoti S; Gerstein, Mark.

Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Subject(s)

COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics

13.

Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks.

Li, Bian; Yang, Yucheng T; Capra, John A; Gerstein, Mark B.

PLoS Comput Biol ; 16(11): e1008291, 2020 11.

Article in English | MEDLINE | ID: mdl-33253214

ABSTRACT

Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.

Subject(s)

Imaging, Three-Dimensional/methods , Neural Networks, Computer , Point Mutation , Proteins/chemistry , Thermodynamics , Computational Biology , Protein Stability

14.

An integrative ENCODE resource for cancer genomics.

Zhang, Jing; Lee, Donghoon; Dhiman, Vineet; Jiang, Peng; Xu, Jie; McGillivray, Patrick; Yang, Hongbo; Liu, Jason; Meyerson, William; Clarke, Declan; Gu, Mengting; Li, Shantao; Lou, Shaoke; Xu, Jinrui; Lochovsky, Lucas; Ung, Matthew; Ma, Lijia; Yu, Shan; Cao, Qin; Harmanci, Arif; Yan, Koon-Kiu; Sethi, Anurag; Gürsoy, Gamze; Schoenberg, Michael Rutenberg; Rozowsky, Joel; Warrell, Jonathan; Emani, Prashant; Yang, Yucheng T; Galeev, Timur; Kong, Xiangmeng; Liu, Shuang; Li, Xiaotong; Krishnan, Jayanth; Feng, Yanlin; Rivera-Mulia, Juan Carlos; Adrian, Jessica; Broach, James R; Bolt, Michael; Moran, Jennifer; Fitzgerald, Dominic; Dileep, Vishnu; Liu, Tingting; Mei, Shenglin; Sasaki, Takayo; Trevilla-Garcia, Claudia; Wang, Su; Wang, Yanli; Zang, Chongzhi; Wang, Daifeng; Klein, Robert J.

Nat Commun ; 11(1): 3696, 2020 07 29.

Article in English | MEDLINE | ID: mdl-32728046

ABSTRACT

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

Subject(s)

Databases, Genetic , Genomics , Neoplasms/genetics , Cell Line, Tumor , Cell Transformation, Neoplastic/genetics , Gene Regulatory Networks , Humans , Mutation/genetics , Reproducibility of Results , Transcription Factors/metabolism

15.

Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions.

Wang, Bo; Yan, Chengfei; Lou, Shaoke; Emani, Prashant; Li, Bian; Xu, Min; Kong, Xiangmeng; Meyerson, William; Yang, Yucheng T; Lee, Donghoon; Gerstein, Mark.

Structure ; 27(9): 1469-1481.e3, 2019 09 03.

Article in English | MEDLINE | ID: mdl-31279629

ABSTRACT

A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ~10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.

Subject(s)

Computational Biology/methods , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Databases, Protein , Drug Design , Humans , Ligands , Machine Learning , Models, Statistical , Molecular Docking Simulation , Protein Binding , Protein Conformation , Proteins/metabolism

16.

POSTAR2: deciphering the post-transcriptional regulatory logics.

Zhu, Yumin; Xu, Gang; Yang, Yucheng T; Xu, Zhiyu; Chen, Xinduo; Shi, Binbin; Xie, Daoxin; Lu, Zhi John; Wang, Pengyuan.

Nucleic Acids Res ; 47(D1): D203-D211, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30239819

ABSTRACT

Post-transcriptional regulation of RNAs is critical to the diverse range of cellular processes. The volume of functional genomic data focusing on post-transcriptional regulation logics continues to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: updated ~500 CLIP-seq datasets (~1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; added a new module 'Translatome', which is derived from Ribo-seq datasets and contains ~36 million open reading frames (ORFs) in the genomes from the six species; updated and unified post-transcriptional regulation and variation data. Finally, we improved web interfaces for searching and visualizing protein-RNA interactions with multi-layer information. Meanwhile, we also merged our CLIPdb database into POSTAR2. POSTAR2 will help researchers investigate the post-transcriptional regulatory logics coordinated by RNA-binding proteins and translational landscape of cellular RNAs.

Subject(s)

Computational Biology , Databases, Genetic , Gene Expression Regulation , RNA Processing, Post-Transcriptional , Animals , Binding Sites , Computational Biology/methods , Humans , Immunoprecipitation , Molecular Sequence Annotation , Open Reading Frames , Protein Binding , RNA-Binding Proteins/metabolism , Sequence Analysis, DNA , Web Browser

17.

Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder.

Gandal, Michael J; Zhang, Pan; Hadjimichael, Evi; Walker, Rebecca L; Chen, Chao; Liu, Shuang; Won, Hyejung; van Bakel, Harm; Varghese, Merina; Wang, Yongjun; Shieh, Annie W; Haney, Jillian; Parhami, Sepideh; Belmont, Judson; Kim, Minsoo; Moran Losada, Patricia; Khan, Zenab; Mleczko, Justyna; Xia, Yan; Dai, Rujia; Wang, Daifeng; Yang, Yucheng T; Xu, Min; Fish, Kenneth; Hof, Patrick R; Warrell, Jonathan; Fitzgerald, Dominic; White, Kevin; Jaffe, Andrew E; Peters, Mette A; Gerstein, Mark; Liu, Chunyu; Iakoucheva, Lilia M; Pinto, Dalila; Geschwind, Daniel H.

Science ; 362(6420)2018 12 14.

Article in English | MEDLINE | ID: mdl-30545856

ABSTRACT

Most genetic risk for psychiatric disease lies in regulatory regions, implicating pathogenic dysregulation of gene expression and splicing. However, comprehensive assessments of transcriptomic organization in diseased brains are limited. In this work, we integrated genotypes and RNA sequencing in brain samples from 1695 individuals with autism spectrum disorder (ASD), schizophrenia, and bipolar disorder, as well as controls. More than 25% of the transcriptome exhibits differential splicing or expression, with isoform-level changes capturing the largest disease effects and genetic enrichments. Coexpression networks isolate disease-specific neuronal alterations, as well as microglial, astrocyte, and interferon-response modules defining previously unidentified neural-immune mechanisms. We integrated genetic and genomic data to perform a transcriptome-wide association study, prioritizing disease loci likely mediated by cis effects on brain expression. This transcriptome-wide characterization of the molecular pathology across three major psychiatric disorders provides a comprehensive resource for mechanistic insight and therapeutic development.

Subject(s)

Autism Spectrum Disorder/genetics , Bipolar Disorder/genetics , Genetic Predisposition to Disease , RNA Splicing , Schizophrenia/genetics , Brain/metabolism , Humans , Protein Isoforms/genetics , Sequence Analysis, RNA , Transcriptome

18.

Comprehensive functional genomic resource and integrative model for the human brain.

Wang, Daifeng; Liu, Shuang; Warrell, Jonathan; Won, Hyejung; Shi, Xu; Navarro, Fabio C P; Clarke, Declan; Gu, Mengting; Emani, Prashant; Yang, Yucheng T; Xu, Min; Gandal, Michael J; Lou, Shaoke; Zhang, Jing; Park, Jonathan J; Yan, Chengfei; Rhie, Suhn Kyong; Manakongtreecheep, Kasidet; Zhou, Holly; Nathan, Aparna; Peters, Mette; Mattei, Eugenio; Fitzgerald, Dominic; Brunetti, Tonya; Moore, Jill; Jiang, Yan; Girdhar, Kiran; Hoffman, Gabriel E; Kalayci, Selim; Gümüs, Zeynep H; Crawford, Gregory E; Roussos, Panos; Akbarian, Schahram; Jaffe, Andrew E; White, Kevin P; Weng, Zhiping; Sestan, Nenad; Geschwind, Daniel H; Knowles, James A; Gerstein, Mark B.

Science ; 362(6420)2018 12 14.

Article in English | MEDLINE | ID: mdl-30545857

ABSTRACT

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.

Subject(s)

Brain/metabolism , Gene Expression Regulation , Mental Disorders/genetics , Datasets as Topic , Deep Learning , Enhancer Elements, Genetic , Epigenesis, Genetic , Epigenomics , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Single-Cell Analysis , Transcriptome

19.

Investigation of RNA-RNA Interactions Using the RISE Database.

Ju, Yanyan; Gong, Jing; Yang, Yucheng T; Zhang, Qiangfeng Cliff.

Curr Protoc Bioinformatics ; 64(1): e58, 2018 12.

Article in English | MEDLINE | ID: mdl-30408350

ABSTRACT

RNA-RNA interactions (RRIs) are essential to understanding the regulatory mechanisms of RNAs. Mapping RRIs in vivo in a transcriptome-wide manner remained challenging until the recent development of several sequencing-based technologies. However, RRIs generated from large-scale studies had not been systematically collected and analyzed before. This article introduces RISE, a database of the RNA Interactome from Sequencing Experiments. RISE provides a comprehensive collection of RRIs in human, mouse, and yeast, derived from transcriptome-wide sequencing experiments, as well as targeted sequencing studies and other public databases/datasets. To facilitate better understanding of the biological roles of these RRIs, RISE also offers rich functional annotations involving RNAs, and an interactive interface to explore the analysis results. Here, we provide a brief description of the RISE website and a step-by-step protocol for using RISE to study RRIs. © 2018 by John Wiley & Sons, Inc.

Subject(s)

Databases, Genetic , RNA/metabolism , Sequence Analysis, RNA , Molecular Sequence Annotation , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, RNA/methods

20.

RISE: a database of RNA interactome from sequencing experiments.

Gong, Jing; Shao, Di; Xu, Kui; Lu, Zhipeng; Lu, Zhi John; Yang, Yucheng T; Zhang, Qiangfeng Cliff.

Nucleic Acids Res ; 46(D1): D194-D201, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29040625

ABSTRACT

We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions mainly in human, mouse and yeast. While most existing RNA databases mainly contain interactions of miRNA targeting, notably, more than half of the RRIs in RISE are among mRNA and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlaps in interactions resolved by different techniques and in different cell lines. It may suggest technology preference and also dynamic natures of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. The analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of biological functions for any RRI of interest.

Subject(s)

Databases, Nucleic Acid , Animals , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Sequence Annotation , Protein Interaction Maps , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA , Transcriptome , User-Computer Interface
