1.

Integration of variant annotations using deep set networks boosts rare variant association testing.

Clarke, Brian; Holtkamp, Eva; Öztürk, Hakime; Mück, Marcel; Wahlberg, Magnus; Meyer, Kayla; Munzlinger, Felix; Brechtmann, Felix; Hölzlwimmer, Florian R; Lindner, Jonas; Chen, Zhifen; Gagneur, Julien; Stegle, Oliver.

Nat Genet ; 2024 Sep 25.

Article in English | MEDLINE | ID: mdl-39322779

ABSTRACT

Rare genetic variants can have strong effects on phenotypes, yet accounting for rare variants in genetic analyses is statistically challenging due to the limited number of allele carriers and the burden of multiple testing. While rich variant annotations promise to enable well-powered rare variant association tests, methods integrating variant annotations in a data-driven manner are lacking. Here we propose deep rare variant association testing (DeepRVAT), a model based on set neural networks that learns a trait-agnostic gene impairment score from rare variant annotations and phenotypes, enabling both gene discovery and trait prediction. On 34 quantitative and 63 binary traits, using whole-exome-sequencing data from UK Biobank, we find that DeepRVAT yields substantial gains in gene discoveries and improved detection of individuals at high genetic risk. Finally, we demonstrate how DeepRVAT enables calibrated and computationally efficient rare variant tests at biobank scale, aiding the discovery of genetic risk factors for human disease traits.
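The set-network idea in this abstract can be illustrated with a toy example. The sketch below is not the DeepRVAT implementation: the layer sizes, random weights, sum pooling, and sigmoid output are all assumptions chosen for illustration. It only shows how a permutation-invariant network can turn a variable-size set of annotated variants into a single gene-level score.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for a toy dense layer (illustrative only)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Apply one dense layer: weights @ x + bias."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gene_impairment_score(variant_annotations, phi, rho):
    """Deep-set style score for one individual and one gene.

    variant_annotations: list of annotation vectors, one per rare variant carried.
    phi: per-variant layer; rho: layer mapping the pooled embedding to one logit.
    Sum pooling is an assumption; any order-independent pooling would do.
    """
    n_hidden = len(phi[1])
    pooled = [0.0] * n_hidden
    for annotation in variant_annotations:
        embedding = [max(0.0, h) for h in dense(phi, annotation)]  # ReLU
        pooled = [p + e for p, e in zip(pooled, embedding)]  # order-independent sum
    (logit,) = dense(rho, pooled)
    return 1.0 / (1.0 + math.exp(-logit))  # squash to (0, 1)


rng = random.Random(0)
phi = make_layer(n_in=3, n_out=4, rng=rng)  # 3 hypothetical annotations per variant
rho = make_layer(n_in=4, n_out=1, rng=rng)
carried = [[0.9, 0.1, 1.0], [0.2, 0.8, 0.0]]  # made-up annotation values
print(gene_impairment_score(carried, phi, rho))
```

Because the per-variant embeddings are summed, the score does not depend on the order in which variants are listed, and an individual carrying no rare variants in the gene still receives a well-defined score.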

2.

Phenotype-driven genomics enhance diagnosis in children with unresolved neuromuscular diseases.

Estévez-Arias, Berta; Matalonga, Leslie; Yubero, Delia; Polavarapu, Kiran; Codina, Anna; Ortez, Carlos; Carrera-García, Laura; Expósito-Escudero, Jesica; Jou, Cristina; Meyer, Stefanie; Kilicarslan, Ozge Aksel; Aleman, Alberto; Thompson, Rachel; Luknárová, Rebeka; Esteve-Codina, Anna; Gut, Marta; Laurie, Steven; Demidov, German; Yépez, Vicente A; Beltran, Sergi; Gagneur, Julien; Topf, Ana; Lochmüller, Hanns; Nascimento, Andres; Hoenicka, Janet; Palau, Francesc; Natera-de Benito, Daniel.

Eur J Hum Genet ; 2024 Sep 27.

Article in English | MEDLINE | ID: mdl-39333429

ABSTRACT

Establishing a molecular diagnosis remains challenging in half of individuals with childhood-onset neuromuscular diseases (NMDs) despite exome sequencing. This study evaluates the diagnostic utility of combining genomic approaches in undiagnosed NMD patients. We performed deep phenotyping of 58 individuals with unsolved childhood-onset NMDs that have previously undergone inconclusive exome studies. Genomic approaches included trio genome sequencing and RNASeq. Genetic diagnoses were reached in 23 out of 58 individuals (40%). Twenty-one individuals carried causal single nucleotide variants (SNVs) or small insertions and deletions, while 2 carried pathogenic structural variants (SVs). Genomic sequencing identified pathogenic variants in coding regions or at the splice site in 17 out of 21 resolved cases, while RNA sequencing was additionally required for the diagnosis of 4 cases. Reasons for previous diagnostic failures included low coverage in exonic regions harboring the second pathogenic variant and involvement of genes that were not yet linked to human diseases at the time of the first NGS analysis. In summary, our systematic genetic analysis, integrating deep phenotyping, trio genome sequencing and RNASeq, proved effective in diagnosing unsolved childhood-onset NMDs. This approach holds promise for similar cohorts, offering potential improvements in diagnostic rates and clinical management of individuals with NMDs.

3.

scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution.

Hingerl, Johannes C; Martens, Laura D; Karollus, Alexander; Manz, Trevor; Buenrostro, Jason D; Theis, Fabian J; Gagneur, Julien.

bioRxiv ; 2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39345504

ABSTRACT

Understanding how regulatory DNA elements shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build unifying models of gene regulation capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi as a foundation model, equip it with a cell-specific decoder, and fine-tune its sequence embeddings. Specifically, we condition the decoder on the cell position in a precomputed single-cell embedding resulting in strong generalization capability. Applied to a hematopoiesis dataset, scooby recapitulates cell-specific expression levels of held-out genes and cells, and identifies regulators and their putative target genes through in silico motif deletion. Moreover, accurate variant effect prediction with scooby allows for breaking down bulk eQTL effects into single-cell effects and delineating their impact on chromatin accessibility and gene expression. We anticipate scooby to aid unraveling the complexities of gene regulation at the resolution of individual cells.

4.

Splicing control by PHF5A is crucial for melanoma cell survival.

Meißgeier, Tina; Kappelmann-Fenzl, Melanie; Staebler, Sebastian; Ahari, Ata Jadid; Mertes, Christian; Gagneur, Julien; Linck-Paulus, Lisa; Bosserhoff, Anja Katrin.

Cell Prolif ; : e13741, 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-39212334

ABSTRACT

Abnormalities in alternative splicing are a hallmark of cancer formation. In this study, we investigated the role of the splicing factor PHD finger protein 5A (PHF5A) in melanoma. Malignant melanoma is the deadliest form of skin cancer, and patients with a high PHF5A expression show poor overall survival. Our data revealed that an siRNA-mediated downregulation of PHF5A in different melanoma cell lines leads to massive splicing defects of different tumour-relevant genes. The loss of PHF5A results in an increased rate of apoptosis by triggering Fas- and unfolded protein response (UPR)-mediated apoptosis pathways in melanoma cells. These findings are tumour-specific because we did not observe this regulation in fibroblasts. Our study identifies a crucial role of PHF5A as driver for melanoma malignancy and the described underlying splicing network provides an interesting basis for the development of new therapeutic targets for this aggressive form of skin cancer.

5.

An Integrated Transcriptomics and Genomics Approach Detects an X/Autosome Translocation in a Female with Duchenne Muscular Dystrophy.

Segarra-Casas, Alba; Yépez, Vicente A; Demidov, German; Laurie, Steven; Esteve-Codina, Anna; Gagneur, Julien; Parkhurst, Yolande; Muni-Lofra, Robert; Harris, Elizabeth; Marini-Bettolo, Chiara; Straub, Volker; Töpf, Ana.

Int J Mol Sci ; 25(14), 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39063034

ABSTRACT

Duchenne and Becker muscular dystrophies, caused by pathogenic variants in DMD, are the most common inherited neuromuscular conditions in childhood. These diseases follow an X-linked recessive inheritance pattern, and mainly males are affected. The most prevalent pathogenic variants in the DMD gene are copy number variants (CNVs), and most patients achieve their genetic diagnosis through Multiplex Ligation-dependent Probe Amplification (MLPA) or exome sequencing. Here, we investigated a female patient presenting with muscular dystrophy who remained genetically undiagnosed after MLPA and exome sequencing. RNA sequencing (RNAseq) from the patient's muscle biopsy identified an 85% reduction in DMD expression compared to 116 muscle samples included in the cohort. A de novo balanced translocation between chromosome 17 and the X chromosome (t(X;17)(p21.1;q23.2)) disrupting the DMD and BCAS3 genes was identified through trio whole genome sequencing (WGS). The combined analysis of RNAseq and WGS played a crucial role in the detection and characterisation of the disease-causing variant in this patient, who had been undiagnosed for over two decades. This case illustrates the diagnostic odyssey of female DMD patients with complex structural variants that are not detected by current panel or exome sequencing analysis.

Subject(s)

Chromosomes, Human, X , Dystrophin , Genomics , Muscular Dystrophy, Duchenne , Translocation, Genetic , Humans , Muscular Dystrophy, Duchenne/genetics , Muscular Dystrophy, Duchenne/diagnosis , Female , Dystrophin/genetics , Chromosomes, Human, X/genetics , Genomics/methods , DNA Copy Number Variations , Exome Sequencing , Transcriptome/genetics , Chromosomes, Human, Pair 17/genetics

6.

Distinct genetic liability profiles define clinically relevant patient strata across common diseases.

Trastulla, Lucia; Dolgalev, Georgii; Moser, Sylvain; Jiménez-Barrón, Laura T; Andlauer, Till F M; von Scheidt, Moritz; Budde, Monika; Heilbronner, Urs; Papiol, Sergi; Teumer, Alexander; Homuth, Georg; Völzke, Henry; Dörr, Marcus; Falkai, Peter; Schulze, Thomas G; Gagneur, Julien; Iorio, Francesco; Müller-Myhsok, Bertram; Schunkert, Heribert; Ziller, Michael J.

Nat Commun ; 15(1): 5534, 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-38951512

ABSTRACT

Stratified medicine holds great promise to tailor treatment to the needs of individual patients. While genetics holds great potential to aid patient stratification, it remains a major challenge to operationalize complex genetic risk factor profiles to deconstruct clinical heterogeneity. Contemporary approaches to this problem rely on polygenic risk scores (PRS), which provide only limited clinical utility and lack a clear biological foundation. To overcome these limitations, we develop the CASTom-iGEx approach to stratify individuals based on the aggregated impact of their genetic risk factor profiles on tissue specific gene expression levels. The paradigmatic application of this approach to coronary artery disease or schizophrenia patient cohorts identified diverse strata or biotypes. These biotypes are characterized by distinct endophenotype profiles as well as clinical parameters and are fundamentally distinct from PRS based groupings. In stark contrast to the latter, the CASTom-iGEx strategy discovers biologically meaningful and clinically actionable patient subgroups, where complex genetic liabilities are not randomly distributed across individuals but rather converge onto distinct disease relevant biological processes. These results support the notion of different patient biotypes characterized by partially distinct pathomechanisms. Thus, the universally applicable approach presented here has the potential to constitute an important component of future personalized medicine paradigms.

Subject(s)

Coronary Artery Disease , Genetic Predisposition to Disease , Multifactorial Inheritance , Schizophrenia , Humans , Schizophrenia/genetics , Multifactorial Inheritance/genetics , Genetic Predisposition to Disease/genetics , Coronary Artery Disease/genetics , Risk Factors , Female , Precision Medicine , Male , Genome-Wide Association Study , Middle Aged , Polymorphism, Single Nucleotide

7.

Identifying dysregulated regions in amyotrophic lateral sclerosis through chromatin accessibility outliers.

Çelik, Muhammed Hasan; Gagneur, Julien; Lim, Ryan G; Wu, Jie; Thompson, Leslie M; Xie, Xiaohui.

HGG Adv ; 5(3): 100318, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-38872308

ABSTRACT

The high heritability of amyotrophic lateral sclerosis (ALS) contrasts with its low molecular diagnosis rate post-genetic testing, pointing to potential undiscovered genetic factors. To aid the exploration of these factors, we introduced EpiOut, an algorithm to identify chromatin accessibility outliers that are regions exhibiting divergent accessibility from the population baseline in a single or few samples. Annotation of accessible regions with histone chromatin immunoprecipitation sequencing and Hi-C indicates that outliers are concentrated in functional loci, especially among promoters interacting with active enhancers. Across different omics levels, outliers are robustly replicated, and chromatin accessibility outliers are reliable predictors of gene expression outliers and aberrant protein levels. When promoter accessibility does not align with gene expression, our results indicate that molecular aberrations are more likely to be linked to post-transcriptional regulation rather than transcriptional regulation. Our findings demonstrate that the outlier detection paradigm can uncover dysregulated regions in rare diseases. EpiOut is available at github.com/uci-cbcl/EpiOut.

Subject(s)

Amyotrophic Lateral Sclerosis , Chromatin , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/metabolism , Humans , Chromatin/metabolism , Chromatin/genetics , Promoter Regions, Genetic/genetics , Algorithms , Gene Expression Regulation , Chromatin Immunoprecipitation Sequencing , Histones/metabolism , Histones/genetics

8.

Genome and RNA sequencing were essential to reveal cryptic intronic variants associated to defective ATP6AP1 mRNA processing.

Morales-Romero, Blai; Muñoz-Pujol, Gerard; Artuch, Rafael; García-Cazorla, Angels; O'Callaghan, Mar; Sykut-Cegielska, Jolanta; Campistol, Jaume; Moreno-Lozano, Pedro Juan; Oud, Machteld M; Wevers, Ron A; Lefeber, Dirk J; Esteve-Codina, Anna; Yepez, Vicente A; Gagneur, Julien; Wortmann, Saskia B; Prokisch, Holger; Ribes, Antonia; García-Villoria, Judit; Tort, Frederic.

Mol Genet Metab ; 142(3): 108511, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38878498

ABSTRACT

The diagnosis of Mendelian disorders has notably advanced with the integration of whole exome and genome sequencing (WES and WGS) in clinical practice. However, challenges in variant interpretation and variants not covered by WES still leave a substantial percentage of patients undiagnosed. In this context, integrating RNA sequencing (RNA-seq) improves diagnostic workflows, particularly for WES-inconclusive cases. Additionally, functional studies are often necessary to elucidate the impact of prioritized variants on gene expression and protein function. Our study focused on three unrelated male patients (P1-P3) with ATP6AP1-CDG (congenital disorder of glycosylation), presenting with intellectual disability and varying degrees of hepatopathy, glycosylation defects, and an initially inconclusive diagnosis through WES. Subsequent RNA-seq was pivotal in identifying the underlying genetic causes in P1 and P2, detecting ATP6AP1 underexpression and aberrant splicing. Molecular studies in fibroblasts confirmed these findings and identified the rare intronic variants c.289-233C>T and c.289-289G>A in P1 and P2, respectively. Trio-WGS also revealed the variant c.289-289G>A in P3, which was a de novo change in both patients. Functional assays expressing the mutant alleles in HAP1 cells demonstrated the pathogenic impact of these variants by reproducing the splicing alterations observed in patients. Our study underscores the role of RNA-seq and WGS in enhancing diagnostic rates for genetic diseases such as CDG, providing new insights into ATP6AP1-CDG molecular bases by identifying the first two deep intronic variants in this X-linked gene. Additionally, our study highlights the need to integrate RNA-seq and WGS, followed by functional validation, in routine diagnostics for a comprehensive evaluation of patients with an unidentified molecular etiology.

Subject(s)

Introns , RNA, Messenger , Humans , Male , Introns/genetics , RNA, Messenger/genetics , Vacuolar Proton-Translocating ATPases/genetics , Congenital Disorders of Glycosylation/genetics , Congenital Disorders of Glycosylation/diagnosis , Congenital Disorders of Glycosylation/pathology , Mutation , Whole Genome Sequencing , Exome Sequencing , Sequence Analysis, RNA , Intellectual Disability/genetics , Intellectual Disability/diagnosis , Intellectual Disability/pathology , Child , RNA Splicing/genetics , Child, Preschool

9.

Analysis of 3760 hematologic malignancies reveals rare transcriptomic aberrations of driver genes.

Cao, Xueqi; Huber, Sandra; Ahari, Ata Jadid; Traube, Franziska R; Seifert, Marc; Oakes, Christopher C; Secheyko, Polina; Vilov, Sergey; Scheller, Ines F; Wagner, Nils; Yépez, Vicente A; Blombery, Piers; Haferlach, Torsten; Heinig, Matthias; Wachutka, Leonhard; Hutter, Stephan; Gagneur, Julien.

Genome Med ; 16(1): 70, 2024 May 20.

Article in English | MEDLINE | ID: mdl-38769532

ABSTRACT

BACKGROUND: Rare oncogenic driver events, particularly affecting the expression or splicing of driver genes, are suspected to substantially contribute to the large heterogeneity of hematologic malignancies. However, their identification remains challenging. METHODS: To address this issue, we generated the largest dataset to date of matched whole genome sequencing and total RNA sequencing of hematologic malignancies from 3760 patients spanning 24 disease entities. Taking advantage of our dataset size, we focused on discovering rare regulatory aberrations. Therefore, we called expression and splicing outliers using an extension of the workflow DROP (Detection of RNA Outliers Pipeline) and AbSplice, a variant effect predictor that identifies genetic variants causing aberrant splicing. We next trained a machine learning model integrating these results to prioritize new candidate disease-specific driver genes. RESULTS: We found a median of seven expression outlier genes, two splicing outlier genes, and two rare splice-affecting variants per sample. Each category showed significant enrichment for already well-characterized driver genes, with odds ratios exceeding three among genes called in more than five samples. On held-out data, our integrative modeling significantly outperformed modeling based solely on genomic data and revealed promising novel candidate driver genes. Remarkably, we found a truncated form of the low density lipoprotein receptor LRP1B transcript to be aberrantly overexpressed in about half of hairy cell leukemia variant (HCL-V) samples and, to a lesser extent, in closely related B-cell neoplasms. This observation, which was confirmed in an independent cohort, suggests LRP1B as a novel marker for a HCL-V subclass and a yet unreported functional role of LRP1B within these rare entities. CONCLUSIONS: Altogether, our census of expression and splicing outliers for 24 hematologic malignancy entities and the companion computational workflow constitute unique resources to deepen our understanding of rare oncogenic events in hematologic cancers.
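The enrichment statement in the RESULTS ("odds ratios exceeding three") rests on a 2x2 contingency table of genes. As a minimal sketch, the odds ratio and a Woolf-type confidence interval could be computed as follows. The counts in the example are hypothetical and are not taken from the study, and the abstract does not state which test or interval the authors used.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: known drivers among outlier genes      b: non-drivers among outlier genes
    c: known drivers among remaining genes    d: non-drivers among remaining genes
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimator")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
estimate, low, high = odds_ratio_with_ci(a=30, b=70, c=100, d=900)
print(f"OR = {estimate:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

An odds ratio above one whose interval excludes one indicates that known driver genes are over-represented among the outlier genes.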

Subject(s)

Hematologic Neoplasms , Transcriptome , Humans , Hematologic Neoplasms/genetics , RNA Splicing , Gene Expression Regulation, Neoplastic , Oncogenes , Gene Expression Profiling , Receptors, LDL/genetics

10.

Unravelling undiagnosed rare disease cases by HiFi long-read genome sequencing.

Steyaert, Wouter; Sagath, Lydia; Demidov, German; Yépez, Vicente A; Esteve-Codina, Anna; Gagneur, Julien; Ellwanger, Kornelia; Derks, Ronny; Weiss, Marjan; den Ouden, Amber; van den Heuvel, Simone; Swinkels, Hilde; Zomer, Nick; Steehouwer, Marloes; O'Gorman, Luke; Astuti, Galuh; Neveling, Kornelia; Schüle, Rebecca; Xu, Jishu; Synofzik, Matthis; Beijer, Danique; Hengel, Holger; Schöls, Ludger; Claeys, Kristl G; Baets, Jonathan; Van de Vondel, Liedewei; Ferlini, Alessandra; Selvatici, Rita; Morsy, Heba; Saeed Abd Elmaksoud, Marwa; Straub, Volker; Müller, Juliane; Pini, Veronica; Perry, Luke; Sarkozy, Anna; Zaharieva, Irina; Muntoni, Francesco; Bugiardini, Enrico; Polavarapu, Kiran; Horvath, Rita; Reid, Evan; Lochmüller, Hanns; Spinazzi, Marco; Savarese, Marco; Matalonga, Leslie; Laurie, Steven; Brunner, Han G; Graessner, Holm; Beltran, Sergi; Ossowski, Stephan.

medRxiv ; 2024 May 04.

Article in English | MEDLINE | ID: mdl-38746462

ABSTRACT

Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilised 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single nucleotide variants (SNVs), insertion-deletions (InDels), and short tandem repeat (STR) expansions in extensively studied RD families without clear molecular diagnoses. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Rare Disease Network (ERN) experts. Of these, 21 families were affected by so-called 'unsolvable' syndromes for which genetic causes remain unknown, and 93 families with at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded thirteen novel genetic diagnoses due to de novo and rare inherited SNVs, InDels, SVs, and STR expansions. In an additional four families, we identified a candidate disease-causing SV affecting several genes including an MCF2 / FGF13 fusion and PSMA3 deletion. However, no common genetic cause was identified in any of the 'unsolvable' syndromes. Taken together, we found (likely) disease-causing genetic variants in 13.0% of previously unsolved families and additional candidate disease-causing SVs in another 4.3% of these families. In conclusion, our results demonstrate the added value of HiFi long-read genome sequencing in undiagnosed rare diseases.

11.

Species-aware DNA language models capture regulatory elements and their evolution.

Karollus, Alexander; Hingerl, Johannes; Gankin, Dennis; Grosshauser, Martin; Klemon, Kristian; Gagneur, Julien.

Genome Biol ; 25(1): 83, 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38566111

ABSTRACT

BACKGROUND: The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution. RESULTS: Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery. CONCLUSIONS: Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.

Subject(s)

DNA , Regulatory Sequences, Nucleic Acid , Binding Sites , Sequence Alignment , Algorithms , Conserved Sequence/genetics , Evolution, Molecular

12.

Cellular energy regulates mRNA degradation in a codon-specific manner.

Tomaz da Silva, Pedro; Zhang, Yujie; Theodorakis, Evangelos; Martens, Laura D; Yépez, Vicente A; Pelechano, Vicent; Gagneur, Julien.

Mol Syst Biol ; 20(5): 506-520, 2024 May.

Article in English | MEDLINE | ID: mdl-38491213

ABSTRACT

Codon optimality is a major determinant of mRNA translation and degradation rates. However, whether and through which mechanisms its effects are regulated remains poorly understood. Here we show that codon optimality associates with up to 2-fold change in mRNA stability variations between human tissues, and that its effect is attenuated in tissues with high energy metabolism and amplifies with age. Mathematical modeling and perturbation data through oxygen deprivation and ATP synthesis inhibition reveal that cellular energy variations non-uniformly alter the effect of codon usage. This new mode of codon effect regulation, independent of tRNA regulation, provides a fundamental mechanistic link between cellular energy metabolism and eukaryotic gene expression.

Subject(s)

Codon , Energy Metabolism , RNA Stability , RNA, Messenger , Humans , Energy Metabolism/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Codon/genetics , Codon Usage , Protein Biosynthesis , RNA, Transfer/genetics , RNA, Transfer/metabolism , Adenosine Triphosphate/metabolism , Gene Expression Regulation

13.

Viral genome sequencing to decipher in-hospital SARS-CoV-2 transmission events.

Esser, Elisabeth; Schulte, Eva C; Graf, Alexander; Karollus, Alexander; Smith, Nicholas H; Michler, Thomas; Dvoretskii, Stefan; Angelov, Angel; Sonnabend, Michael; Peter, Silke; Engesser, Christina; Radonic, Aleksandar; Thürmer, Andrea; von Kleist, Max; Gebhardt, Friedemann; da Costa, Clarissa Prazeres; Busch, Dirk H; Muenchhoff, Maximilian; Blum, Helmut; Keppler, Oliver T; Gagneur, Julien; Protzer, Ulrike.

Sci Rep ; 14(1): 5768, 2024 Mar 08.

Article in English | MEDLINE | ID: mdl-38459123

ABSTRACT

The SARS-CoV-2 pandemic has highlighted the need to better define in-hospital transmissions, a need that extends to all other common infectious diseases encountered in clinical settings. To evaluate how whole viral genome sequencing can contribute to deciphering nosocomial SARS-CoV-2 transmission, 926 SARS-CoV-2 viral genomes from 622 staff members and patients were collected between February 2020 and January 2021 at a university hospital in Munich, Germany, and analysed along with the place of work, duration of hospital stay, and ward transfers. Bioinformatically defined transmission clusters inferred from viral genome sequencing were compared to those inferred from interview-based contact tracing. An additional dataset collected at the same time at another university hospital in the same city was used to account for multiple independent introductions. Clustering analysis of 619 viral genomes generated 19 clusters ranging from 3 to 31 individuals. Sequencing-based transmission clusters showed little overlap with those based on contact tracing data. The viral genomes were significantly more closely related to each other than comparable genomes collected simultaneously at other hospitals in the same city (n = 829), suggesting nosocomial transmission. Longitudinal sampling from individual patients suggested possible cross-infection events during the hospital stay in 19.2% of individuals (14 of 73 individuals). Clustering analysis of SARS-CoV-2 whole genome sequences can reveal cryptic transmission events missed by classical, interview-based contact tracing, helping to decipher in-hospital transmissions. These results, in line with other studies, advocate for viral genome sequencing as a pathogen transmission surveillance tool in hospitals.

Subject(s)

COVID-19 , Cross Infection , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/genetics , Genome, Viral/genetics , Cross Infection/epidemiology , Cross Infection/genetics , Hospitals, University

14.

Impaired biogenesis of basic proteins impacts multiple hallmarks of the aging brain.

Di Fraia, Domenico; Marino, Antonio; Lee, Jae Ho; Kelmer Sacramento, Erika; Baumgart, Mario; Bagnoli, Sara; Tomaz da Silva, Pedro; Kumar Sahu, Amit; Siano, Giacomo; Tiessen, Max; Terzibasi-Tozzini, Eva; Gagneur, Julien; Frydman, Judith; Cellerino, Alessandro; Ori, Alessandro.

bioRxiv ; 2024 Jan 09.

Article in English | MEDLINE | ID: mdl-38260253

ABSTRACT

Aging and neurodegeneration entail diverse cellular and molecular hallmarks. Here, we studied the effects of aging on the transcriptome, translatome, and multiple layers of the proteome in the brain of a short-lived killifish. We reveal that aging causes widespread reduction of proteins enriched in basic amino acids that is independent of mRNA regulation, and it is not due to impaired proteasome activity. Instead, we identify a cascade of events where aberrant translation pausing leads to reduced ribosome availability resulting in proteome remodeling independently of transcriptional regulation. Our research uncovers a vulnerable point in the aging brain's biology - the biogenesis of basic DNA/RNA binding proteins. This vulnerability may represent a unifying principle that connects various aging hallmarks, encompassing genome integrity and the biosynthesis of macromolecules.

15.

Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing.

Klaproth-Andrade, Daniela; Hingerl, Johannes; Bruns, Yanik; Smith, Nicholas H; Träuble, Jakob; Wilhelm, Mathias; Gagneur, Julien.

Nat Commun ; 15(1): 151, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38167372

ABSTRACT

Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
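The notion of "connecting peaks in spectra spaced by amino acid masses" can be made concrete with a simple pairwise lookup. The sketch below is not Spectralis and is not a convolutional layer. It assumes singly charged fragment ions, a reduced table of monoisotopic residue masses, and a hypothetical tolerance of 0.01 m/z, and it simply lists pairs of peaks whose spacing matches one residue.

```python
# Monoisotopic residue masses in Da for a small subset of amino acids.
# "L" also stands for isoleucine, which has the same mass.
RESIDUE_MASSES = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
}


def residue_links(peaks_mz, tolerance=0.01):
    """Return (i, j, residue) for sorted peaks i < j spaced by one residue mass.

    Assumes singly charged ions, so an m/z gap equals a mass gap.
    """
    peaks = sorted(peaks_mz)
    links = []
    for i, low in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - low
            for residue, mass in RESIDUE_MASSES.items():
                if abs(gap - mass) <= tolerance:
                    links.append((i, j, residue))
    return links


# Toy spectrum: a y1 ion of arginine followed by peaks one G and one A further on.
print(residue_links([175.11895, 232.14041, 303.17752]))
```

Chains of such links trace candidate fragment ion series, which is the kind of structure a model can then classify.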

Subject(s)

Deep Learning , Algorithms , Sequence Analysis, Protein/methods , Peptides/chemistry , Amino Acid Sequence

16.

Modeling fragment counts improves single-cell ATAC-seq analysis.

Martens, Laura D; Fischer, David S; Yépez, Vicente A; Theis, Fabian J; Gagneur, Julien.

Nat Methods ; 21(1): 28-31, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38049697

ABSTRACT

Single-cell ATAC sequencing coverage in regulatory regions is typically binarized as an indicator of open chromatin. Here we show that binarization is an unnecessary step that neither improves goodness of fit, clustering, cell type identification nor batch integration. Fragment counts, but not read counts, should instead be modeled, which preserves quantitative regulatory information. These results have immediate implications for single-cell ATAC sequencing analysis.
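The contrast between binarized values, read counts, and fragment counts can be shown on a toy count vector. The abstract does not describe how to convert between them. The conversion below (round odd read counts up to the next even number, then halve) is an assumption used for illustration, motivated by each paired-end fragment contributing two read ends.

```python
def binarize(counts):
    """Collapse counts to an open/closed indicator, discarding quantitative signal."""
    return [1 if c > 0 else 0 for c in counts]


def reads_to_fragments(read_counts):
    """Approximate fragment counts from read counts (assumed heuristic).

    Odd counts are rounded up to the next even number and then halved,
    which is the same as ceil(count / 2).
    """
    return [(c + 1) // 2 for c in read_counts]


reads_per_peak = [0, 1, 2, 3, 4, 8]  # made-up read counts for one cell
print(binarize(reads_per_peak))
print(reads_to_fragments(reads_per_peak))
```

Binarization maps every non-zero entry to 1, whereas the fragment counts keep the differences between weakly and strongly accessible regions.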

Subject(s)

Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin/genetics , Single-Cell Analysis

17.

Improved detection of aberrant splicing with FRASER 2.0 and the intron Jaccard index.

Scheller, Ines F; Lutz, Karoline; Mertes, Christian; Yépez, Vicente A; Gagneur, Julien.

Am J Hum Genet ; 110(12): 2056-2067, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38006880

ABSTRACT

Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, typically called 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants by 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application to 303 rare-disease samples confirmed the relative reduction in the number of outlier calls for a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare-disease diagnostics by drastically reducing the number of splicing outlier calls per sample at minimal loss of sensitivity.
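The abstract names the intron Jaccard index but does not give its formula. The sketch below uses one plausible definition, stated here as an assumption: the split reads supporting an intron, divided by all reads associated with its donor or acceptor site (split reads sharing either site plus non-split reads covering either site). The function and argument names are hypothetical, and the read counts are invented.

```python
def intron_jaccard(split_counts, donor, acceptor, nonsplit_donor, nonsplit_acceptor):
    """Assumed intron Jaccard index for the intron (donor, acceptor).

    split_counts: dict mapping (donor, acceptor) -> number of split reads.
    nonsplit_donor / nonsplit_acceptor: reads spanning each splice site unspliced
    (the intron-retention signal).
    """
    supporting = split_counts.get((donor, acceptor), 0)
    same_donor = sum(n for (d, _), n in split_counts.items() if d == donor)
    same_acceptor = sum(n for (_, a), n in split_counts.items() if a == acceptor)
    # Union of reads at either site: the intron's own reads are in both sums.
    union = same_donor + same_acceptor - supporting + nonsplit_donor + nonsplit_acceptor
    return supporting / union if union else float("nan")


# Invented counts: one main intron, one alternative acceptor, one alternative donor.
counts = {(100, 200): 80, (100, 250): 10, (150, 200): 5}
print(intron_jaccard(counts, 100, 200, nonsplit_donor=3, nonsplit_acceptor=2))
```

Under this definition, alternative donor usage, alternative acceptor usage, and intron retention all enlarge the denominator, so any of them lowers the single value.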

Subject(s)

Alternative Splicing , RNA Splicing , Humans , Alternative Splicing/genetics , Introns/genetics , RNA Splicing/genetics , RNA-Seq , Algorithms
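
The abstract above describes the intron Jaccard index only in words and gives no formula. As a rough illustration of how a single Jaccard-style intron-excision ratio could fold donor, acceptor and retention signal together, here is a minimal sketch. The function name, the argument names and the exact form of the denominator are assumptions made for this example and are not taken from the record.

```python
def intron_jaccard(split_da, split_donor_total, split_acceptor_total,
                   nonsplit_donor, nonsplit_acceptor):
    """Jaccard-style ratio for one intron (hypothetical formulation).

    split_da             -- split reads supporting this exact donor-acceptor pair
    split_donor_total    -- split reads using the same donor (includes split_da)
    split_acceptor_total -- split reads using the same acceptor (includes split_da)
    nonsplit_donor       -- unspliced reads covering the donor site
    nonsplit_acceptor    -- unspliced reads covering the acceptor site
    """
    # Union of all reads at either splice site; split_da is counted in both
    # totals, so it is subtracted once.
    union = (split_donor_total + split_acceptor_total - split_da
             + nonsplit_donor + nonsplit_acceptor)
    if union == 0:
        return float("nan")  # no coverage: the ratio is undefined
    return split_da / union


# 80 reads support the intron; 20 and 10 reads use alternative acceptors and
# donors respectively; 5 unspliced reads cover each splice site.
value = intron_jaccard(80, 100, 90, 5, 5)
```

Under these assumptions, alternative donor usage, alternative acceptor usage and intron retention all enlarge the denominator, so any of them lowers the same value.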

18.

Evaluation of input data modality choices on functional gene embeddings.

Brechtmann, Felix; Bechtler, Thibault; Londhe, Shubhankar; Mertes, Christian; Gagneur, Julien.

NAR Genom Bioinform ; 5(4): lqad095, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37942285

ABSTRACT

Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein-protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype-gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein-protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.

19.

Towards in silico CLIP-seq: predicting protein-RNA interaction via sequence-to-signal learning.

Horlacher, Marc; Wagner, Nils; Moyon, Lambert; Kuret, Klara; Goedert, Nicolas; Salvatore, Marco; Ule, Jernej; Gagneur, Julien; Winther, Ole; Marsico, Annalisa.

Genome Biol ; 24(1): 180, 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37542318

ABSTRACT

We present RBPNet, a novel deep learning method, which predicts CLIP-seq crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet achieves high generalization on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific and background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences that correspond to known and novel binding motifs and enables variant-impact scoring via in silico mutagenesis. Together, RBPNet improves imputation of protein-RNA interactions, as well as mechanistic interpretation of predictions.

Subject(s)

Base Sequence , Computer Simulation , Deep Learning , RNA-Binding Proteins , RNA , Humans , Alleles , Bias , Binding Sites , Consensus Sequence , Datasets as Topic , Internet , Mutation , Nucleotide Motifs , Nucleotides/metabolism , RNA/chemistry , RNA/genetics , RNA/metabolism , RNA Splice Sites , RNA, Messenger/chemistry , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Viral/chemistry , RNA, Viral/genetics , RNA, Viral/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism

20.

Aberrant splicing prediction across human tissues.

Wagner, Nils; Çelik, Muhammed H; Hölzlwimmer, Florian R; Mertes, Christian; Prokisch, Holger; Yépez, Vicente A; Gagneur, Julien.

Nat Genet ; 55(5): 861-870, 2023 May.

Article in English | MEDLINE | ID: mdl-37142848

ABSTRACT

Aberrant splicing is a major cause of genetic disorders but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve maximum 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision by threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.

Subject(s)

Alternative Splicing , RNA Splicing , Humans , RNA Splicing/genetics , Alternative Splicing/genetics , Sequence Analysis, RNA/methods , Transcriptome , Protein Isoforms
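
The abstract above reports model performance as precision at a fixed recall (for example, 12% precision at 20% recall). A minimal sketch of how such a figure can be computed from ranked predictions follows. The function name and the toy data are made up for illustration, and ties between scores are not treated specially.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the first rank cutoff where recall reaches target_recall.

    scores -- predicted scores, higher means more likely positive
    labels -- 1 for a true positive case, 0 otherwise
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / k  # precision among the top-k predictions
    return true_pos / len(ranked)  # only reached if target_recall > 1


# Toy example: 10 variants, 5 of them truly associated with aberrant splicing.
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
toy_labels = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
p20 = precision_at_recall(toy_scores, toy_labels, 0.2)
p60 = precision_at_recall(toy_scores, toy_labels, 0.6)
```

Fixing the recall level lets models with differently scaled scores be compared at the same operating point.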
