Results 1 - 20 of 141
1.
Nat Genet ; 2024 May 28.
Article in English | MEDLINE | ID: mdl-38806714

ABSTRACT

The functional impact and cellular context of mosaic structural variants (mSVs) in normal tissues are understudied. Utilizing Strand-seq, we sequenced 1,133 single-cell genomes from 19 human donors of increasing age, and discovered the heterogeneous mSV landscapes of hematopoietic stem and progenitor cells. While mSVs are continuously acquired throughout life, expanded subclones in our cohort are confined to individuals >60. Cells already harboring mSVs are more likely to acquire additional somatic structural variants, including megabase-scale segmental aneuploidies. Capitalizing on comprehensive single-cell micrococcal nuclease digestion with sequencing reference data, we conducted high-resolution cell-typing for eight hematopoietic stem and progenitor cells. Clonally expanded mSVs disrupt normal cellular function by dysregulating diverse cellular pathways, and enriching for myeloid progenitors. Our findings underscore the contribution of mSVs to the cellular and molecular phenotypes associated with the aging hematopoietic system, and establish a foundation for deciphering the molecular links between mSVs, aging and disease susceptibility in normal tissues.

2.
bioRxiv ; 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38659906

ABSTRACT

Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.

3.
bioRxiv ; 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37873367

ABSTRACT

Background: The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis. Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results: Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH), in whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and StrandSeq, the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples, revealing a DNA segment of ∼2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions: These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers, which contribute to generating diverse SV haplotypes in susceptible loci.

4.
Nature ; 621(7978): 355-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established boundary [1]. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.


Subject(s)
Chromosomes, Human, Y , Evolution, Molecular , Humans , Male , Chromosomes, Human, Y/genetics , Genome, Human/genetics , Genomics , Mutation Rate , Phenotype , Euchromatin/genetics , Pseudogenes , Genetic Variation/genetics , Chromosomes, Human, X/genetics , Pseudoautosomal Regions/genetics
5.
Genome Med ; 15(1): 47, 2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37420249

ABSTRACT

BACKGROUND: Cancer genome sequencing enables accurate classification of tumours and tumour subtypes. However, prediction performance is still limited using exome-only sequencing and for tumour types with low somatic mutation burden such as many paediatric tumours. Moreover, the ability to leverage deep representation learning in discovery of tumour entities remains unknown. METHODS: We introduce here Mutation-Attention (MuAt), a deep neural network to learn representations of simple and complex somatic alterations for prediction of tumour types and subtypes. In contrast to many previous methods, MuAt utilizes the attention mechanism on individual mutations instead of aggregated mutation counts. RESULTS: We trained MuAt models on 2587 whole cancer genomes (24 tumour types) from the Pan-Cancer Analysis of Whole Genomes (PCAWG) and 7352 cancer exomes (20 types) from the Cancer Genome Atlas (TCGA). MuAt achieved prediction accuracy of 89% for whole genomes and 64% for whole exomes, and a top-5 accuracy of 97% and 90%, respectively. MuAt models were found to be well-calibrated and perform well in three independent whole cancer genome cohorts with 10,361 tumours in total. We show MuAt to be able to learn clinically and biologically relevant tumour entities including acral melanoma, SHH-activated medulloblastoma, SPOP-associated prostate cancer, microsatellite instability, POLE proofreading deficiency, and MUTYH-associated pancreatic endocrine tumours without these tumour subtypes and subgroups being provided as training labels. Finally, scrutiny of MuAt attention matrices revealed both ubiquitous and tumour-type specific patterns of simple and complex somatic mutations. CONCLUSIONS: Integrated representations of somatic alterations learnt by MuAt were able to accurately identify histological tumour types and identify tumour entities, with potential to impact precision cancer medicine.


Subject(s)
Mutation , Neoplasms , Neoplasms/genetics , Neoplasms/pathology , Humans , Deep Learning , Benchmarking
6.
Genome Res ; 33(4): 496-510, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.


Subject(s)
DNA, Satellite , Polymorphism, Genetic , Humans , DNA, Satellite/genetics , Haplotypes , Segmental Duplications, Genomic , Sequence Analysis, DNA
7.
Cell Genom ; 3(4): 100281, 2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37082141

ABSTRACT

Cancer genomes harbor a broad spectrum of structural variants (SVs) driving tumorigenesis, a relevant subset of which escape discovery using short-read sequencing. We employed Oxford Nanopore Technologies (ONT) long-read sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landscape. We assembled complex rearrangements, including a 1.55-Mbp chromothripsis event, and we uncover a complex SV pattern termed templated insertion (TI) thread, characterized by short (mostly <1 kb) insertions showing prevalent self-concatenation into highly amplified structures of up to 50 kbp in size. TI threads occur in 3% of cancers, with a prevalence up to 74% in liposarcoma, and frequent colocalization with chromothripsis. We also perform long-read-based methylome profiling and discover allele-specific methylation (ASM) effects, complex rearrangements exhibiting differential methylation, and differential promoter methylation in cancer-driver genes. Our study shows the advantage of long-read sequencing in the discovery and characterization of complex somatic rearrangements.

8.
Genome Biol ; 24(1): 100, 2023 Apr 30.
Article in English | MEDLINE | ID: mdl-37122002

ABSTRACT

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity, improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.


Subject(s)
Genome, Human , Polymorphism, Genetic , Humans , Genomic Structural Variation , Chromosome Inversion
15.
Nat Biotechnol ; 41(6): 832-844, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36424487

ABSTRACT

Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation, enrichment of a primitive cell state and informed successful targeting of the subclone in cell culture, using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.


Subject(s)
Chromothripsis , Leukemia , Neoplasms , Humans , Neoplasms/genetics , Leukemia/genetics , Gene Rearrangement , Cell Line , Genomic Structural Variation
16.
Haematologica ; 108(2): 543-554, 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-35522148

ABSTRACT

Histone methylation-modifiers, such as EZH2 and KMT2D, are recurrently altered in B-cell lymphomas. To comprehensively describe the landscape of alterations affecting genes encoding histone methylation-modifiers in lymphomagenesis, we investigated whole genome and transcriptome data of 186 mature B-cell lymphomas sequenced in the ICGC MMML-Seq project. Besides confirming common alterations of KMT2D (47% of cases), EZH2 (17%), SETD1B (5%), PRDM9 (4%), KMT2C (4%), and SETD2 (4%), also identified by prior exome or RNA-sequencing studies, we here found recurrent alterations to KDM4C in chromosome 9p24, encoding a histone demethylase. Focal structural variation was the main mechanism of KDM4C alterations, and was independent from 9p24 amplification. We also identified KDM4C alterations in lymphoma cell lines including a focal homozygous deletion in a classical Hodgkin lymphoma cell line. By integrating RNA-sequencing and genome sequencing data we predict that KDM4C structural variants result in loss-of-function. By functional reconstitution studies in cell lines, we provide evidence that KDM4C can act as a tumor suppressor. Thus, we show that identification of structural variants in whole genome sequencing data adds to the comprehensive description of the mutational landscape of lymphomas and, moreover, establish KDM4C as a putative tumor suppressive gene recurrently altered in subsets of B-cell derived lymphomas.


Subject(s)
Lymphoma, B-Cell , Lymphoma , Humans , Histones/metabolism , Histone Demethylases/genetics , Homozygote , Sequence Deletion , Lymphoma/genetics , Lymphoma, B-Cell/genetics , Whole Genome Sequencing , RNA , Jumonji Domain-Containing Histone Demethylases/genetics , Jumonji Domain-Containing Histone Demethylases/chemistry , Jumonji Domain-Containing Histone Demethylases/metabolism , Histone-Lysine N-Methyltransferase/genetics
17.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
18.
Genome Res ; 32(10): 1941-1951, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36180231

ABSTRACT

Gibbons are the most speciose family of living apes, characterized by a diverse chromosome number and rapid rate of large-scale rearrangements. Here we performed single-cell template strand sequencing (Strand-seq), molecular cytogenetics, and deep in silico analysis of a southern white-cheeked gibbon genome, providing the first comprehensive map of 238 previously hidden small-scale inversions. We determined that more than half are gibbon specific, at least fivefold higher than shown for other primate lineage-specific inversions, with a significantly high number of small heterozygous inversions, suggesting that accelerated evolution of inversions may have played a role in the high sympatric diversity of gibbons. Although the precise mechanisms underlying these inversions are not yet understood, it is clear that segmental duplication-mediated NAHR only accounts for a small fraction of events. Several genomic features, including gene density and repeat (e.g., LINE-1) content, might render these regions more break-prone and susceptible to inversion formation. In the attempt to characterize interspecific variation between southern and northern white-cheeked gibbons, we identify several large assembly errors in the current GGSC Nleu3.0/nomLeu3 reference genome comprising more than 49 megabases of DNA. Finally, we provide a list of 182 candidate genes potentially involved in gibbon diversification and speciation.


Subject(s)
Hominidae , Hylobates , Animals , Hylobates/genetics , Genome , Primates/genetics , Chromosome Inversion/genetics , Chromosomes , Hominidae/genetics
19.
Annu Rev Genomics Hum Genet ; 23: 123-152, 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-35655332

ABSTRACT

Somatic rearrangements resulting in genomic structural variation drive malignant phenotypes by altering the expression or function of cancer genes. Pan-cancer studies have revealed that structural variants (SVs) are the predominant class of driver mutation in most cancer types, but because they are difficult to discover, they remain understudied when compared with point mutations. This review provides an overview of the current knowledge of somatic SVs, discussing their primary roles, prevalence in different contexts, and mutational mechanisms. SVs arise throughout the life history of cancer, and 55% of driver mutations uncovered by the Pan-Cancer Analysis of Whole Genomes project represent SVs. Leveraging the convergence of cell biology and genomics, we propose a mechanistic classification of somatic SVs, from simple to highly complex DNA rearrangement classes. The actions of DNA repair and DNA replication processes together with mitotic errors result in a rich spectrum of SV formation processes, with cascading effects mediating extensive structural diversity after an initiating DNA lesion has formed. Thanks to new sequencing technologies, including the sequencing of single-cell genomes, open questions about the molecular triggers and the biomolecules involved in SV formation as well as their mutational rates can now be addressed.


Subject(s)
Genomic Structural Variation , Neoplasms , Genome, Human , Genomics , Humans , Mutation , Neoplasms/epidemiology , Neoplasms/genetics , Neoplasms/pathology , Prevalence
20.
Cell ; 185(11): 1986-2005.e26, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.


Subject(s)
Chromosome Inversion , Segmental Duplications, Genomic , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Genome, Human , Genomics , Humans