
The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Catacchio, Claudia R; Porubsky, David; Mao, Yafei; Yoo, DongAhn; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Ventura, Mario; Alexandrov, Ivan A; Eichler, Evan E.

Nature ; 629(8010): 136-145, 2024 May.

Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

Subject(s)

Centromere , Evolution, Molecular , Genetic Variation , Animals , Humans , Centromere/genetics , Centromere/metabolism , Centromere Protein A/metabolism , DNA Methylation/genetics , DNA, Satellite/genetics , Kinetochores/metabolism , Macaca/genetics , Pan troglodytes/genetics , Polymorphism, Single Nucleotide/genetics , Pongo/genetics , Male , Female , Reference Standards , Chromatin Immunoprecipitation , Haplotypes , Mutation , Gene Amplification , Sequence Alignment , Chromatin/genetics , Chromatin/metabolism , Species Specificity

Phased nanopore assembly with Shasta and modular graph phasing with GFAse.

Lorig-Roach, Ryan; Meredith, Melissa; Monlong, Jean; Jain, Miten; Olsen, Hugh E; McNulty, Brandy; Porubsky, David; Montague, Tessa G; Lucas, Julian K; Condon, Chris; Eizenga, Jordan M; Juul, Sissel; McKenzie, Sean K; Simmonds, Sara E; Park, Jimin; Asri, Mobin; Koren, Sergey; Eichler, Evan E; Axel, Richard; Martin, Bruce; Carnevali, Paolo; Miga, Karen H; Paten, Benedict.

Genome Res ; 34(3): 454-468, 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38627094

ABSTRACT

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of ONT PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.

Subject(s)

Nanopores , Humans , Sequence Analysis, DNA/methods , Nanopore Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Software , Genomics/methods

The complete sequence of a human Y chromosome.

Rhie, Arang; Nurk, Sergey; Cechova, Monika; Hoyt, Savannah J; Taylor, Dylan J; Altemose, Nicolas; Hook, Paul W; Koren, Sergey; Rautiainen, Mikko; Alexandrov, Ivan A; Allen, Jamie; Asri, Mobin; Bzikadze, Andrey V; Chen, Nae-Chyun; Chin, Chen-Shan; Diekhans, Mark; Flicek, Paul; Formenti, Giulio; Fungtammasan, Arkarachai; Garcia Giron, Carlos; Garrison, Erik; Gershman, Ariel; Gerton, Jennifer L; Grady, Patrick G S; Guarracino, Andrea; Haggerty, Leanne; Halabian, Reza; Hansen, Nancy F; Harris, Robert; Hartley, Gabrielle A; Harvey, William T; Haukness, Marina; Heinz, Jakob; Hourlier, Thibaut; Hubley, Robert M; Hunt, Sarah E; Hwang, Stephen; Jain, Miten; Kesharwani, Rupesh K; Lewis, Alexandra P; Li, Heng; Logsdon, Glennis A; Lucas, Julian K; Makalowski, Wojciech; Markovic, Christopher; Martin, Fergal J; Mc Cartney, Ann M; McCoy, Rajiv C; McDaniel, Jennifer; McNulty, Brandy M.

Nature ; 621(7978): 344-354, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Subject(s)

Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Mao, Yafei; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Porubsky, David; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Alexandrov, Ivan A; Eichler, Evan E.

bioRxiv ; 2023 May 30.

Article in English | MEDLINE | ID: mdl-37398417

ABSTRACT

We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two- to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic of each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

A draft human pangenome reference.

Liao, Wen-Wei; Asri, Mobin; Ebler, Jana; Doerr, Daniel; Haukness, Marina; Hickey, Glenn; Lu, Shuangjia; Lucas, Julian K; Monlong, Jean; Abel, Haley J; Buonaiuto, Silvia; Chang, Xian H; Cheng, Haoyu; Chu, Justin; Colonna, Vincenza; Eizenga, Jordan M; Feng, Xiaowen; Fischer, Christian; Fulton, Robert S; Garg, Shilpa; Groza, Cristian; Guarracino, Andrea; Harvey, William T; Heumos, Simon; Howe, Kerstin; Jain, Miten; Lu, Tsung-Yu; Markello, Charles; Martin, Fergal J; Mitchell, Matthew W; Munson, Katherine M; Mwaniki, Moses Njagi; Novak, Adam M; Olsen, Hugh E; Pesout, Trevor; Porubsky, David; Prins, Pjotr; Sibbesen, Jonas A; Sirén, Jouni; Tomlinson, Chad; Villani, Flavia; Vollger, Mitchell R; Antonacci-Fulton, Lucinda L; Baid, Gunjan; Baker, Carl A; Belyaeva, Anastasiya; Billis, Konstantinos; Carroll, Andrew; Chang, Pi-Chuan; Cody, Sarah.

Nature ; 617(7960): 312-324, 2023 May.

Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Subject(s)

Genome, Human , Genomics , Humans , Diploidy , Genome, Human/genetics , Haplotypes/genetics , Sequence Analysis, DNA , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation

The Human Pangenome Project: a global resource to map genomic diversity.

Wang, Ting; Antonacci-Fulton, Lucinda; Howe, Kerstin; Lawson, Heather A; Lucas, Julian K; Phillippy, Adam M; Popejoy, Alice B; Asri, Mobin; Carson, Caryn; Chaisson, Mark J P; Chang, Xian; Cook-Deegan, Robert; Felsenfeld, Adam L; Fulton, Robert S; Garrison, Erik P; Garrison, Nanibaa' A; Graves-Lindsay, Tina A; Ji, Hanlee; Kenny, Eimear E; Koenig, Barbara A; Li, Daofeng; Marschall, Tobias; McMichael, Joshua F; Novak, Adam M; Purushotham, Deepak; Schneider, Valerie A; Schultz, Baergen I; Smith, Michael W; Sofia, Heidi J; Weissman, Tsachy; Flicek, Paul; Li, Heng; Miga, Karen H; Paten, Benedict; Jarvis, Erich D; Hall, Ira M; Eichler, Evan E; Haussler, David.

Nature ; 604(7906): 437-446, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35444317

ABSTRACT

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.

Subject(s)

Genome, Human , Genomics , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA

Complete genomic and epigenetic maps of human centromeres.

Altemose, Nicolas; Logsdon, Glennis A; Bzikadze, Andrey V; Sidhwani, Pragya; Langley, Sasha A; Caldas, Gina V; Hoyt, Savannah J; Uralsky, Lev; Ryabov, Fedor D; Shew, Colin J; Sauria, Michael E G; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A; Dvorkina, Tatiana; Kunyavskaya, Olga; Vollger, Mitchell R; Rhie, Arang; McCartney, Ann M; Asri, Mobin; Lorig-Roach, Ryan; Shafin, Kishwar; Lucas, Julian K; Aganezov, Sergey; Olson, Daniel; de Lima, Leonardo Gomes; Potapova, Tamara; Hartley, Gabrielle A; Haukness, Marina; Kerpedjiev, Peter; Gusev, Fedor; Tigyi, Kristof; Brooks, Shelise; Young, Alice; Nurk, Sergey; Koren, Sergey; Salama, Sofie R; Paten, Benedict; Rogaev, Evgeny I; Streets, Aaron; Karpen, Gary H; Dernburg, Abby F; Sullivan, Beth A; Straight, Aaron F; Wheeler, Travis J; Gerton, Jennifer L; Eichler, Evan E; Phillippy, Adam M; Timp, Winston.

Science ; 376(6588): eabl4178, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Subject(s)

Centromere/genetics , Chromosome Mapping , Epigenesis, Genetic , Genome, Human , Evolution, Molecular , Genomics , Humans , Repetitive Sequences, Nucleic Acid

Transcription initiation by human RNA polymerase II visualized at single-molecule resolution.

Revyakin, Andrey; Zhang, Zhengjian; Coleman, Robert A; Li, Yan; Inouye, Carla; Lucas, Julian K; Park, Sang-Ryul; Chu, Steven; Tjian, Robert.

Genes Dev ; 26(15): 1691-702, 2012 Aug 01.

Article in English | MEDLINE | ID: mdl-22810624

ABSTRACT

Forty years of classical biochemical analysis have identified the molecular players involved in initiation of transcription by eukaryotic RNA polymerase II (Pol II) and largely assigned their functions. However, a dynamic picture of Pol II transcription initiation and an understanding of the mechanisms of its regulation have remained elusive due in part to inherent limitations of conventional ensemble biochemistry. Here we have begun to dissect promoter-specific transcription initiation directed by a reconstituted human Pol II system at single-molecule resolution using fluorescence video-microscopy. We detected several stochastic rounds of human Pol II transcription from individual DNA templates, observed attenuation of transcription by promoter mutations, observed enhancement of transcription by activator Sp1, and correlated the transcription signals with real-time interactions of holo-TFIID molecules at individual DNA templates. This integrated single-molecule methodology should be applicable to studying other complex biological processes.

Subject(s)

Molecular Imaging/methods , RNA Polymerase II/chemistry , Transcription, Genetic , Humans , Microscopy, Fluorescence/methods , Microscopy, Video/methods , Mutation , Promoter Regions, Genetic , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Sp1 Transcription Factor/chemistry , Sp1 Transcription Factor/metabolism , Transcription Factor TFIID/chemistry , Transcription Factor TFIID/metabolism
