Results 1 - 20 of 672
1.
Methods Mol Biol ; 2802: 587-609, 2024.
Article in English | MEDLINE | ID: mdl-38819573

ABSTRACT

Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.


Subject(s)
Genomics , Metagenome , Metagenomics , Metagenomics/methods , Metagenomics/standards , Genomics/methods , Genomics/standards , Metagenome/genetics , Databases, Genetic , Soil Microbiology
3.
Eur J Hum Genet ; 32(5): 521-528, 2024 May.
Article in English | MEDLINE | ID: mdl-38212661

ABSTRACT

Automating reanalysis of genomic data for undiagnosed rare disease patients presents a paradigm shift in how clinical genomics is delivered. We aimed to map the current manual and proposed automated approach to reanalysis and identify possible implementation strategies to address clinical and laboratory staff's perceived challenges to automation. Fourteen semi-structured interviews guided by a simplified process map were conducted with clinical and laboratory staff across Australia. Individual process maps were integrated into an overview of the current process, noting variation in service delivery. Participants then mapped an automated approach and were invited to discuss perceived challenges and possible supports to automation. Responses were analysed using the Consolidated Framework for Implementation Research, linking to the Expert Recommendations for Implementing Change framework to identify theory-informed implementation strategies. Process mapping demonstrates how automation streamlines processes, with eleven steps reduced to seven. Although participants welcomed automation, challenges were raised at six of the steps. Strategies to overcome challenges include embedding project champions, developing education materials, facilitating clinical innovation and quality monitoring tools, and altering reimbursement structures. Future work can build on these findings to develop context-specific implementation strategies to guide translation of an automated approach to reanalysis to improve clinical care and patient outcomes.


Subject(s)
Genomics , Humans , Genomics/methods , Genomics/standards , Qualitative Research , Genetic Testing/standards , Genetic Testing/methods , Australia , Automation
4.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics
6.
JAMA ; 330(3): 205-206, 2023 07 18.
Article in English | MEDLINE | ID: mdl-37379037

ABSTRACT

This Medical News article discusses the Human Pangenome Project.


Subject(s)
Genome, Human , Genomics , Medicine , Humans , Genome, Human/genetics , Genomics/standards , Medicine/trends
7.
Nature ; 617(7960): 312-324, 2023 05.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subject(s)
Genome, Human , Genomics , Humans , Diploidy , Genome, Human/genetics , Haplotypes/genetics , Sequence Analysis, DNA , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
9.
BMC Genomics ; 24(1): 117, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36927511

ABSTRACT

BACKGROUND: Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long read data: longer, but less accurate sequences, or highly accurate, but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data as well as scale of data (i.e., mean length and sequencing depth) influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer, but less accurate, Oxford Nanopore (ONT) R9.4.1 and highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long-reads broadly as we could identify when they were used from GenBank metadata.
RESULTS: HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~20 kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) versus other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size.
CONCLUSIONS: Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.


Subject(s)
Biodiversity , Genomics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Genomics/methods , Genomics/standards , Genomics/trends , Insecta/classification , Insecta/genetics , Fibroins/genetics , Contig Mapping , Genome, Insect/genetics , Animals , Databases, Nucleic Acid , Reproducibility of Results , Meta-Analysis as Topic , Datasets as Topic , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , High-Throughput Nucleotide Sequencing/trends , Plants/genetics , Genome, Plant/genetics
10.
Trends Genet ; 39(3): 175-186, 2023 03.
Article in English | MEDLINE | ID: mdl-36402623

ABSTRACT

Quality control is essential for genome assemblies; however, a consensus has yet to be reached on what metrics should be adopted for the evaluation of assembly quality. N50 is widely used for contiguity measurement, but its effectiveness is constantly in question. Prevailing metrics for the completeness evaluation focus on gene space, yet challenging areas such as tandem repeats are commonly overlooked. Achieving correctness has become an indispensable dimension for quality control, while prevailing assembly releases lack scores reflecting this aspect. We propose a metric set with a set of statistic indexes for effective, comprehensive evaluation of assemblies and provide a score of a finished assembly for each metric, which can be utilized as a benchmark for achieving high-quality genome assemblies.


Subject(s)
Genomics , Sequence Analysis, DNA , Sequence Analysis, DNA/methods , Genomics/standards
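
Note on the N50 contiguity metric named in the abstract above: the sketch below shows one common way to compute it, taking N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the contig lengths are hypothetical and chosen only for illustration; they do not come from the cited article.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is taken here as the length L such that contigs of length >= L
    together account for at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # at least half of the assembly is covered
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(lengths))  # → 70
```

Because N50 depends only on the sorted length distribution, it cannot tell a correctly joined contig from a misjoined one. This is consistent with the abstract's point that contiguity alone does not capture completeness or correctness.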
11.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
13.
Science ; 376(6588): eabj5089, 2022 04.
Article in English | MEDLINE | ID: mdl-35357915

ABSTRACT

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.


Subject(s)
CpG Islands , DNA Methylation , Epigenesis, Genetic , Genome, Human , Centromere/genetics , Centromere/metabolism , Disease/genetics , Genetic Loci , Genomics/standards , Humans , Reference Standards , Sequence Analysis, DNA
14.
Science ; 376(6588): eabl3533, 2022 04.
Article in English | MEDLINE | ID: mdl-35357935

ABSTRACT

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.


Subject(s)
Genetic Variation , Genome, Human , Genomics/standards , Sequence Analysis, DNA/standards , Humans , Reference Standards
15.
Cancer Cell ; 40(2): 109-113, 2022 02 14.
Article in English | MEDLINE | ID: mdl-35120599

ABSTRACT

Cancers other than breast, colorectal, cervical, and lung do not have guideline-recommended screening. New multi-cancer early detection (MCED) tests, using a single blood sample, have been developed based on circulating cell-free DNA (cfDNA) or other analytes. In this commentary, we review the current evidence on these tests, provide several major considerations for new MCED tests, and outline how their evaluation will need to differ from that established for traditional single-cancer screening tests.


Subject(s)
Biomarkers, Tumor , Early Detection of Cancer , Genomics/methods , Neoplasms/diagnosis , Neoplasms/genetics , Clinical Decision-Making , Disease Management , Disease Susceptibility , Early Detection of Cancer/methods , Early Detection of Cancer/standards , Genomics/standards , Humans , Organ Specificity
16.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.


Subject(s)
Base Sequence/genetics , Eukaryota/genetics , Genomics/standards , Animals , Biodiversity , Genomics/methods , Humans , Reference Standards , Reference Values , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
17.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35042806

ABSTRACT

Globally, 15,521 animal species are listed as threatened by the International Union for the Conservation of Nature, and of these less than 3% have genomic resources that can inform conservation management. To combat this, global genome initiatives are developing genomic resources, yet production of a reference genome alone does not conserve a species. The reference genome allows us to develop a suite of tools to understand both genome-wide and functional diversity within and between species. Conservation practitioners can use these tools to inform their decision-making. But, at present there is an implementation gap between the release of genome information and the use of genomic data in applied conservation by conservation practitioners. In May 2020, we launched the Threatened Species Initiative and brought a consortium of genome biologists, population biologists, bioinformaticians, population geneticists, and ecologists together with conservation agencies across Australia, including government, zoos, and nongovernment organizations. Our objective is to create a foundation of genomic data to advance our understanding of key Australian threatened species, and ultimately empower conservation practitioners to access and apply genomic data to their decision-making processes through a web-based portal. Currently, we are developing genomic resources for 61 threatened species from a range of taxa, across Australia, with more than 130 collaborators from government, academia, and conservation organizations. Developed in direct consultation with government threatened-species managers and other conservation practitioners, herein we present our framework for meeting their needs and our systematic approach to integrating genomics into threatened species recovery.


Subject(s)
Conservation of Natural Resources/methods , Endangered Species/legislation & jurisprudence , Genomics/standards , Animals , Data Collection , Endangered Species/trends , Genome , Genomics/legislation & jurisprudence , Genomics/methods , Government
18.
Nucleic Acids Res ; 50(D1): D1468-D1474, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34747486

ABSTRACT

PLAZA is a platform for comparative, evolutionary, and functional plant genomics. It makes a broad set of genomes, data types and analysis tools available to researchers through a user-friendly website, an API, and bulk downloads. In this latest release of the PLAZA platform, we are integrating a record number of 134 high-quality plant genomes, split up over two instances: PLAZA Dicots 5.0 and PLAZA Monocots 5.0. This number of genomes corresponds with a massive expansion in the number of available species when compared to PLAZA 4.0, which offered access to 71 species, an 89% overall increase. The PLAZA 5.0 release contains information for 5 882 730 genes, and offers pre-computed gene families and phylogenetic trees for 5 274 684 protein-coding genes. This latest release also comes with a set of new and updated features: a new BED import functionality for the workbench, improved interactive visualizations for functional enrichments and genome-wide mapping of gene sets, and a fully redesigned and extended API. Taken together, this new version offers extended support for plant biologists working on different families within the green plant lineage and provides an efficient and versatile toolbox for plant genomics. All PLAZA releases are accessible from the portal website: https://bioinformatics.psb.ugent.be/plaza/.


Subject(s)
Biological Evolution , Databases, Genetic , Plants/classification , Software , Genome, Plant/genetics , Genomics/standards , Molecular Sequence Annotation , Multigene Family/genetics , Phylogeny , Plants/genetics
19.
Mol Genet Genomics ; 297(1): 33-46, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34755217

ABSTRACT

Based on molecular markers, genomic prediction enables us to speed up breeding schemes and increase the response to selection. There are several high-throughput genotyping platforms able to deliver thousands of molecular markers for genomic study purposes. However, even though it is widely applied in plant breeding, species without a reference genome cannot fully benefit from genomic tools and modern breeding schemes. We used a method to assemble a population-tailored mock genome to call single-nucleotide polymorphism (SNP) markers without an available reference genome, and for the first time, we compared the results with standard genotyping platforms (array and genotyping-by-sequencing (GBS) using a reference genome) for performance in genomic prediction models. Our results indicate that using a population-tailored mock genome to call SNPs delivers reliable estimates for the genomic relationship between genotypes. Furthermore, genomic prediction estimates were comparable to standard approaches, especially when considering only additive effects. However, mock genomes were slightly worse than arrays at predicting traits influenced by dominance effects, but still performed as well as standard GBS methods that use a reference genome. Nevertheless, the array-based SNP marker methods achieved the best predictive ability and reliability to estimate variance components. Overall, the mock genomes can be a worthy alternative for genomic selection studies, especially for those species where the reference genome is not available.


Subject(s)
Computational Biology , Genotyping Techniques , Models, Genetic , Animals , Chimera/genetics , Computational Biology/methods , Computational Biology/standards , Datasets as Topic , Genome , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Genomics/methods , Genomics/standards , Genotype , Genotyping Techniques/methods , Genotyping Techniques/standards , Phenotype , Reference Standards , Reproducibility of Results , Selection, Genetic , Species Specificity , Zea mays/classification , Zea mays/genetics
20.
Nat Rev Genet ; 23(3): 169-181, 2022 03.
Article in English | MEDLINE | ID: mdl-34837041

ABSTRACT

The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.


Subject(s)
Genomics/methods , Machine Learning , Animals , Genomics/standards , Genomics/trends , Humans , Machine Learning/standards , Models, Statistical , Software
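
Note on the abstract above, which states that the structure of genomics data can bias performance evaluations: one way this can happen is when related examples (say, genes from the same family, or loci on the same chromosome) fall on both sides of a train/test split. That scenario is an assumption made here for illustration, not a detail taken from the abstract. The sketch below keeps every group entirely in either the training or the test set; the function name `group_split` and the group labels are hypothetical.

```python
import random
from collections import defaultdict


def group_split(groups, test_fraction=0.25, seed=0):
    """Split example indices so that no group spans train and test.

    `groups` holds one group label per example (hypothetical labels such
    as gene-family names). Returns (train_indices, test_indices).
    """
    by_group = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)

    # Shuffle the group labels, not the individual examples.
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)

    # Hold out whole groups; always keep at least one group for testing.
    n_test = max(1, round(len(labels) * test_fraction))
    test = sorted(i for label in labels[:n_test] for i in by_group[label])
    train = sorted(i for label in labels[n_test:] for i in by_group[label])
    return train, test


# Hypothetical gene-family labels for six examples.
families = ["famA", "famA", "famB", "famC", "famC", "famD"]
train_idx, test_idx = group_split(families)
print(train_idx, test_idx)
```

Splitting by group rather than by example usually gives a more conservative performance estimate, because the held-out examples are not near-duplicates of the training examples.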