Results 1 - 6 of 6
1.
Nature ; 617(7960): 312-324, 2023 05.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subject(s)
Genome, Human , Genomics , Humans , Diploidy , Genome, Human/genetics , Haplotypes/genetics , Sequence Analysis, DNA , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
2.
Genome Res ; 33(4): 511-524, 2023 04.
Article in English | MEDLINE | ID: mdl-37037626

ABSTRACT

Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population-scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% are accessible from length. Fine-mapping identifies 174 genes to be likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213, that have expression associated with motif variation, showing the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.


Subject(s)
Adenosine Triphosphatases , Minisatellite Repeats , Humans , Minisatellite Repeats/genetics , Phenotype , Haplotypes , Gene Expression , Adenosine Triphosphatases/genetics , Ubiquitin-Protein Ligases/genetics
3.
F1000Res ; 10: 246, 2021.
Article in English | MEDLINE | ID: mdl-34621504

ABSTRACT

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Genome, Viral , Humans , Vertebrates
4.
Nat Commun ; 12(1): 4250, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34253730

ABSTRACT

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and have been associated with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.


Subject(s)
Genetic Variation , Genetics, Population , Genome, Human , Minisatellite Repeats/genetics , Chromosome Mapping , Gene Expression Regulation , Genetic Loci , Humans , Nucleotide Motifs/genetics , Quantitative Trait Loci/genetics
5.
Science ; 372(6537), 2021 04 02.
Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
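
The contiguity figure quoted in this abstract ("minimum contig length needed to cover 50% of the genome") is the statistic usually called N50. The following is a minimal sketch of how such a value can be computed, not the authors' pipeline. The function name `n50`, the optional `genome_size` argument, and the toy lengths are illustrative assumptions. By default the sketch measures coverage against the summed contig length; whether the study used the assembly total or an estimated genome size as the denominator is not stated in the abstract.

```python
from typing import Optional, Sequence


def n50(contig_lengths: Sequence[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Length of the shortest contig in the smallest set of longest
    contigs that together cover at least half of the target size.

    The target is the summed contig length (N50) unless genome_size is
    given, in which case that size is used instead (often called NG50).
    Returns None if the contigs never reach half of the target.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    target = sum(lengths) if genome_size is None else genome_size
    covered = 0
    for length in lengths:
        covered += length
        # Integer comparison avoids a floating-point division by two.
        if 2 * covered >= target:
            return length
    return None


# Hypothetical toy contig lengths, not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))                   # measured against the assembly total
print(n50(toy_contigs, genome_size=400))  # measured against an assumed genome size
```

Averaging this value over a set of haplotype assemblies would give a summary of the kind reported above.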


Subject(s)
Genetic Variation , Genome, Human , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , Sequence Analysis, DNA , Sequence Inversion , Whole Genome Sequencing
6.
IEEE J Biomed Health Inform ; 20(4): 1178-87, 2016 07.
Article in English | MEDLINE | ID: mdl-26087507

ABSTRACT

The objective of prehospital emergency medical services (EMSs) is to have a short response time. By increasing the operational efficiency, the survival rate of patients could potentially be increased. The geographic information system (GIS) is introduced in this study to manage and visualize the spatial distribution of demand data and forecasting results. A flexible model is implemented in GIS, through which training data are prepared with user-desired sizes for the spatial grid and discretized temporal steps. We applied moving average, artificial neural network, sinusoidal regression, and support vector regression for the forecasting of prehospital emergency medical demand. The results from these approaches could serve as a reference for the preallocation of ambulances. A case study is conducted for the EMS in New Taipei City, where prehospital EMS data have been collected for three years. The model selection process chose different models with different input features for the forecast of different areas. The best daily mean absolute percentage error during testing of the EMS demand forecast is 23.01%, which is a reasonable forecast based on Lewis' definition. With the acceptable prediction performance, the proposed approach has the potential to be applied in current practice.
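
Two quantities named in this abstract, the moving-average forecast and the mean absolute percentage error (MAPE), can be illustrated with a short sketch. It is an assumption-laden toy, not the study's model: the function names, the window size, and the demand counts are hypothetical. The `lewis_band` thresholds follow the interpretation scale commonly attributed to Lewis (below 10% highly accurate, 10 to 20% good, 20 to 50% reasonable, above 50% inaccurate); the exact boundaries are not given in the abstract.

```python
from typing import List, Sequence


def rolling_moving_average(series: Sequence[float], window: int) -> List[float]:
    """One-step-ahead forecasts: each value is the mean of the previous `window` points."""
    if window <= 0 or window >= len(series):
        raise ValueError("window must be positive and shorter than the series")
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def lewis_band(mape_percent: float) -> str:
    """Commonly cited interpretation bands (assumed boundaries, see the note above)."""
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent < 20:
        return "good"
    if mape_percent <= 50:
        return "reasonable"
    return "inaccurate"


# Hypothetical daily call counts for one grid cell, not data from the study.
demand = [10, 12, 8, 10, 20]
window = 2
forecasts = rolling_moving_average(demand, window)
error = mape(demand[window:], forecasts)
print(forecasts, round(error, 2), lewis_band(error))
```

Note that MAPE is undefined for periods with zero observed demand, which can occur in sparse grid cells; the sketch raises an error in that case rather than choosing a workaround.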


Subject(s)
Ambulances , Computational Biology/methods , Emergency Medical Services/methods , Neural Networks, Computer , Ambulances/statistics & numerical data , Forecasting , Geographic Information Systems , Humans , Support Vector Machine