Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 15 de 15
Filter
Add more filters










Publication year range
1.
medRxiv ; 2024 May 21.
Article in English | MEDLINE | ID: mdl-38826469

ABSTRACT

Approximately 3% of the human genome consists of repetitive elements called tandem repeats (TRs), which include short tandem repeats (STRs) of 1-6bp motifs and variable number tandem repeats (VNTRs) of 7+bp motifs. TR variants contribute to several dozen mono- and polygenic diseases but remain understudied and "enigmatic," particularly relative to single nucleotide variants. It remains comparatively challenging to interpret the clinical significance of TR variants. Although existing resources provide portions of necessary data for interpretation at disease-associated loci, it is currently difficult or impossible to efficiently invoke the additional details critical to proper interpretation, such as motif pathogenicity, disease penetrance, and age of onset distributions. It is also often unclear how to apply population information to analyses. We present STRchive (S-T-archive, http://strchive.org/ ), a dynamic resource consolidating information on TR disease loci in humans from research literature, up-to-date clinical resources, and large-scale genomic databases, with the goal of streamlining TR variant interpretation at disease-associated loci. We apply STRchive -including pathogenic thresholds, motif classification, and clinical phenotypes-to a gnomAD cohort of ∼18.5k individuals genotyped at 60 disease-associated loci. Through detailed literature curation, we demonstrate that the majority of TR diseases affect children despite being thought of as adult diseases. Additionally, we show that pathogenic genotypes can be found within gnomAD which do not necessarily overlap with known disease prevalence, and leverage STRchive to interpret locus-specific findings therein. We apply a diagnostic blueprint empowered by STRchive to relevant clinical vignettes, highlighting possible pitfalls in TR variant interpretation. As a living resource, STRchive is maintained by experts, takes community contributions, and will evolve as understanding of TR diseases progresses.

2.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496498

ABSTRACT

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

3.
Nat Rev Genet ; 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38366034

ABSTRACT

Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.

4.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168995

ABSTRACT

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.

5.
Genome Biol ; 23(1): 257, 2022 12 14.
Article in English | MEDLINE | ID: mdl-36517892

ABSTRACT

Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling .


Subject(s)
High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Sequence Analysis, DNA
7.
NPJ Genom Med ; 6(1): 60, 2021 Jul 15.
Article in English | MEDLINE | ID: mdl-34267211

ABSTRACT

In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype-quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.

8.
Gigascience ; 8(9)2019 09 01.
Article in English | MEDLINE | ID: mdl-31544213

ABSTRACT

BACKGROUND: Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. FINDINGS: We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. CONCLUSIONS: Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.


Subject(s)
Computational Biology , Software
9.
Genome Biol ; 19(1): 121, 2018 08 21.
Article in English | MEDLINE | ID: mdl-30129428

ABSTRACT

Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Most existing tools for detecting STR variation with short reads do so within the read length and so are unable to detect the majority of pathogenic expansions. Here we present STRetch, a new genome-wide method to scan for STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting STR expansions using short-read whole-genome sequencing data at known pathogenic loci as well as novel STR loci. STRetch is open source software, available from github.com/Oshlack/STRetch .


Subject(s)
DNA Repeat Expansion/genetics , Microsatellite Repeats/genetics , Software , Alleles , Chromosomes, Human/genetics , Genetic Loci , Genome, Human , Humans , Polymerase Chain Reaction
10.
Eur J Hum Genet ; 25(11): 1268-1272, 2017 11.
Article in English | MEDLINE | ID: mdl-28832562

ABSTRACT

Rapid identification of clinically significant variants is key to the successful application of next generation sequencing technologies in clinical practice. The Melbourne Genomics Health Alliance (MGHA) variant prioritization framework employs a gene prioritization index based on clinician-generated a priori gene lists, and a variant prioritization index (VPI) based on rarity, conservation and protein effect. We used data from 80 patients who underwent singleton whole exome sequencing (WES) to test the ability of the framework to rank causative variants highly, and compared it against the performance of other gene and variant prioritization tools. Causative variants were identified in 59 of the patients. Using the MGHA prioritization framework the average rank of the causative variant was 2.24, with 76% ranked as the top priority variant, and 90% ranked within the top five. Using clinician-generated gene lists resulted in ranking causative variants an average of 8.2 positions higher than prioritization based on variant properties alone. This clinically driven prioritization approach significantly outperformed purely computational tools, placing a greater proportion of causative variants top or in the top 5 (permutation P-value=0.001). Clinicians included 40 of the 49 WES diagnoses in their a priori list of differential diagnoses (81%). The lists generated by PhenoTips and Phenomizer contained 14 (29%) and 18 (37%) of these diagnoses respectively. These results highlight the benefits of clinically led variant prioritization in increasing the efficiency of singleton WES data analysis and have important implications for developing models for the funding and delivery of genomic services.


Subject(s)
Abnormalities, Multiple/genetics , Exome , Genetic Testing/methods , Polymorphism, Genetic , Sequence Analysis, DNA/methods , Abnormalities, Multiple/diagnosis , Female , Genetic Loci , Humans , Infant , Male , Mutation
12.
Genome Med ; 7(1): 68, 2015.
Article in English | MEDLINE | ID: mdl-26217397

ABSTRACT

The benefits of implementing high throughput sequencing in the clinic are quickly becoming apparent. However, few freely available bioinformatics pipelines have been built from the ground up with clinical genomics in mind. Here we present Cpipe, a pipeline designed specifically for clinical genetic disease diagnostics. Cpipe was developed by the Melbourne Genomics Health Alliance, an Australian initiative to promote common approaches to genomics across healthcare institutions. As such, Cpipe has been designed to provide fast, effective and reproducible analysis, while also being highly flexible and customisable to meet the individual needs of diverse clinical settings. Cpipe is being shared with the clinical sequencing community as an open source project and is available at http://cpipeline.org.

13.
Genome Med ; 6(11): 90, 2014.
Article in English | MEDLINE | ID: mdl-25422674

ABSTRACT

Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/.

15.
PLoS One ; 7(9): e44974, 2012.
Article in English | MEDLINE | ID: mdl-23024777

ABSTRACT

The mutation R403stop was found in an individual with mut(0) methylmalonic aciduria (MMA) which resulted from a single base change of C→T in exon 6 of the methylmalonyl-CoA mutase gene (producing a TGA stop codon). In order to accurately model the human MMA disorder we introduced this mutation onto the human methylmalonyl-CoA mutase locus of a bacterial artificial chromosome. A mouse model was developed using this construct.The transgene was found to be intact in the mouse model, with 7 copies integrated at a single site in chromosome 3. The phenotype of the hemizygous mouse was unchanged until crossed against a methylmalonyl-CoA mutase knockout mouse. Pups with no endogenous mouse methylmalonyl-CoA mutase and one copy of the transgene became ill and died within 24 hours. This severe phenotype could be partially rescued by the addition of a transgene carrying two copies of the normal human methylmalonyl-CoA mutase locus. The "humanized" mice were smaller than control litter mates and had high levels of methylmalonic acid in their blood and tissues. This new transgenic MMA stop codon model mimics (at both the phenotypic and genotypic levels) the key features of the human MMA disorder. It will allow the trialing of pharmacological and, cell and gene therapies for the treatment of MMA and other human metabolic disorders caused by stop codon mutations.


Subject(s)
Codon, Nonsense , Methylmalonyl-CoA Mutase/genetics , Amino Acid Metabolism, Inborn Errors/genetics , Amino Acid Metabolism, Inborn Errors/metabolism , Animals , Breeding , Disease Models, Animal , Female , Gene Order , Gene Targeting , Homologous Recombination , Humans , Male , Methylmalonic Acid/blood , Mice , Mice, Knockout , Mice, Transgenic , Microinjections , Transgenes
SELECTION OF CITATIONS
SEARCH DETAIL
...