Results 1 - 20 of 754
1.
Int J Mol Sci ; 25(10)2024 May 07.
Article in English | MEDLINE | ID: mdl-38791122

ABSTRACT

High-resolution melting (HRM) is a cost-efficient tool for targeted DNA methylation analysis. HRM yields the average methylation status across all CpGs in PCR products. Moreover, it provides information on the methylation pattern, e.g., the occurrence of monoallelic methylation. HRM assays have to be calibrated by analyzing DNA methylation standards of known methylation status and mixtures thereof. In general, DNA methylation levels determined by the classical calibration approach, including the whole temperature range in between normalization intervals, are in good agreement with the mean of the DNA methylation status of individual CpGs determined by pyrosequencing (PSQ), the gold standard of targeted DNA methylation analysis. However, the classical calibration approach leads to highly inaccurate results for samples with heterogeneous DNA methylation since they result in more complex melt curves, differing in their shape compared to those of DNA standards and mixtures thereof. Here, we present a novel calibration approach, i.e., temperature-wise calibration. By temperature-wise calibration, methylation profiles over temperature are obtained, which help in finding the optimal calibration range and thus increase the accuracy of HRM data, particularly for heterogeneous DNA methylation. For explaining the principle and demonstrating the potential of the novel calibration approach, we selected the promoter and two enhancers of MGMT, a gene encoding the repair protein MGMT.


Subject(s)
DNA Methylation , Nucleic Acid Denaturation , Calibration , Humans , Promoter Regions, Genetic , DNA Modification Methylases/genetics , Tumor Suppressor Proteins/genetics , Temperature , DNA Repair Enzymes/genetics , CpG Islands , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , DNA/genetics
2.
Oncologist ; 29(6): e843-e847, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38597608

ABSTRACT

For cancer clinical trials that require central confirmation of tumor genomic profiling, exhaustion of tissue from standard-of-care testing may prevent enrollment. For Lung-MAP, a master protocol that requires results from a defined centralized clinical trial assay to assign patients to a therapeutic substudy, we developed a process to repurpose existing commercial vendor raw genomic data for eligibility: genomic data reanalysis (GDR). Molecular results for substudy assignment were successfully generated for 369 of the first 374 patients (98.7%) using GDR for Lung-MAP, with a median time from request to result of 9 days. During the same period, 691 of 791 (87.4%) tissue samples received successfully yielded results, in a median of 14 days beyond sample acquisition. GDR is a scalable bioinformatic pipeline that expedites reanalysis of existing data for clinical trials in which validated integral biomarker testing is required for participation.
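
The two success rates quoted above can be checked with a few lines of arithmetic. This is only a sanity-check sketch; the helper name `success_rate` is hypothetical and is not part of the GDR pipeline.

```python
def success_rate(successes: int, total: int) -> float:
    """Return the percentage of successful results, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * successes / total, 1)


# Counts taken from the abstract above.
gdr_rate = success_rate(369, 374)      # genomic data reanalysis
tissue_rate = success_rate(691, 791)   # tissue samples received in the same period

print(gdr_rate, tissue_rate)
```

Both values agree with the percentages reported in the abstract (98.7% and 87.4%).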


Subject(s)
Biomarkers, Tumor , Lung Neoplasms , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Biomarkers, Tumor/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Genomics/methods
3.
Nucleic Acids Res ; 52(1): 114-124, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38015437

ABSTRACT

Next-generation DNA sequencing (NGS) in short-read mode has recently been used for genetic testing in various clinical settings. NGS data accuracy is crucial in clinical settings, and several reports regarding quality control of NGS data, primarily focusing on establishing NGS sequence read accuracy, have been published thus far. Variant calling is another critical source of NGS errors that remains unexplored at the single-nucleotide level despite its established significance. In this study, we used a machine-learning-based method to establish an exome-wide benchmark of difficult-to-sequence regions at the nucleotide-residue resolution using 10 genome sequence features based on real-world NGS data accumulated in The Genome Aggregation Database (gnomAD) of the human reference genome sequence (GRCh38/hg38). The newly acquired metric, designated the 'UNMET score,' along with additional lines of structural information from the human genome, allowed us to assess the sequencing challenges within the exonic region of interest using conventional short-read NGS. Thus, the UNMET score could provide a basis for addressing potential sequential errors in protein-coding exons of the human reference genome sequence GRCh38/hg38 in clinical sequencing.


Subject(s)
Exome , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Humans , DNA , Exome/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
4.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics
5.
BMC Genomics ; 24(1): 117, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36927511

ABSTRACT

BACKGROUND: Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long read data: longer, but less accurate sequences, or highly accurate, but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data as well as scale of data (i.e., mean length and sequencing depth) influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer, but less accurate, Oxford Nanopore (ONT) R9.4.1 and highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long-reads broadly as we could identify when they were used from GenBank metadata. RESULTS: HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~20 kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) versus other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size.
CONCLUSIONS: Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.
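
The contiguity comparison above rests on contig N50. As a minimal sketch of the usual definition (the largest length L such that contigs of length at least L together cover half of the assembly), the function below computes it from a list of contig lengths. The name `contig_n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def contig_n50(lengths):
    """Return N50: the contig length at which the running total of
    contigs, sorted from longest to shortest, first reaches half of
    the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = contig_n50(toy_lengths)
print(toy_n50)
```

In the toy assembly the total length is 32, and the two longest contigs (8 + 8) already cover half of it, so the N50 is 8. For scale, the mean N50 values reported above correspond to ratios of roughly 5.0 for plants (20.5 Mb / 4.1 Mb) and roughly 2.2 for animals (20.9 Mb / 9.3 Mb).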


Subject(s)
Biodiversity , Genomics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Genomics/methods , Genomics/standards , Genomics/trends , Insecta/classification , Insecta/genetics , Fibroins/genetics , Contig Mapping , Genome, Insect/genetics , Animals , Databases, Nucleic Acid , Reproducibility of Results , Meta-Analysis as Topic , Datasets as Topic , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , High-Throughput Nucleotide Sequencing/trends , Plants/genetics , Genome, Plant/genetics
6.
Eur J Med Genet ; 66(12): 104871, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38832911

ABSTRACT

Rare diseases encompass a diverse group of genetic disorders that affect a small proportion of the population. Identifying the underlying genetic causes of these conditions presents significant challenges due to their genetic heterogeneity and complexity. Conventional short-read sequencing (SRS) techniques have been widely used in diagnosing and investigating rare diseases, but they are limited by the short read lengths. In recent years, long-read sequencing (LRS) technologies have emerged as a valuable tool in overcoming these limitations. This minireview provides a concise overview of the applications of LRS in rare disease research and diagnosis, including the identification of disease-causing tandem repeat expansions, structural variations, and comprehensive analysis of pathogenic variants with LRS.


Subject(s)
High-Throughput Nucleotide Sequencing , Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
7.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
8.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values
9.
Science ; 376(6588): eabl3533, 2022 04.
Article in English | MEDLINE | ID: mdl-35357935

ABSTRACT

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.


Subject(s)
Genetic Variation , Genome, Human , Genomics/standards , Sequence Analysis, DNA/standards , Humans , Reference Standards
10.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.


Subject(s)
Base Sequence/genetics , Eukaryota/genetics , Genomics/standards , Animals , Biodiversity , Genomics/methods , Humans , Reference Standards , Reference Values , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
11.
Mol Biol Rep ; 49(1): 385-392, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34716505

ABSTRACT

BACKGROUND: High-throughput sequencing involves library preparation and amplification steps, which may induce contamination across samples or between samples and the environment. METHODS: We tested the effect of applying an inline-index strategy, in which DNA indices of 6 bp were added to both ends of the inserts at the ligation step of library prep, for resolving the data contamination problem. RESULTS: Our results showed that the contamination ranged from 0.29 to 1.25% in one experiment and from 0.83 to 27.01% in the other. We also found that contamination could be environmental or from reagents, in addition to cross-contamination between samples. CONCLUSIONS: The inline-index method is a useful experimental design to clean up the data and address the contamination problem, which has been plaguing high-throughput sequencing data in many applications.
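
To illustrate how an inline index at both ends of an insert can flag contaminating reads, here is a minimal sketch. It assumes a simplified read layout in which the same 6-bp index appears verbatim at the start and at the end of each read; the real library design, read orientation and filtering rules used by the authors are not described in the abstract, and all names below (`classify_read`, `contamination_rate`, the example reads) are hypothetical.

```python
INDEX_LEN = 6  # inline index length stated in the abstract


def classify_read(read: str, expected_index: str) -> str:
    """Label a read by comparing both inline indices with the expected one.

    Assumption: the index is present unchanged at both ends of the read.
    """
    if len(read) < 2 * INDEX_LEN:
        return "too_short"
    left, right = read[:INDEX_LEN], read[-INDEX_LEN:]
    if left == expected_index and right == expected_index:
        return "clean"
    return "contaminant"


def contamination_rate(reads, expected_index: str) -> float:
    """Return the percentage of usable reads whose indices do not match."""
    labels = [classify_read(r, expected_index) for r in reads]
    usable = [label for label in labels if label != "too_short"]
    if not usable:
        raise ValueError("no reads long enough to carry both indices")
    return 100 * usable.count("contaminant") / len(usable)


# Hypothetical reads for a sample tagged with index AACCGG.
example_reads = [
    "AACCGG" + "TTTT" + "AACCGG",  # both indices match
    "AACCGG" + "GGGG" + "AACCGG",  # both indices match
    "AACCGG" + "CCCC" + "TTGGCC",  # right index differs
    "GGTTAA" + "AAAA" + "GGTTAA",  # index of another sample
    "ACGT",                        # too short to evaluate
]
example_rate = contamination_rate(example_reads, "AACCGG")
print(example_rate)
```

Requiring both indices to match is what lets a design of this kind catch reads that carry a foreign index at only one end, which a single-index check would miss.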


Subject(s)
DNA/analysis , Indicators and Reagents/chemistry , Sequence Analysis, DNA/standards , DNA/chemistry , DNA Contamination , Gene Library , High-Throughput Nucleotide Sequencing/standards
12.
BMC Microbiol ; 21(1): 349, 2021 12 18.
Article in English | MEDLINE | ID: mdl-34922460

ABSTRACT

BACKGROUND: One limiting factor of short amplicon 16S rRNA gene sequencing approaches is the use of low DNA amounts in the amplicon generation step. Especially for low-biomass samples, insufficient or even undetectable DNA amounts can limit or prohibit further analysis in standard protocols. RESULTS: Using a newly established protocol, very low DNA input amounts were found sufficient for reliable detection of bacteria using 16S rRNA gene sequencing compared to standard protocols. The improved protocol includes an optimized amplification strategy using digital droplet PCR (ddPCR). We demonstrate how PCR products are generated even from DNA at concentrations too low to be detected with a Qubit. Importantly, the use of different 16S rRNA gene primers had a greater effect on the resulting taxonomical profiles than the use of high or very low initial DNA amounts. CONCLUSION: Our improved protocol takes advantage of ddPCR and allows faithful amplification of very low amounts of template. With this, samples of low bacterial biomass become comparable to those with high amounts of bacteria, since the first and most biasing steps are the same. In addition, it is imperative to state the DNA concentrations and volumes used and to include negative controls indicating possible shifts in taxonomical profiles. Nevertheless, results produced by using different primer pairs cannot be easily compared.


Subject(s)
Biomass , Polymerase Chain Reaction/methods , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA/methods , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Bias , DNA, Bacterial/analysis , DNA, Bacterial/genetics , Feces/microbiology , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Limit of Detection , Microbiota/genetics , Polymerase Chain Reaction/standards , Reproducibility of Results , Sequence Analysis, DNA/standards , Water Microbiology
13.
Eur Rev Med Pharmacol Sci ; 25(1 Suppl): 1-6, 2021 12.
Article in English | MEDLINE | ID: mdl-34890028

ABSTRACT

OBJECTIVE: While the bioinformatic workflow, from quality control to annotation, is quite standardized, the interpretation of variants is still a challenge. The decreasing cost of massively parallel NGS has produced hundreds of variants per patient to analyze and interpret. The ACMG "Standards and guidelines for the interpretation of sequence variants", widely adopted in clinical settings, assume that the clinician has a comprehensive knowledge of the literature and the disease. MATERIALS AND METHODS: To semi-automate the application of the guidelines, we decided to develop an algorithm that exploits VarSome, a widely used platform that interprets variants on the basis of information from more than 70 genome databases. RESULTS: Here we explain how we integrated the VarSome API into our existing clinical diagnostic pipeline for NGS data to obtain validated reproducible results as indicated by accuracy, sensitivity and specificity. CONCLUSIONS: We validated the automated pipeline to be sure that it was doing what we expected. We obtained 100% sensitivity, specificity and accuracy, confirming that it was suitable for use in a diagnostic setting.
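
For readers who want the three validation metrics spelled out, the sketch below computes sensitivity, specificity and accuracy from confusion-matrix counts using their standard definitions. The counts are hypothetical, since the abstract does not report them, and the function name `diagnostic_metrics` is an assumption for illustration only.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity and accuracy from the counts of
    true positives, false positives, true negatives and false negatives."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn), # overall agreement
    }


# Hypothetical validation set with no discordant calls.
perfect = diagnostic_metrics(tp=40, fp=0, tn=60, fn=0)
# Hypothetical validation set with some discordant calls.
imperfect = diagnostic_metrics(tp=45, fp=5, tn=45, fn=5)
print(perfect, imperfect)
```

A result of 100% on all three metrics, as reported above, means the validation set contained no false positives and no false negatives.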


Subject(s)
Algorithms , Genetic Variation/genetics , Genomics/standards , High-Throughput Nucleotide Sequencing/standards , Practice Guidelines as Topic/standards , Search Engine/standards , Computational Biology/methods , Computational Biology/standards , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Search Engine/methods , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
14.
Nat Commun ; 12(1): 6386, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34737275

ABSTRACT

A major drawback of single-cell ATAC-seq (scATAC-seq) is its sparsity, i.e., open chromatin regions with no reads due to loss of DNA material during the scATAC-seq protocol. Here, we propose scOpen, a computational method based on regularized non-negative matrix factorization for imputing and quantifying the open chromatin status of regulatory regions from sparse scATAC-seq experiments. We show that scOpen improves crucial downstream analysis steps of scATAC-seq data such as clustering, visualization, cis-regulatory DNA interactions, and delineation of regulatory features. We demonstrate the power of scOpen to dissect regulatory changes in the development of fibrosis in the kidney. This identifies a role of Runx1 and target genes by promoting fibroblast to myofibroblast differentiation driving kidney fibrosis.


Subject(s)
Chromatin/metabolism , DNA/metabolism , Sequence Analysis, DNA/standards , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods
15.
Genes (Basel) ; 12(11)2021 11 18.
Article in English | MEDLINE | ID: mdl-34828415

ABSTRACT

Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inferences. Large-scale genomic analyses are typically associated with automated progressive MSA without subsequent manual adjustment, which itself is often error-prone because of the lack of a consistent and explicit criterion. Here, I outlined several commonly encountered alignment errors that cannot be avoided by progressive MSA for nucleotide, amino acid, and codon sequences. Methods that could be automated to fix such alignment errors were then presented. I emphasized the utility of position weight matrix as a new tool for MSA refinement and illustrated its usage by refining the MSA of nucleotide and amino acid sequences. The main advantages of the position weight matrix approach include (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its speedy computation, making it suitable for a large number of long viral genomic sequences.
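
As a rough illustration of the position weight matrix idea mentioned above, the sketch below builds a log-odds matrix from the columns of an alignment and scores a candidate segment against it. It uses a common textbook formulation (pseudocounts, uniform background, log2 odds) and is not the author's refinement procedure; the function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def position_weight_matrix(aligned, alphabet="ACGT", pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length aligned
    sequences. Characters outside the alphabet (e.g. gaps) are ignored.
    A uniform background frequency is assumed."""
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    pwm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in aligned if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pwm.append({
            ch: math.log2(((counts[ch] + pseudocount) / total) / background)
            for ch in alphabet
        })
    return pwm


def pwm_score(pwm, segment):
    """Sum the column weights for a segment; unknown characters score 0."""
    return sum(column.get(ch, 0.0) for column, ch in zip(pwm, segment))


# Toy alignment (hypothetical), one sequence with a gap in column 3.
toy_alignment = ["ACGT", "ACGA", "ACGT", "AC-T"]
toy_pwm = position_weight_matrix(toy_alignment)
good = pwm_score(toy_pwm, "ACGT")
poor = pwm_score(toy_pwm, "TGCA")
print(good, poor)
```

Because every column weight is estimated from all sequences at once, a segment can be scored against the whole alignment in a single pass, which is the property the abstract contrasts with pairwise-score approaches.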


Subject(s)
Automation, Laboratory/methods , Genomics/methods , Sequence Alignment/methods , Algorithms , Animals , Automation, Laboratory/standards , Genomics/standards , Humans , Phylogeny , Sensitivity and Specificity , Sequence Alignment/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Sequence Analysis, Protein/methods , Sequence Analysis, Protein/standards
16.
Nat Biotechnol ; 39(9): 1141-1150, 2021 09.
Article in English | MEDLINE | ID: mdl-34504346

ABSTRACT

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.


Subject(s)
Benchmarking , Exome Sequencing/standards , Neoplasms/genetics , Sequence Analysis, DNA/standards , Whole Genome Sequencing/standards , Cell Line , Cell Line, Tumor , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/pathology , Reproducibility of Results
17.
Nat Biotechnol ; 39(9): 1129-1140, 2021 09.
Article in English | MEDLINE | ID: mdl-34504351

ABSTRACT

Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Base Pair Mismatch , Benchmarking , DNA/genetics , DNA, Bacterial/genetics , Genome, Bacterial , Genome, Human , Humans
18.
Sci Rep ; 11(1): 18065, 2021 09 10.
Article in English | MEDLINE | ID: mdl-34508117

ABSTRACT

Advances in sequencing technology have allowed researchers to sequence DNA with greater ease and at decreasing costs. Main developments have focused on either sequencing many short sequences or fewer large sequences. Methods for sequencing mid-sized sequences of 600-5,000 bp are currently less efficient. For example, the PacBio Sequel I system yields ~100,000-300,000 reads with an accuracy per base pair of 90-99%. We sought to sequence several DNA populations of ~870 bp in length with a sequencing accuracy of 99% and to the greatest depth possible. We optimised a simple, robust method to concatenate genes of ~870 bp five times and then sequenced the resulting DNA of ~5,000 bp by PacBio SMRT long-read sequencing. Our method improved upon previously published concatenation attempts, leading to a greater sequencing depth, high-quality reads and limited sample preparation at little expense. We applied this efficient concatenation protocol to sequence nine DNA populations from a protein engineering study. The improved method is accompanied by a simple and user-friendly analysis pipeline, DeCatCounter, to sequence medium-length sequences efficiently at one-fifth of the cost.
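
Per-base accuracies such as the 90-99% quoted above are often expressed as Phred quality scores, defined as Q = -10 log10(error probability). The sketch below converts between the two; it is a general-purpose illustration rather than part of DeCatCounter, and the function names are hypothetical.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred score."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10 * math.log10(1 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred score back to a per-base accuracy."""
    return 1 - 10 ** (-q / 10)


low_q = accuracy_to_phred(0.90)   # lower bound quoted in the abstract
high_q = accuracy_to_phred(0.99)  # upper bound and target accuracy
print(round(low_q, 2), round(high_q, 2))
```

On this scale the quoted range of 90-99% accuracy corresponds to roughly Q10 to Q20.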


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Animals , Base Sequence , Computational Biology/standards , Gene Library , High-Throughput Nucleotide Sequencing/methods , Mice , Molecular Sequence Annotation , Sequence Analysis, DNA/standards , Sequence Analysis, Protein
19.
Hum Immunol ; 82(11): 820-828, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34479742

ABSTRACT

Next generation sequencing (NGS) is being applied for HLA typing in research and clinical settings. NGS HLA typing has made it feasible to sequence exons, introns and untranslated regions simultaneously, with significantly reduced labor and reagent cost per sample, rapid turnaround time, and improved HLA genotype accuracy. NGS technologies bring challenges for cost-effective computation, data processing and exchange of NGS-based HLA data. To address these challenges, guidelines and specifications such as Genotype List (GL) String, Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING), and Histoimmunogenetics Markup Language (HML) were proposed to streamline and standardize reporting of HLA genotypes. As part of the 17th International HLA and Immunogenetics Workshop (IHIW), we implemented standards and systems for HLA genotype reporting that included GL String, MIRING and HML, and found that misunderstanding or misinterpretations of these standards led to inconsistencies in the reporting of NGS HLA genotyping results. This may be due in part to a historical lack of centralized data reporting standards in the histocompatibility and immunogenetics community. We have worked with software and database developers, clinicians and scientists to address these issues in a collaborative fashion as part of the Data Standard Hackathons (DaSH) for NGS. Here we report several categories of challenges to the consistent exchange of NGS HLA genotyping data we have observed. We hope to address these challenges in future DaSH for NGS efforts.
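
To make the GL String format more concrete, here is a minimal parsing sketch. It assumes the delimiter hierarchy usually given for GL String: `^` separates loci, `|` separates alternative genotypes, `+` separates the gene copies within a genotype, `~` joins alleles in phase, and `/` separates ambiguous alleles. The function name and the example string are hypothetical, and the sketch performs no validation of allele names.

```python
def parse_gl_string(gl: str):
    """Split a GL String into nested lists following the assumed
    delimiter precedence: ^  then |  then +  then ~  then /.

    Result nesting: loci -> genotypes -> gene copies -> phased positions
    -> ambiguous alleles.
    """
    return [
        [
            [
                [position.split("/") for position in copy.split("~")]
                for copy in genotype.split("+")
            ]
            for genotype in locus.split("|")
        ]
        for locus in gl.split("^")
    ]


# Hypothetical two-locus typing with one allele ambiguity at HLA-A.
example = "HLA-A*02:01/HLA-A*02:02+HLA-A*03:01^HLA-B*07:02+HLA-B*08:01"
parsed = parse_gl_string(example)
print(parsed)
```

Splitting in strict precedence order matters: applying the delimiters in a different order groups the alleles differently and changes the meaning of the string.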


Subject(s)
Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/standards , Histocompatibility Testing/methods , Immunogenetics/standards , Laboratories/standards , Genotyping Techniques/standards , HLA Antigens/genetics , Histocompatibility Testing/standards , Humans , Immunogenetics/methods , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Software
20.
Eur J Hum Genet ; 29(12): 1804-1810, 2021 12.
Article in English | MEDLINE | ID: mdl-34426661

ABSTRACT

The clinical utility of rapid genomic sequencing (rGS) for critically unwell infants and children has been well demonstrated. Parental capacity for informed consent has been questioned, yet limited empirical data exist to guide clinical service delivery. In an Australian nationwide clinical implementation project offering rGS for critically unwell infants and children, parents made a decision about testing in under a day on average. This study reports parents' experiences of decision making for rGS within this rapid timeframe to inform pre-test counselling procedures for future practice. A nationwide sample of 30 parents, whose children were amongst the first to receive rGS, were interviewed. We found that framing and delivery of rGS require careful consideration to support autonomous decision making and avoid implicit coercion in a stressful intensive care setting. Many parents described feeling 'special' and 'lucky' that they were receiving access to expensive and typically time-consuming genomic sequencing. Thematic analysis revealed a spectrum of complexity for decision making about rGS. Some parents consented quickly and were resistant to pre-test counselling. Others had a range of concerns and described deliberating about their decision, which they felt rushed to make. This research identifies tensions between the medical imperative of rGS and parents' decision making, which need to be addressed as rGS becomes routine clinical care.


Subject(s)
Attitude , Genetic Counseling/psychology , Genetic Testing/standards , Parents/psychology , Sequence Analysis, DNA/standards , Adult , Child , Critical Care/psychology , Critical Care/standards , Female , Genetic Counseling/standards , Humans , Male , Patient Participation , Surveys and Questionnaires