Search | VHL Regional Portal

Targeted Single Primer Enrichment Sequencing with Single End Duplex-UMI.

Peng, Quan; Xu, Chang; Kim, Daniel; Lewis, Marcus; DiCarlo, John; Wang, Yexun.

Sci Rep ; 9(1): 4810, 2019 03 18.

Article in English | MEDLINE | ID: mdl-30886209

ABSTRACT

For specific detection of somatic variants at very low levels, artifacts from the NGS workflow have to be eliminated. Various approaches using unique molecular identifiers (UMI) to analytically remove NGS artifacts have been described. Among them, Duplex-seq was shown to be highly effective, by leveraging the sequence complementarity of two DNA strands. However, all of the published Duplex-seq implementations so far required paired-end sequencing and, when duplex sequencing was combined with target enrichment, lengthy hybridization enrichment as well. We developed a simple protocol, which enabled the retrieval of duplex UMI in multiplex PCR-based enrichment and sequencing. Using this protocol and reference materials, we demonstrated the accurate detection of known SNVs at 0.1-0.2% allele fractions, aided by duplex UMI. We also observed that low level base substitution artifacts could be introduced when preparing in vitro DNA reference materials, which could limit their utility as a benchmarking tool for variant detection at very low levels. Our new targeted sequencing method offers the benefit of using duplex UMI to remove NGS artifacts in a much more simplified workflow than existing targeted duplex sequencing methods.
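The idea of using strand complementarity to reject artifacts can be illustrated with a minimal sketch. It is not the authors' pipeline: the read representation `(umi, strand, base)`, the function names, and the 0.8 agreement threshold are all assumptions made for illustration. Reads are grouped by UMI, a consensus base is taken per strand, and a molecule is only called when both strands agree.

```python
from collections import Counter, defaultdict


def strand_consensus(bases, min_fraction=0.8):
    """Return the majority base for one strand of one molecule, or None.

    min_fraction is a hypothetical agreement threshold, not a published value.
    """
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None


def duplex_consensus(reads, min_fraction=0.8):
    """Call one base per UMI, requiring agreement between both strands.

    reads: iterable of (umi, strand, base) with strand '+' or '-'.
    Bases are assumed to be already expressed relative to the reference strand.
    """
    by_molecule = defaultdict(lambda: {"+": [], "-": []})
    for umi, strand, base in reads:
        by_molecule[umi][strand].append(base)

    calls = {}
    for umi, strands in by_molecule.items():
        plus = strand_consensus(strands["+"], min_fraction)
        minus = strand_consensus(strands["-"], min_fraction)
        # A base seen on only one strand, or disagreeing strands, is rejected.
        calls[umi] = plus if plus is not None and plus == minus else None
    return calls


reads = [
    ("AAC", "+", "T"), ("AAC", "+", "T"), ("AAC", "-", "T"),  # both strands agree
    ("GGT", "+", "T"), ("GGT", "+", "T"), ("GGT", "-", "C"),  # strands disagree
]
calls = duplex_consensus(reads)
```

In this toy input the molecule tagged `AAC` is supported on both strands, while `GGT` carries a change on one strand only, which is the signature of an error introduced before or during amplification.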

Subject(s)

DNA Mutational Analysis/methods , DNA/isolation & purification , High-Throughput Nucleotide Sequencing/methods , Multiplex Polymerase Chain Reaction/methods , Artifacts , DNA/genetics , DNA Mutational Analysis/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Limit of Detection , Multiplex Polymerase Chain Reaction/instrumentation , Mutation , Neoplasms/diagnosis , Neoplasms/genetics , Polymorphism, Single Nucleotide , Workflow

smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers.

Xu, Chang; Gu, Xiujing; Padmanabhan, Raghavendra; Wu, Zhong; Peng, Quan; DiCarlo, John; Wang, Yexun.

Bioinformatics ; 35(8): 1299-1309, 2019 04 15.

Article in English | MEDLINE | ID: mdl-30192920

ABSTRACT

MOTIVATION: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits to UMI-based variant calling. RESULTS: We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared to smCounter, smCounter2 features a lower detection limit (0.5%, down from 1%), better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2's superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data. AVAILABILITY AND IMPLEMENTATION: The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
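The general shape of a test for "allele frequency significantly above the background error rate" can be sketched as follows. This is a simplified stand-in, not smCounter2's model: a plain one-sided binomial tail is assumed here, and the background rate `1e-4`, the significance level `1e-6`, and the function names are hypothetical values chosen only for illustration.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed over the upper tail."""
    if k <= 0:
        return 1.0
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


def is_variant(alt_umis, total_umis, background_rate=1e-4, alpha=1e-6):
    """Call a variant if the alt UMI count is unlikely under background error alone.

    background_rate and alpha are illustrative assumptions, not published settings.
    """
    p_value = binomial_tail(alt_umis, total_umis, background_rate)
    return p_value < alpha


strong = is_variant(alt_umis=25, total_umis=5000)  # 0.5% of UMIs carry the allele
weak = is_variant(alt_umis=1, total_umis=5000)     # a single supporting UMI
```

Counting UMIs rather than raw reads is what makes a fixed threshold plausible across sequencing depths: the count reflects distinct input molecules, so extra reads per molecule do not inflate the evidence.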

Subject(s)

Software , Gene Frequency , High-Throughput Nucleotide Sequencing , Mutation , Polymerase Chain Reaction , Sequence Analysis, DNA

Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun.

BMC Genomics ; 18(1): 5, 2017 01 03.

Article in English | MEDLINE | ID: mdl-28049435

ABSTRACT

BACKGROUND: Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. RESULTS: We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. CONCLUSIONS: We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.

Subject(s)

Alleles , Base Sequence , DNA Barcoding, Taxonomic , Gene Frequency , Genetic Variation , Computational Biology/methods , Models, Statistical , Multiplex Polymerase Chain Reaction , Reproducibility of Results , Sensitivity and Specificity

Reducing amplification artifacts in high multiplex amplicon sequencing by using molecular barcodes.

Peng, Quan; Vijaya Satya, Ravi; Lewis, Marcus; Randad, Pranay; Wang, Yexun.

BMC Genomics ; 16: 589, 2015 Aug 07.

Article in English | MEDLINE | ID: mdl-26248467

ABSTRACT

BACKGROUND: PCR amplicon sequencing has been widely used as a targeted approach for both DNA and RNA sequence analysis. High multiplex PCR has further enabled the enrichment of hundreds of amplicons in one simple reaction. At the same time, the performance of PCR amplicon sequencing can be negatively affected by issues such as high duplicate reads, polymerase artifacts and PCR amplification bias. Recently researchers have made some good progress in addressing these shortcomings by incorporating molecular barcodes into PCR primer design. So far, most work has been demonstrated using one to a few pairs of primers, which limits the size of the region one can analyze. RESULTS: We developed a simple protocol, which enables the use of molecular barcodes in high multiplex PCR with hundreds of amplicons. Using this protocol and reference materials, we demonstrated the applications in accurate variant calling at very low fraction over a large region and in targeted RNA quantification. We also evaluated the protocol's utility in profiling FFPE samples. CONCLUSIONS: We demonstrated the successful implementation of molecular barcodes in high multiplex PCR, with multiplex scale many times higher than earlier work. We showed that the new protocol combines the benefits of both high multiplex PCR and molecular barcodes, i.e. the analysis of a very large region, low DNA input requirement, very good reproducibility and the ability to detect as low as 1% mutations with minimal false positives (FP).
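How molecular barcodes address duplicate reads and amplification bias can be shown with a short sketch. It assumes each read has already been reduced to a `(target, umi)` pair; that representation and the function name are hypothetical and do not come from the paper. Counting distinct barcodes per target estimates the number of original molecules, whereas raw read counts also reflect how often each molecule was copied.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return {target: (read_count, unique_umi_count)}.

    reads: iterable of (target, umi) pairs, an assumed pre-processed form.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for target, umi in reads:
        read_counts[target] += 1
        umis[target].add(umi)
    return {t: (read_counts[t], len(umis[t])) for t in read_counts}


reads = [
    ("GAPDH", "AAA"), ("GAPDH", "AAA"), ("GAPDH", "CCG"),  # 3 reads, 2 molecules
    ("TP53", "TTT"),                                        # 1 read, 1 molecule
]
counts = count_molecules(reads)
```

A real implementation would also need to tolerate sequencing errors inside the barcode itself, for example by merging UMIs that differ by a single base; that step is omitted here.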

Subject(s)

DNA Barcoding, Taxonomic/methods , High-Throughput Nucleotide Sequencing/methods , Multiplex Polymerase Chain Reaction/methods , Artifacts , DNA Primers/genetics , Humans , RNA/genetics , Reproducibility of Results , Sequence Analysis/methods

Comparison of somatic mutation calling methods in amplicon and whole exome sequence data.

Xu, Huilei; DiCarlo, John; Satya, Ravi Vijaya; Peng, Quan; Wang, Yexun.

BMC Genomics ; 15: 244, 2014 Mar 28.

Article in English | MEDLINE | ID: mdl-24678773

ABSTRACT

BACKGROUND: High-throughput sequencing is rapidly becoming common practice in clinical diagnosis and cancer research. Many algorithms have been developed for somatic single nucleotide variant (SNV) detection in matched tumor-normal DNA sequencing. Although numerous studies have compared the performance of various algorithms on exome data, there has not yet been a systematic evaluation using PCR-enriched amplicon data with a range of variant allele fractions. The recently developed gold standard variant set for the reference individual NA12878 by the NIST-led "Genome in a Bottle" Consortium (NIST-GIAB) provides a good resource to evaluate admixtures with various SNV fractions. RESULTS: Using the NIST-GIAB gold standard, we compared the performance of five popular somatic SNV calling algorithms (GATK UnifiedGenotyper followed by simple subtraction, MuTect, Strelka, SomaticSniper and VarScan2) for matched tumor-normal amplicon and exome sequencing data. CONCLUSIONS: We demonstrated that the five commonly used somatic SNV calling methods are applicable to both targeted amplicon and exome sequencing data. However, the sensitivities of these methods vary based on the allelic fraction of the mutation in the tumor sample. Our analysis can assist researchers in choosing a somatic SNV calling method suitable for their specific needs.
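The observation that sensitivity varies with allele fraction implies an evaluation stratified by fraction. A minimal sketch of such a comparison against a truth set follows; the variant tuples, the input layout, and the function name are assumptions for illustration and are not taken from the study.

```python
from collections import defaultdict


def sensitivity_by_fraction(truth, calls):
    """Fraction of true variants recovered, grouped by allele fraction.

    truth: {variant: allele_fraction} from a gold-standard set.
    calls: set of variants reported by a caller (false positives are ignored here).
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for variant, fraction in truth.items():
        total[fraction] += 1
        if variant in calls:
            found[fraction] += 1
    return {f: found[f] / total[f] for f in sorted(total)}


truth = {
    ("chr1", 100, "A", "T"): 0.05,
    ("chr1", 200, "G", "C"): 0.05,
    ("chr2", 50, "C", "T"): 0.2,
}
calls = {("chr1", 100, "A", "T"), ("chr2", 50, "C", "T"), ("chr3", 1, "A", "G")}
result = sensitivity_by_fraction(truth, calls)
```

Sensitivity alone does not penalize the extra call on `chr3`; a fuller comparison would report false positives or precision alongside it.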

Subject(s)

Computational Biology/methods , Exome , High-Throughput Nucleotide Sequencing , Mutation , Software , Algorithms , Databases, Nucleic Acid , Genomics/methods , Humans , Point Mutation , ROC Curve , Sensitivity and Specificity

Genomic analysis of microRNA time-course expression in liver of mice treated with genotoxic carcinogen N-ethyl-N-nitrosourea.

Li, Zhiguang; Branham, William S; Dial, Stacey L; Wang, Yexun; Guo, Lei; Shi, Leming; Chen, Tao.

BMC Genomics ; 11: 609, 2010 Oct 28.

Article in English | MEDLINE | ID: mdl-21029445

ABSTRACT

BACKGROUND: Dysregulated expression of microRNAs (miRNAs) has been previously observed in human cancer tissues and shown promise in defining tumor status. However, there is little information as to if or when expression changes of miRNAs occur in normal tissues after carcinogen exposure. RESULTS: To explore the possible time-course changes of miRNA expression induced by a carcinogen, we treated mice with one dose of 120 mg/kg N-ethyl-N-nitrosourea (ENU), a model genotoxic carcinogen, and vehicle control. The miRNA expression profiles were assessed in the mouse livers in a time-course design. miRNAs were isolated from the livers at days 1, 3, 7, 15, 30 and 120 after the treatment and their expression was determined using a miRNA PCR Array. Principal component analysis of the miRNA expression profiles showed that miRNA expression at post-treatment days (PTDs) 7 and 15 was different from that at the other time points and in the control. The number of differentially expressed miRNAs (DEMs) changed over time (3, 5, 14, 32, 5 and 5 at PTDs 1, 3, 7, 15, 30 and 120, respectively). The magnitude of the expression change varied with time, with the highest changes at PTDs 7 or 15 for most of the DEMs. In silico functional analysis of the DEMs at PTDs 7 and 15 indicated that the major functions of these ENU-induced DEMs were associated with DNA damage, DNA repair, apoptosis and other processes related to carcinogenesis. CONCLUSION: Our results showed that many miRNAs changed their expression in response to exposure to the genotoxic carcinogen ENU and that the number and magnitude of the changes were highest at PTDs 7 to 15. Thus, one to two weeks after the exposure is the best time for miRNA expression sampling.

Subject(s)

Carcinogens/toxicity , Ethylnitrosourea/toxicity , Gene Expression Regulation/drug effects , Genomics/methods , Liver/metabolism , MicroRNAs/genetics , Mutagens/toxicity , Animals , Cluster Analysis , Female , Gene Expression Profiling , Genome/genetics , Liver/drug effects , Mice , MicroRNAs/metabolism , Polymerase Chain Reaction , Principal Component Analysis , Reproducibility of Results , Taq Polymerase/metabolism , Time Factors
