Results 1 - 5 of 5
1.
Nat Commun ; 14(1): 8149, 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38071244

ABSTRACT

Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R² = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.


Subject(s)
Benchmarking; Genome, Human; Humans; Genome, Human/genetics; High-Throughput Nucleotide Sequencing; Algorithms; Whole Genome Sequencing; Polymorphism, Single Nucleotide; Software
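
The consistency figure quoted in the abstract above is an R² value. The record does not say how it was computed, so the following is only a minimal sketch under one assumption: that R² is the squared Pearson correlation between the same pipelines' scores measured under two variant representations. The function name `r_squared` and the sample numbers are hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between paired measurements x and y.

    Assumption: x[i] and y[i] are the same pipeline's score (for example F1)
    evaluated under two different variant representations.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squared deviations and of cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one input is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical scores: perfectly consistent pairs give 1.0, shuffled ones less.
consistent = r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
inconsistent = r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

Under this reading, a value close to 1 means that changing the variant representation barely changes the measured performance.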
2.
J Biotechnol Biomed ; 6(1): 13-23, 2023.
Article in English | MEDLINE | ID: mdl-36937168

ABSTRACT

Long read sequencing technology is becoming increasingly popular for Precision Medicine applications like Whole Genome Sequencing (WGS) and microbial abundance estimation. Minimap2 is the state-of-the-art aligner and mapper used by the leading long read sequencing technologies today. However, Minimap2 on CPUs is very slow for long noisy reads: about 60-70% of the run-time on a CPU comes from the highly sequential chaining step in Minimap2. On the other hand, most Point-of-Care computational workflows in long read sequencing use Graphics Processing Units (GPUs). We present minimap2-accelerated (mm2-ax), a heterogeneous design for sequence mapping and alignment in which Minimap2's compute-intensive chaining step is sped up on the GPU, and demonstrate its time and cost benefits. We extract better intra-read parallelism from chaining without losing mapping accuracy by forward transforming Minimap2's chaining algorithm. Moreover, we better utilize the high memory available on modern cloud instances, in addition to achieving better workload balancing, data locality and minimal branch divergence on the GPU. We show that mm2-ax on an NVIDIA A100 GPU improves the chaining step with 5.41 - 2.57X speedup and 4.07 - 1.93X speedup:costup over the fastest version of Minimap2, mm2-fast, benchmarked on a Google Cloud Platform instance of 30 SIMD cores.
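
To illustrate why the chaining step described above is highly sequential, here is a minimal sketch of anchor chaining by dynamic programming. It is a simplified illustration, not Minimap2's or mm2-ax's actual implementation: the linear gap penalty, the parameter names (`seed_len`, `gap_weight`, `max_lookback`) and their defaults are assumptions made for this example.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (reference position, query position) of a seed match


def chain_scores(anchors: List[Anchor], seed_len: int = 15,
                 gap_weight: float = 0.5, max_lookback: int = 50) -> List[float]:
    """Best chain score ending at each anchor.

    Anchors are assumed to be sorted by reference position. The score of
    anchor i depends on the scores of earlier anchors, which is the data
    dependency that makes a naive implementation sequential.
    """
    scores: List[float] = []
    for i, (xi, yi) in enumerate(anchors):
        best = float(seed_len)  # a chain that consists of anchor i alone
        for j in range(max(0, i - max_lookback), i):
            xj, yj = anchors[j]
            dx, dy = xi - xj, yi - yj
            if dx <= 0 or dy <= 0:
                continue  # anchor j must precede anchor i on both sequences
            gain = min(dx, dy, seed_len)         # bases newly covered by anchor i
            penalty = gap_weight * abs(dx - dy)  # simplified linear gap cost
            best = max(best, scores[j] + gain - penalty)
        scores.append(best)
    return scores


# Three collinear anchors extend one chain; the distant fourth starts a new one.
example = chain_scores([(0, 0), (10, 10), (20, 20), (100, 25)])
```

In this sketch the inner loop looks backward over predecessors of each anchor, so anchor i cannot be scored before anchors 0 to i-1.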

3.
Arch Clin Biomed Res ; 7(1): 45-57, 2023.
Article in English | MEDLINE | ID: mdl-36938368

ABSTRACT

Read Until enables Oxford Nanopore Technologies' (ONT) sequencers to selectively sequence reads of target species in real time. This enables efficient microbial enrichment for applications such as microbial abundance estimation and is particularly beneficial for metagenomic samples with a very high fraction of non-target reads (> 99% can be human reads). However, Read Until requires a fast and accurate software filter that analyzes a short prefix of a read and determines whether it belongs to a microbe of interest (target) or not. The baseline Read Until pipeline uses a deep neural network-based basecaller called Guppy and is slow and inaccurate for this task (~60% of bases sequenced are unclassified). We present RawMap, an efficient CPU-only microbial species-agnostic Read Until classifier for filtering non-target human reads in the squiggle space. RawMap uses a Support Vector Machine (SVM), which is trained to distinguish human from microbe using non-linear and non-stationary characteristics of ONT's squiggle output (continuous electrical signals). Compared to the baseline Read Until pipeline, RawMap is a 1327X faster classifier and significantly reduces sequencing time, sequencing cost, and compute time. We show that RawMap-augmented pipelines reduce sequencing time and cost by ~24% and computing cost by 22%. Additionally, since RawMap is agnostic to microbial species, it can also classify microbial species it is not trained on. We also discuss how RawMap may be used as an alternative to the RT-PCR test for viral load quantification of SARS-CoV-2.

4.
BMC Bioinformatics ; 24(1): 98, 2023 Mar 16.
Article in English | MEDLINE | ID: mdl-36927439

ABSTRACT

Despite recent improvements in nanopore basecalling accuracy, germline variant calling of small insertions and deletions (INDELs) remains poor. Although precision and recall for single nucleotide polymorphisms (SNPs) now exceed 99.5%, INDEL recall remains below 80% for standard R9.4.1 flow cells. We show that read phasing and realignment can recover a significant portion of false negative INDELs. In particular, we extend Needleman-Wunsch affine gap alignment by introducing new gap penalties for more accurately aligning repeated n-polymer sequences such as homopolymers ([Formula: see text]) and tandem repeats ([Formula: see text]). At the same precision, haplotype phasing improves INDEL recall from 63.76 to [Formula: see text] and nPoRe realignment improves it further to [Formula: see text].


Subject(s)
Algorithms; Software; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; INDEL Mutation; Polymorphism, Single Nucleotide
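
The abstract above builds on Needleman-Wunsch affine gap alignment. As background, the following is a minimal sketch of the standard affine-gap global alignment score (Gotoh's three-matrix recurrence). It does not include the n-polymer gap penalties the abstract introduces, which the record does not specify; the scoring values and the function name `affine_gap_score` are assumptions chosen for illustration.

```python
def affine_gap_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap_open: int = -2, gap_extend: int = -1) -> float:
    """Global alignment score with affine gaps.

    A gap of length k costs gap_open + k * gap_extend, so one long gap is
    cheaper than several short gaps with the same total length.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_extend * i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_extend * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap after another state or extend the current one.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


# Deleting two adjacent bases from a homopolymer: one gap of length 2.
homopolymer = affine_gap_score("AAAA", "AA")
```

With these assumed scores, aligning "AAAA" to "AA" gives two matches and a single length-2 gap. In a homopolymer the gap can be placed at several equally scoring positions, which is the kind of ambiguity that specialised repeat-aware penalties aim to handle.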
5.
Commun Biol ; 5(1): 708, 2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35840782

ABSTRACT

Molecular markers are essential for cancer diagnosis, clinical trial enrollment, and some surgical decision making, motivating ultra-rapid, intraoperative variant detection. Sequencing-based detection is considered the gold standard approach, but typically takes hours to perform due to time-consuming DNA extraction, targeted amplification, and library preparation. In this work, we present a proof-of-principle approach for sub-1-hour targeted variant detection using real-time DNA sequencers. By modifying existing protocols and optimizing for diagnostic time-to-result, we demonstrate confirmation of a hot-spot mutation from tumor tissue in ~52 minutes. To further reduce time, we explore rapid, targeted Loop-mediated Isothermal Amplification (LAMP) and design a bioinformatics tool, LAMPrey, to process sequenced LAMP product. LAMPrey's concatemer-aware alignment algorithm is designed to maximize recovery of diagnostically relevant information, leading to more rapid detection than standard read alignment approaches. Using LAMPrey, we demonstrate confirmation of a hot-spot mutation (250x support) from tumor tissue in less than 30 minutes.


Subject(s)
Neoplasms; Base Sequence; Humans; Neoplasms/diagnosis; Neoplasms/genetics; Sensitivity and Specificity