Results 1 - 20 of 58
1.
ArXiv ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38764594

ABSTRACT

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This Portal has been coupled with other resources like Viral AI and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this Portal, including its contextual data not available elsewhere, and the 'Duotang', a web platform that presents key genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. 
The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

2.
bioRxiv ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38464143

ABSTRACT

DNA sequencing of tumours to identify somatic mutations has become a critical tool to guide the type of treatment given to cancer patients. The gold standard for mutation calling is comparing sequencing data from the tumour to a matched normal sample to avoid mis-classifying inherited SNPs as mutations. This procedure works extremely well, but in certain situations only a tumour sample is available. While approaches have been developed to find mutations without a matched normal, they have limited accuracy or require specific types of input data (e.g. ultra-deep sequencing). Here we explore the application of single molecule long read sequencing to calling somatic mutations without matched normal samples. We develop a simple theoretical framework to show how haplotype phasing is an important source of information for determining whether a variant is a somatic mutation. We then use simulations to assess the range of experimental parameters (tumour purity, sequencing depth) where this approach is effective. These ideas are developed into a prototype somatic mutation caller, smrest, and its use is demonstrated on two highly mutated cancer cell lines. Finally, we argue that this approach has potential to measure clinically important biomarkers that are based on the genome-wide distribution of mutations: tumour mutation burden and mutation signatures.
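
The phasing argument in this abstract can be illustrated with a minimal sketch. It assumes a copy-neutral diploid region, a clonal heterozygous somatic mutation, error-free reads, and perfect phasing. These assumptions and the function names are illustrative only and are not taken from the abstract or from smrest. Under them, a germline heterozygous variant is carried by every read from its haplotype, whereas a somatic mutation is carried only by the tumour-derived reads on that haplotype, a fraction equal to the tumour purity.

```python
def expected_alt_fraction_on_haplotype(purity: float, somatic: bool) -> float:
    """Expected fraction of reads on the variant's haplotype that carry the alt allele.

    Illustrative model only (assumed, not taken from smrest):
    copy-neutral diploid region, clonal mutation, no sequencing or phasing errors.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Germline heterozygous variant: present in tumour and normal cells alike.
    # Somatic variant: present only in the tumour-derived share of the reads.
    return purity if somatic else 1.0


def expected_unphased_vaf(purity: float, somatic: bool) -> float:
    """Expected overall variant allele fraction when phase is ignored."""
    # Only one of the two haplotypes carries the variant, hence the factor 0.5.
    return 0.5 * expected_alt_fraction_on_haplotype(purity, somatic)


if __name__ == "__main__":
    for purity in (0.3, 0.6, 0.9):
        print(
            purity,
            expected_unphased_vaf(purity, somatic=True),
            expected_alt_fraction_on_haplotype(purity, somatic=True),
        )
```

In this toy model the gap between germline and somatic expectations is twice as wide on the phased haplotype (1.0 versus the purity) as in the unphased data (0.5 versus half the purity), and it shrinks as purity approaches 1. That is one way to see why the abstract evaluates the approach over a range of purity and depth.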

3.
Viruses ; 15(8), 2023 08 18.
Article in English | MEDLINE | ID: mdl-37632107

ABSTRACT

The GENCOV study aims to identify patient factors which affect COVID-19 severity and outcomes. Here, we aimed to evaluate patient characteristics, acute symptoms and their persistence, and associations with hospitalization. Participants were recruited at hospital sites across the Greater Toronto Area in Ontario, Canada. Patient-reported demographics, medical history, and COVID-19 symptoms and complications were collected through an intake survey. Regression analyses were performed to identify associations with outcomes including hospitalization and COVID-19 symptoms. In total, 966 responses were obtained from 1106 eligible participants (87% response rate) between November 2020 and May 2022. Increasing continuous age (aOR: 1.05 [95%CI: 1.01-1.08]) and BMI (aOR: 1.17 [95%CI: 1.10-1.24]), non-White/European ethnicity (aOR: 2.72 [95%CI: 1.22-6.05]), hypertension (aOR: 2.78 [95%CI: 1.22-6.34]), and infection by viral variants (aOR: 5.43 [95%CI: 1.45-20.34]) were identified as risk factors for hospitalization. Several symptoms including shortness of breath and fever were found to be more common among inpatients and tended to persist for longer durations following acute illness. Sex, age, ethnicity, BMI, vaccination status, viral strain, and underlying health conditions were associated with developing and having persistent symptoms. By improving our understanding of risk factors for severe COVID-19, our findings may guide COVID-19 patient management strategies by enabling more efficient clinical decision making.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Hospitalization , Inpatients , Ontario/epidemiology , Risk Factors
4.
JAMA Netw Open ; 6(7): e2324963, 2023 07 03.
Article in English | MEDLINE | ID: mdl-37477915

ABSTRACT

Importance: Nirmatrelvir-ritonavir is an oral antiviral medication that improves outcomes in SARS-CoV-2 infections. However, there is concern that antiviral resistance will develop and that these viruses could be selected for after treatment. Objective: To determine the prevalence of low-frequency SARS-CoV-2 variants in patient samples that could be selected for by nirmatrelvir-ritonavir. Design, Setting, and Participants: This retrospective cohort study was conducted at 4 laboratories that serve community hospitals, academic tertiary care centers, and COVID-19 assessment centers in Ontario, Canada. Participants included symptomatic or asymptomatic patients who tested positive for SARS-CoV-2 virus and submitted virus samples for diagnostic testing between March 2020 and January 2023. Exposure: SARS-CoV-2 infection. Main Outcomes and Measures: Samples with sufficient viral load underwent next-generation genome sequencing to identify low-frequency antiviral resistance variants that could not be identified through conventional sequencing. Results: This study included 78 866 clinical samples with next-generation whole-genome sequencing data for SARS-CoV-2. Low-frequency variants in the viral nsp5 gene were identified in 128 isolates (0.16%), and no single variant associated with antiviral resistance was predominant. Conclusions and Relevance: This cohort study of low-frequency variants resistant to nirmatrelvir-ritonavir found that these variants were very rare in samples from patients with SARS-CoV-2, suggesting that selection of these variants by nirmatrelvir-ritonavir following the initiation of treatment may also be rare. Surveillance efforts that involve sequencing of viral isolates should continue to monitor for novel resistance variants as nirmatrelvir-ritonavir is used more broadly.
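
The prevalence quoted in the Results can be checked with a few lines of arithmetic. The sketch below uses only the two counts given in the abstract; the variable and function names are illustrative.

```python
# Counts quoted in the abstract.
samples_sequenced = 78_866
samples_with_low_frequency_nsp5_variants = 128


def prevalence_percent(cases: int, total: int) -> float:
    """Return cases / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * cases / total


prevalence = prevalence_percent(
    samples_with_low_frequency_nsp5_variants, samples_sequenced
)

if __name__ == "__main__":
    # Rounded to two decimals this matches the 0.16% reported in the abstract.
    print(f"{prevalence:.2f}%")
```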


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Ontario/epidemiology , SARS-CoV-2/genetics , Ritonavir/therapeutic use , Prevalence , Cohort Studies , Retrospective Studies , COVID-19/epidemiology , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , COVID-19 Drug Treatment
5.
Infect Control Hosp Epidemiol ; 44(11): 1829-1833, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36912329

ABSTRACT

OBJECTIVE: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) hospital outbreaks have been common and devastating during the coronavirus disease 2019 (COVID-19) pandemic. Understanding SARS-CoV-2 transmission in these environments is critical for preventing and managing outbreaks. DESIGN: Outbreak investigation through epidemiological mapping and whole-genome sequencing phylogeny. SETTING: Hospital in-patient medical unit outbreak in Toronto, Canada, from November 2020 to January 2021. PARTICIPANTS: The outbreak involved 8 patients and 10 staff and was associated with 3 patient deaths. RESULTS: Patients being cared for in geriatric chairs at the nursing station were at high risk for both acquiring and transmitting SARS-CoV-2 to other patients and staff. Furthermore, given the informal nature of these transmissions, they were not initially recognized, which led to further transmission and missing the opportunity for preventative COVID-19 therapies. CONCLUSIONS: During outbreak prevention and management, the risk of informal patient care settings, such as geriatric chairs, should be considered. During high-risk periods or during outbreaks, efforts should be made to care for patients in their rooms when possible.


Subject(s)
COVID-19 , Humans , Aged , COVID-19/epidemiology , SARS-CoV-2/genetics , Disease Outbreaks/prevention & control , Canada/epidemiology , Hospitals
6.
J Mol Diagn ; 25(3): 133-142, 2023 03.
Article in English | MEDLINE | ID: mdl-36565986

ABSTRACT

The use of standard next-generation sequencing technologies to detect key mutations in IDH genes for glioma diagnosis imposes several challenges, including high capital cost and turnaround delays associated with the need for batch testing. For both glioma testing and testing in other tumor types where highly specific mutation identification is required, the high-throughput nature of next-generation sequencing limits the feasibility of using it as a primary approach in clinical laboratories. We hypothesized that third-generation nanopore sequencing by Oxford Nanopore Technologies has the capability to overcome these limitations. This study aimed to develop and validate a nanopore-based IDH mutation detection assay for clinical practice using glioma formalin-fixed, paraffin-embedded (FFPE) tissue. Glioma FFPE (n = 66) samples with confirmed IDH gene mutational status were sequenced on the MinION device using an amplicon-based approach. All cases were concordant when compared with the reference results. Limit of blank and limit of detection for the variant allele fraction were 1.5% and 3.3%, respectively, at 500× read depth per gene. Total sequencing cost per sample was CAD$50 to CAD$134 with results being available in 9 to 15 hours. These findings demonstrate that nanopore-sequencing technology can be leveraged to develop low-cost, high-performance clinical sequencing-based assays with quick turnaround times to support the detection of targeted mutations in FFPE tumor tissue.
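
To put the reported limits in terms of read counts, the following sketch converts a variant allele fraction into an expected number of variant-supporting reads at a given depth. It is plain arithmetic on the figures in the abstract and assumes the depth applies uniformly at the site of interest; the function name is illustrative.

```python
def expected_variant_reads(depth: int, vaf_percent: float) -> float:
    """Expected number of reads carrying the variant at a given depth and VAF (%)."""
    if depth < 0 or not 0.0 <= vaf_percent <= 100.0:
        raise ValueError("depth must be >= 0 and vaf_percent within 0-100")
    return depth * vaf_percent / 100.0


if __name__ == "__main__":
    depth = 500  # read depth per gene, as stated in the abstract
    print("limit of blank:", expected_variant_reads(depth, 1.5))
    print("limit of detection:", expected_variant_reads(depth, 3.3))
```

At 500× depth, the 3.3% limit of detection corresponds to roughly 16 to 17 variant reads, and the 1.5% limit of blank to roughly 7 to 8.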


Subject(s)
Glioma , Nanopore Sequencing , Humans , Point Mutation , Laboratories, Clinical , Glioma/genetics , Mutation , High-Throughput Nucleotide Sequencing/methods
7.
Hum Genet ; 142(2): 181-192, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36331656

ABSTRACT

Rapid advancements of genome sequencing (GS) technologies have enhanced our understanding of the relationship between genes and human disease. To incorporate genomic information into the practice of medicine, new processes for the analysis, reporting, and communication of GS data are needed. Blood samples were collected from adults with a PCR-confirmed SARS-CoV-2 (COVID-19) diagnosis (target N = 1500). GS was performed. Data were filtered and analyzed using custom pipelines and gene panels. We developed unique patient-facing materials, including an online intake survey, group counseling presentation, and consultation letters in addition to a comprehensive GS report. The final report includes results generated from GS data: (1) monogenic disease risks; (2) carrier status; (3) pharmacogenomic variants; (4) polygenic risk scores for common conditions; (5) HLA genotype; (6) genetic ancestry; (7) blood group; and, (8) COVID-19 viral lineage. Participants complete pre-test genetic counseling and confirm preferences for secondary findings before receiving results. Counseling and referrals are initiated for clinically significant findings. We developed a genetic counseling, reporting, and return of results framework that integrates GS information across multiple areas of human health, presenting possibilities for the clinical application of comprehensive GS data in healthy individuals.


Subject(s)
COVID-19 , Genetic Counseling , Adult , Humans , COVID-19/epidemiology , COVID-19/genetics , SARS-CoV-2/genetics , Genomics/methods , Genotype
8.
Curr Protoc ; 2(10): e534, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36205462

ABSTRACT

Genome sequencing holds promise for great public health benefits. It is currently being used in the context of rare disease diagnosis and novel gene identification, but also has the potential to identify genetic disease risk factors in healthy individuals. Genome sequencing technologies are currently being used to identify genetic factors that may influence variability in symptom severity and immune response among patients infected by SARS-CoV-2. The GENCOV study aims to look at the relationship between genetic, serological, and biochemical factors and variability of SARS-CoV-2 symptom severity, and to evaluate the utility of returning genome screening results to study participants. Study participants select which results they wish to receive with a decision aid. Medically actionable information for diagnosis, disease risk estimation, disease prevention, and patient management is provided in a comprehensive genome report. Using a combination of bioinformatics software and custom tools, this article describes a pipeline for the analysis and reporting of genetic results to individuals with COVID-19, including HLA genotyping, large-scale continental ancestry estimation, and pharmacogenomic analysis to determine metabolizer status and drug response. In addition, this pipeline includes reporting of medically actionable conditions from comprehensive gene panels for Cardiology, Neurology, Metabolism, Hereditary Cancer, and Hereditary Kidney, and carrier screening for reproductive planning. Incorporated into the genome report are polygenic risk scores for six diseases (coronary artery disease; atrial fibrillation; type-2 diabetes; and breast, prostate, and colon cancer), as well as blood group genotyping analysis for ABO and Rh blood types and genotyping for other antigens of clinical relevance. The genome report summarizes the findings of these analyses in a way that extensively communicates clinically relevant results to patients and their physicians. 
© 2022 Wiley Periodicals LLC. Basic Protocol 1: HLA genotyping and disease association. Basic Protocol 2: Large-scale continental ancestry estimation. Basic Protocol 3: Dosage recommendations for pharmacogenomic gene variants associated with drug response. Support Protocol: System setup.


Subject(s)
Blood Group Antigens , COVID-19 , COVID-19/genetics , Computational Biology/methods , Genomics , Humans , Male , SARS-CoV-2/genetics
10.
Sci Rep ; 12(1): 10867, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35760824

ABSTRACT

The emergence of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was met with rapid development of robust molecular-based detection assays. Many SARS-CoV-2 molecular tests target multiple genetic regions of the virus to maximize detection and protect against diagnostic escape. Despite the relatively moderate mutational rate of SARS-CoV-2, numerous mutations with known negative impact on diagnostic assays have been identified. In early 2021, we identified four samples positive for SARS-CoV-2 with a nucleocapsid (N) gene dropout on the Cepheid Xpert® Xpress SARS-CoV-2 assay. Sequencing revealed a single common mutation in the N gene, C29200T. Spatiotemporal analysis showed that the mutation was found in at least six different Canadian provinces from May 2020 until May 2021. Phylogenetic analysis showed that this mutation arose multiple times in Canadian samples and is present in six different variants of interest and of concern. The Cepheid testing platform is commonly used in Canada, including in remote regions. As such, the existence of N gene mutation dropouts required further investigation. While commercial SARS-CoV-2 molecular detection assays have contributed immensely to the response effort, many vendors are reluctant to make primer/probe sequences publicly available. Proprietary primer/probe sequences create diagnostic 'blind spots' for global SARS-CoV-2 sequence monitoring and limit the ability to detect and track the presence and prevalence of diagnostic escape mutations. We hope that our industry partners will seriously consider making primer/probe sequences available, so that diagnostic escape mutants can be identified promptly and responded to appropriately to maintain diagnostic accuracy.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Canada/epidemiology , Clinical Laboratory Techniques , Humans , Mutation , Nucleocapsid/genetics , Phylogeny , Polymerase Chain Reaction , SARS-CoV-2/genetics , Sensitivity and Specificity
11.
BMJ Open ; 11(9): e052842, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34593505

ABSTRACT

INTRODUCTION: There is considerable variability in symptoms and severity of COVID-19 among patients infected by the SARS-CoV-2 virus. Linking host and virus genome sequence information to antibody response and biological information may identify patient or viral characteristics associated with poor and favourable outcomes. This study aims to (1) identify characteristics of the antibody response that result in maintained immune response and better outcomes, (2) determine the impact of genetic differences on infection severity and immune response, (3) determine the impact of viral lineage on antibody response and patient outcomes and (4) evaluate patient-reported outcomes of receiving host genome, antibody and viral lineage results. METHODS AND ANALYSIS: A prospective, observational cohort study is being conducted among adult patients with COVID-19 in the Greater Toronto Area. Blood samples are collected at baseline (during infection) and 1, 6 and 12 months after diagnosis. Serial antibody titres, isotype, antigen target and viral neutralisation will be assessed. Clinical data will be collected from chart reviews and patient surveys. Host genomes and T-cell and B-cell receptors will be sequenced. Viral genomes will be sequenced to identify viral lineage. Regression models will be used to test associations between antibody response, physiological response, genetic markers and patient outcomes. Pathogenic genomic variants related to disease severity, or negative outcomes will be identified and genome wide association will be conducted. Immune repertoire diversity during infection will be correlated with severity of COVID-19 symptoms and human leucocyte antigen-type associated with SARS-CoV-2 infection. Participants can learn their genome sequencing, antibody and viral sequencing results; patient-reported outcomes of receiving this information will be assessed through surveys and qualitative interviews. 
ETHICS AND DISSEMINATION: This study was approved by Clinical Trials Ontario Streamlined Ethics Review System (CTO Project ID: 3302) and the research ethics boards at participating hospitals. Study findings will be disseminated through peer-reviewed publications, conference presentations and end-users.


Subject(s)
COVID-19 , Genome-Wide Association Study , Humans , Observational Studies as Topic , Prospective Studies , SARS-CoV-2 , Severity of Illness Index
12.
mSphere ; 6(3), 2021 05 05.
Article in English | MEDLINE | ID: mdl-33952657

ABSTRACT

Genome-wide variation in SARS-CoV-2 reveals evolution and transmission dynamics which are critical considerations for disease control and prevention decisions. Here, we review estimates of the genome-wide viral mutation rates, summarize current COVID-19 case load in the province of Ontario, Canada (5 January 2021), and analyze published SARS-CoV-2 genomes from Ontario (collected prior to 24 November 2020) to test for more infectious genetic variants or lineages. The reported mutation rate (∼10⁻⁶ nucleotide [nt]⁻¹ cycle⁻¹) for SARS-CoV-2 is typical for coronaviruses. Analysis of published SARS-CoV-2 genomes revealed that the G614 spike protein mutation has dominated infections in Ontario and that SARS-CoV-2 lineages present in Ontario have not differed significantly in their rate of spread. These results suggest that the SARS-CoV-2 population circulating in Ontario has not changed significantly to date. However, ongoing genome monitoring is essential for identification of new variants and lineages that may contribute to increased viral transmission.


Subject(s)
Genetic Variation/genetics , Genome, Viral/genetics , Mutation Rate , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Base Sequence , COVID-19/pathology , Humans , Ontario , Phylogeny , Sequence Analysis, RNA
13.
Nat Protoc ; 16(3): 1343-1375, 2021 03.
Article in English | MEDLINE | ID: mdl-33514943

ABSTRACT

During maturation, eukaryotic precursor RNAs undergo processing events including intron splicing, 3'-end cleavage, and polyadenylation. Here we describe nanopore analysis of co-transcriptional processing (nano-COP), a method for probing the timing and patterns of RNA processing. An extension of native elongating transcript sequencing, which quantifies transcription genome-wide through short-read sequencing of nascent RNA 3' ends, nano-COP uses long-read nascent RNA sequencing to observe global patterns of RNA processing. First, nascent RNA is stringently purified through a combination of 4-thiouridine metabolic labeling and cellular fractionation. In contrast to cDNA or short-read-based approaches relying on reverse transcription or amplification, the sample is sequenced directly through nanopores to reveal the native context of nascent RNA. nano-COP identifies both active transcription sites and splice isoforms of single RNA molecules during synthesis, providing insight into patterns of intron removal and the physical coupling between transcription and splicing. The nano-COP protocol yields data within 3 d.


Subject(s)
Protein Modification, Translational/physiology , RNA Precursors/analysis , Sequence Analysis, RNA/methods , Animals , Exons/genetics , Humans , Introns/genetics , Protein Modification, Translational/genetics , RNA/genetics , RNA Polymerase II/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , RNA Processing, Post-Transcriptional/genetics , RNA Processing, Post-Transcriptional/physiology , RNA Splicing/genetics , RNA, Messenger/genetics , Transcription, Genetic/genetics
14.
J Clin Virol Plus ; 1(1): 100010, 2021 Jun.
Article in English | MEDLINE | ID: mdl-35261998

ABSTRACT

Background: Travel-related dissemination of SARS-CoV-2 continues to contribute to the global pandemic. A novel SARS-CoV-2 lineage (B.1.177) reportedly arose in Spain in the summer of 2020, with subsequent spread across Europe linked to travel by infected individuals. Surveillance and monitoring through the use of whole genome sequencing (WGS) offers insights into the global and local movement of pathogens such as SARS-CoV-2 and can detect introductions of novel variants. Methods: We analysed the genomes of SARS-CoV-2 sequenced for surveillance purposes from specimens received by Public Health Ontario (Sept 6 - Oct 10, 2020), collected from individuals in eastern Ontario, which comprised the study sample. Taxonomic lineages were identified using pangolin (v2.08) and phylogenetic analysis incorporated publicly available genomes covering the same time period as the study sample. Epidemiological data collected from laboratory requisitions and standard reportable disease case investigation was integrated into the analysis. Results: Genomic surveillance identified a COVID-19 case with SARS-CoV-2 lineage B.1.177 from an individual in eastern Ontario in late September, 2020. The individual had recently returned from Europe. Genomic analysis with publicly available data indicate the most closely related genomes to this specimen were from Southern Europe. Genomic surveillance did not identify further cases with this lineage. Conclusions: Genomic surveillance allowed for early detection of a novel SARS-CoV-2 lineage in Ontario which was deemed to be travel related. This type of genomic-based surveillance is a key tool to measure the effectiveness of public health measures such as mandatory self-isolation for returned travellers, aimed at preventing onward transmission of newly introduced lineages of SARS-CoV-2.

15.
Trends Biotechnol ; 39(1): 72-89, 2021 01.
Article in English | MEDLINE | ID: mdl-32620324

ABSTRACT

Modified nucleotides in mRNA are an essential addition to the standard genetic code of four nucleotides in animals, plants, and their viruses. The emerging field of epitranscriptomics examines nucleotide modifications in mRNA and their impact on gene expression. The low abundance of nucleotide modifications and technical limitations, however, have hampered systematic analysis of their occurrence and functions. Selective chemical and immunological identification of modified nucleotides has revealed global candidate topology maps for many modifications in mRNA, but further technical advances to increase confidence will be necessary. Single-molecule sequencing introduced by Oxford Nanopore now promises to overcome such limitations, and we summarize current progress with a particular focus on the bioinformatic challenges of this novel sequencing technology.


Subject(s)
Computational Biology , RNA, Messenger , Animals , Computational Biology/trends , Mutation/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/trends
17.
Nat Methods ; 17(12): 1191-1199, 2020 12.
Article in English | MEDLINE | ID: mdl-33230324

ABSTRACT

Probing epigenetic features on DNA has tremendous potential to advance our understanding of the phased epigenome. In this study, we use nanopore sequencing to evaluate CpG methylation and chromatin accessibility simultaneously on long strands of DNA by applying GpC methyltransferase to exogenously label open chromatin. We performed nanopore sequencing of nucleosome occupancy and methylome (nanoNOMe) on four human cell lines (GM12878, MCF-10A, MCF-7 and MDA-MB-231). The single-molecule resolution allows footprinting of protein and nucleosome binding, and determination of the combinatorial promoter epigenetic signature on individual molecules. Long-read sequencing makes it possible to robustly assign reads to haplotypes, allowing us to generate a fully phased human epigenome, consisting of chromosome-level allele-specific profiles of CpG methylation and chromatin accessibility. We further apply this to a breast cancer model to evaluate differential methylation and accessibility between cancerous and noncancerous cells.


Subject(s)
Breast Neoplasms/genetics , Chromatin/genetics , DNA Methylation/genetics , Nanopore Sequencing/methods , Cell Line, Tumor , CpG Islands/genetics , DNA/metabolism , Epigenome/genetics , Female , Genome, Human/genetics , Humans , MCF-7 Cells , Methyltransferases/metabolism , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA
18.
Nat Commun ; 11(1): 4748, 2020 09 21.
Article in English | MEDLINE | ID: mdl-32958763

ABSTRACT

The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and un-observed mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.


Subject(s)
Genome, Human/genetics , Mutation , Neoplasms/genetics , Base Composition , DNA, Intergenic , Databases, Genetic , Exome/genetics , Exons , Humans , Retrospective Studies , Exome Sequencing , Whole Genome Sequencing
19.
BMC Bioinformatics ; 21(1): 343, 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32758139

ABSTRACT

BACKGROUND: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. RESULTS: By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ∼3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. CONCLUSIONS: Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c .
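
The general idea of banded dynamic programming, on which ABEA builds, can be sketched as follows. This is not ABEA and not code from f5c or Nanopolish. As a deliberate simplification it computes a plain edit distance between two strings inside a fixed-width band around the main diagonal, whereas ABEA aligns signal events to k-mers and moves its band adaptively. The function name and parameters are illustrative.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance between a and b using only cells with |i - j| <= band.

    Returns None when the final cell lies outside the band, i.e. when the
    lengths of the two strings differ by more than `band`.
    """
    n, m = len(a), len(b)
    if band < 0 or abs(n - m) > band:
        return None
    inf = float("inf")
    # prev[j] is the cost of aligning a[:i-1] with b[:j]; cells outside the band stay inf.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                curr[j] = i  # delete the first i characters of a
                continue
            substitution = prev[j - 1] + (a[i - 1] != b[j - 1])
            deletion = prev[j] + 1       # consume a[i-1] only
            insertion = curr[j - 1] + 1  # consume b[j-1] only
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return int(prev[m])


if __name__ == "__main__":
    print(banded_edit_distance("kitten", "sitting", band=3))
```

Restricting the computation to a band reduces the work per row from the full sequence length to about twice the band width. The result equals the unrestricted edit distance whenever the optimal alignment stays inside the band.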


Subject(s)
Computer Graphics , Nanopores , Signal Processing, Computer-Assisted , Algorithms , Computational Biology , Databases as Topic , Genome, Human , Humans , Sequence Analysis