Results 1 - 20 of 124
1.
PLoS One ; 18(4): e0283470, 2023.
Article in English | MEDLINE | ID: covidwho-2299867

ABSTRACT

Mutations of protein kinases and cytokines are common and can cause cancer and other diseases. However, our understanding of the mutability in these genes remains rudimentary. Therefore, given previously known factors which are associated with high mutation rates, we analyzed how many genes encoding druggable kinases match (i) proximity to telomeres or (ii) high A+T content. We extracted this genomic information using the National Institutes of Health Genome Data Viewer. First, among 129 druggable human kinase genes studied, 106 genes satisfied either factor (i) or (ii), resulting in an 82% match. Moreover, a similar 85% match rate was found in 73 genes encoding pro-inflammatory cytokines of multisystem inflammatory syndrome in children. Based on these promising matching rates, we further compared these two factors utilizing 20 de novo mutations of mice exposed to space-like ionizing radiation, in order to determine if these seemingly random mutations were similarly predictable with this strategy. However, only 10 of these 20 murine genetic loci met (i) or (ii), leading to only a 50% match. When compared with the mechanisms of top-selling FDA-approved drugs, these data suggest that matching rate analysis on druggable targets is feasible to systematically prioritize the relative mutability, and therefore therapeutic potential, of the novel candidates.
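
The matching rates quoted in this abstract are simple proportions. A minimal Python sketch, assuming only the counts stated in the abstract; the function name `match_rate` is illustrative and does not come from the paper:

```python
def match_rate(matched: int, total: int) -> float:
    """Percentage of loci that meet factor (i) or factor (ii)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matched / total

# Counts taken from the abstract.
kinase_rate = match_rate(106, 129)  # druggable human kinase genes
murine_rate = match_rate(10, 20)    # de novo murine mutation loci

print(f"kinase genes: {kinase_rate:.0f}%")
print(f"murine loci: {murine_rate:.0f}%")
```

The count behind the 85% cytokine figure is not given in the abstract, so it is not reproduced here.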


Subject(s)
Cytokines , Neoplasms , Child , Humans , Animals , Mice , Cytokines/genetics , Cytokines/therapeutic use , Genomics/methods , Mutation , Neoplasms/genetics , Telomere/genetics
2.
OMICS ; 27(4): 141-152, 2023 04.
Article in English | MEDLINE | ID: covidwho-2297045

ABSTRACT

Omics data are multidimensional, heterogeneous, and high throughput. Robust computational methods and machine learning (ML)-based models offer new prospects to accelerate the data-to-knowledge trajectory. Deep learning (DL) is a powerful subset of ML inspired by brain structure and has created unprecedented momentum in bioinformatics and computational biology research. This article provides an overview of the current DL models applied to multi-omics data for both the beginner and the expert user. Additionally, COVID-19 will continue to impact planetary health as a pandemic and an endemic disease, with genomic and multi-omic pathophysiology. DL offers, therefore, new ways of harnessing systems biology research on COVID-19 diagnostics and therapeutics. Herein, we discuss, first, the statistical ML algorithms and essential deep architectures. Then, we review DL applications in multi-omics data analysis and their intersection with COVID-19. Finally, challenges and several promising directions are highlighted going forward in the current era of COVID-19.


Subject(s)
COVID-19 , Deep Learning , Humans , Genomics/methods , Computational Biology/methods , Machine Learning
3.
Annu Rev Pharmacol Toxicol ; 63: 65-76, 2023 Jan 20.
Article in English | MEDLINE | ID: covidwho-2267221

ABSTRACT

A long-standing recognition that information from human genetics studies has the potential to accelerate drug discovery has led to decades of research on how to leverage genetic and phenotypic information for drug discovery. Established simple and advanced statistical methods that allow the simultaneous analysis of genotype and clinical phenotype data by genome- and phenome-wide analyses, colocalization analyses with quantitative trait loci data from transcriptomics and proteomics data sets from different tissues, and Mendelian randomization are essential tools for drug development in the postgenomic era. Numerous studies have demonstrated how genomic data provide opportunities for the identification of new drug targets, the repurposing of drugs, and drug safety analyses. With an increase in the number of biobanks that enable linking in-depth omics data with rich repositories of phenotypic traits via electronic health records, more powerful ways for the evaluation and validation of drug targets will continue to expand across different disciplines of clinical research.


Subject(s)
Electronic Health Records , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Genomics/methods , Phenotype , Drug Discovery
4.
Microb Genom ; 9(1)2023 01.
Article in English | MEDLINE | ID: covidwho-2230369

ABSTRACT

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations as well as research. In order to make use of pathogen genomics data, they must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and their use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. While the DataHarmonizer was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Pandemics , SARS-CoV-2/genetics , Canada , Genomics/methods
5.
Genome Biol ; 23(1): 256, 2022 12 13.
Article in English | MEDLINE | ID: covidwho-2196402

ABSTRACT

Spatial omics technologies enable a deeper understanding of cellular organizations and interactions within a tissue of interest. These assays can identify specific compartments or regions in a tissue with differential transcript or protein abundance, delineate their interactions, and complement other methods in defining cellular phenotypes. A variety of spatial methodologies are being developed and commercialized; however, these techniques differ in spatial resolution, multiplexing capability, scale/throughput, and coverage. Here, we review the current and prospective landscape of single cell to subcellular resolution spatial omics technologies and analysis tools to provide a comprehensive picture for both research and clinical applications.


Subject(s)
Genomics , Proteomics , Genomics/methods , Proteomics/methods , Prospective Studies
6.
Genome Biol ; 23(1): 182, 2022 08 29.
Article in English | MEDLINE | ID: covidwho-2038853

ABSTRACT

With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.


Subject(s)
Genome, Human , Genomics , Genomics/methods , Humans , Nucleotides , Telomere/genetics
7.
Front Public Health ; 10: 974667, 2022.
Article in English | MEDLINE | ID: covidwho-2022999

ABSTRACT

Next Generation Sequencing (NGS) is the gold standard for the detection of new variants of SARS-CoV-2 including those which have immune escape properties, high infectivity, and variable severity. This test is helpful in genomic surveillance, for planning appropriate and timely public health interventions. But labs with NGS facilities are not available in small or medium research settings due to the high cost of setting up such a facility. Transportation of samples from many places to a few centers for NGS testing also produces delays due to transit and sample overload, leading in turn to delays in patient management and community interventions. This becomes more important for patients traveling from hotspot regions or those suspected of harboring a new variant. Another major issue is the high cost of NGS-based tests. Thus, it may not be a good option for an economically viable surveillance program requiring immediate result generation and patient follow-up. The current study used a cost-effective facility which can be set up in a common research lab and which is replicable in similar centers with expertise in Sanger nucleotide sequencing. More samples can be processed at a time and can generate the results in a maximum of 2 days (1 day for a 24 h working lab). We analyzed the nucleotide sequence of the Receptor Binding Domain (RBD) region of SARS-CoV-2 by Sanger sequencing using in-house developed methods. The SARS-CoV-2 variant surveillance was done during the period of March 2021 to May 2022 in the Northern region of Kerala, a state in India with a population of 36.4 million, for implementing appropriate timely interventions. Our findings broadly agree with those from elsewhere in India and other countries during the period.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , SARS-CoV-2/genetics
8.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: covidwho-1985037

ABSTRACT

As recently demonstrated by the COVID-19 pandemic, large-scale pathogen genomic data are crucial to characterize transmission patterns of human infectious diseases. Yet, current methods to process raw sequence data into analysis-ready variants remain slow to scale, hampering rapid surveillance efforts and epidemiological investigations for disease control. Here, we introduce an accelerated, scalable, reproducible, and cost-effective framework for pathogen genomic variant identification and present an evaluation of its performance and accuracy across benchmark datasets of Plasmodium falciparum malaria genomes. We demonstrate superior performance of the GPU framework relative to standard pipelines with mean execution time and computational costs reduced by 27× and 4.6×, respectively, while delivering 99.9% accuracy at enhanced reproducibility.


Subject(s)
COVID-19 , Communicable Diseases , Malaria , COVID-19/epidemiology , COVID-19/genetics , Genomics/methods , Humans , Pandemics , Reproducibility of Results
9.
Elife ; 11, 2022 08 02.
Article in English | MEDLINE | ID: covidwho-1969731

ABSTRACT

Tracking the emergence and spread of SARS-CoV-2 lineages using phylogenetics has proven critical to inform the timing and stringency of COVID-19 public health interventions. We investigated the effectiveness of international travel restrictions at reducing SARS-CoV-2 importations and transmission in Canada in the first two waves of 2020 and early 2021. Maximum likelihood phylogenetic trees were used to infer viruses' geographic origins, enabling identification of 2263 (95% confidence interval: 2159-2366) introductions, including 680 (658-703) Canadian sublineages, which are international introductions resulting in sampled Canadian descendants, and 1582 (1501-1663) singletons, introductions with no sampled descendants. Of the sublineages seeded during the first wave, 49% (46-52%) originated from the USA and were primarily introduced into Quebec (39%) and Ontario (36%), while in the second wave, the USA was still the predominant source (43%), alongside a larger contribution from India (16%) and the UK (7%). Following implementation of restrictions on the entry of foreign nationals on 21 March 2020, importations declined from 58.5 (50.4-66.5) sublineages per week to 10.3-fold (8.3-15.0) lower within 4 weeks. Despite the drastic reduction in viral importations following travel restrictions, newly seeded sublineages in summer and fall 2020 contributed to the persistence of COVID-19 cases in the second wave, highlighting the importance of sustained interventions to reduce transmission. Importations rebounded further in November, bringing newly emergent variants of concern (VOCs). 
By the end of February 2021, there had been an estimated 30 (19-41) B.1.1.7 sublineages imported into Canada, which increasingly displaced previously circulating sublineages by the end of the second wave. Although viral importations are nearly inevitable when global prevalence is high, with fewer importations there are fewer opportunities for novel variants to spark outbreaks or outcompete previously circulating lineages.
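
The reported decline in importations can be restated as an approximate weekly rate. A rough Python sketch, assuming only the point estimates in the abstract (58.5 sublineages per week, 10.3-fold reduction); the variable names are illustrative and the confidence bounds are not propagated:

```python
baseline_per_week = 58.5  # sublineages per week before restrictions (from the abstract)
fold_reduction = 10.3     # reported fold decrease within 4 weeks (from the abstract)

# A k-fold reduction divides the baseline rate by k.
post_restriction_per_week = baseline_per_week / fold_reduction
percent_decline = 100.0 * (1 - 1 / fold_reduction)

print(f"{post_restriction_per_week:.1f} sublineages per week")
print(f"{percent_decline:.1f}% decline")
```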


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genomics/methods , Humans , Ontario , Phylogeny , SARS-CoV-2/genetics
10.
Science ; 377(6609): 960-966, 2022 08 26.
Article in English | MEDLINE | ID: covidwho-1962060

ABSTRACT

Understanding the circumstances that lead to pandemics is important for their prevention. We analyzed the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted "A" and "B." Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October to 8 December), and the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans before November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.


Subject(s)
COVID-19 , Pandemics , SARS-CoV-2 , Viral Zoonoses , Animals , COVID-19/epidemiology , COVID-19/transmission , COVID-19/virology , Computer Simulation , Genetic Variation , Genomics/methods , Humans , Molecular Epidemiology , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Viral Zoonoses/epidemiology , Viral Zoonoses/virology
11.
Nat Biotechnol ; 40(11): 1644-1653, 2022 Nov.
Article in English | MEDLINE | ID: covidwho-1878538

ABSTRACT

Genome-wide association studies in combination with single-cell genomic atlases can provide insights into the mechanisms of disease-causal genetic variation. However, identification of disease-relevant or trait-relevant cell types, states and trajectories is often hampered by sparsity and noise, particularly in the analysis of single-cell epigenomic data. To overcome these challenges, we present SCAVENGE, a computational algorithm that uses network propagation to map causal variants to their relevant cellular context at single-cell resolution. We demonstrate how SCAVENGE can help identify key biological mechanisms underlying human genetic variation, applying the method to blood traits at distinct stages of human hematopoiesis, to monocyte subsets that increase the risk for severe Coronavirus Disease 2019 (COVID-19) and to intermediate lymphocyte developmental states that predispose to acute leukemia. Our approach not only provides a framework for enabling variant-to-function insights at single-cell resolution but also suggests a more general strategy for maximizing the inferences that can be made using single-cell genomic data.


Subject(s)
COVID-19 , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , COVID-19/genetics , Genomics/methods , Epigenomics
12.
Bioinformatics ; 38(14): 3501-3512, 2022 Jul 11.
Article in English | MEDLINE | ID: covidwho-1873853

ABSTRACT

MOTIVATION: The importance of chromatin loops in gene regulation is broadly accepted. There are mainly two approaches to predict chromatin loops: the transcription factor (TF) binding-dependent approach and the genomic variation-based approach. However, neither of these approaches provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model called Deep Learning-based Universal Chromatin Interaction Annotator (DeepLUCIA). RESULTS: Although DeepLUCIA does not use TF binding profile data which previous TF binding-dependent methods critically rely on, its prediction accuracies are comparable to those of the previous TF binding-dependent methods. More importantly, DeepLUCIA enables tissue-specific chromatin loop predictions from tissue-specific epigenomes that cannot be handled by the genomic variation-based approach. We demonstrated the utility of DeepLUCIA by predicting several novel target genes of SNPs identified in genome-wide association studies targeting Brugada syndrome, COVID-19 severity and age-related macular degeneration. AVAILABILITY AND IMPLEMENTATION: DeepLUCIA is freely available at https://github.com/bcbl-kaist/DeepLUCIA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , Deep Learning , Humans , Chromatin , Genome-Wide Association Study , Genomics/methods
13.
Gene ; 823: 146387, 2022 May 20.
Article in English | MEDLINE | ID: covidwho-1814425

ABSTRACT

The coronavirus disease 2019 (COVID-19) quickly swept over the world, becoming one of the most devastating outbreaks in human history. Being the first pandemic in the post-genomic era, advancements in genomics contributed significantly to scientific understanding and public health response to COVID-19. Genomic technologies have been employed by researchers all over the world to better understand the biology of SARS-CoV-2 and its origin, genomic diversity, and evolution. Worldwide genomic resources have greatly aided in the investigation of the COVID-19 pandemic. The pandemic has ushered in a new era of genomic surveillance, wherein scientists are tracking the changes of the SARS-CoV-2 genome in real-time at the international and national levels. Availability of genomic and proteomic information enables the rapid development of molecular diagnostics and therapeutics. The advent of high-throughput sequencing and genome editing technologies led to the development of modern vaccines. We briefly discuss the impact of genomics in the ongoing COVID-19 pandemic in this review.


Subject(s)
COVID-19/prevention & control , Genomics/methods , SARS-CoV-2/genetics , COVID-19/virology , Evolution, Molecular , Genome, Viral , Humans , Molecular Epidemiology , Mutation , SARS-CoV-2/classification
14.
Sci Rep ; 12(1): 4813, 2022 03 21.
Article in English | MEDLINE | ID: covidwho-1764202

ABSTRACT

Comprehensive cancer genomic profile (CGP) tests are being implemented under the Japanese universal health insurance system. However, the clinical usefulness of CGP testing for breast cancer patients has not been evaluated. Of the 310 patients who underwent CGP testing at our institution between November 2019 and April 2021, 35 patients with metastatic breast cancer whose treatment strategy was discussed by our molecular tumor board within the study period were investigated after exclusion of 2 cases that could not be analyzed. The turn-around time, drug accessibility, and germline identification detection were evaluated. The subtype was luminal in 20 patients (57.1%), triple-negative in 12 patients (34.3%), and luminal-HER2 in 3 patients (8.6%). Actionable gene mutations were detected in 30 patients (85.7%), and 7 patients (20.0%) were recommended for clinical trial participation, with the drug administered to 2 patients (5.7%). Three patients (8.6%) died due to disease progression before the test results were disclosed. We report the results of an initial assessment of the utility of CGP testing for patients with metastatic breast cancer under the Japanese universal health insurance system. Conducting CGP tests at a more appropriate time could provide patients with greater benefit from treatments based on their specific gene mutations.


Subject(s)
Breast Neoplasms , Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Female , Genetic Profile , Genomics/methods , Humans , Mutation
15.
Int J Mol Sci ; 23(6)2022 Mar 17.
Article in English | MEDLINE | ID: covidwho-1760649

ABSTRACT

For tiling of the SARS-CoV-2 genome, the ARTIC Network provided a V4 protocol that uses 99 pairs of primers for amplicon production and is currently a widely used amplicon-based approach. However, this technique has regions of low sequence coverage and is labour-, time-, and cost-intensive. Moreover, it requires 14 pairs of primers in two separate PCRs to obtain spike gene sequences. To overcome these disadvantages, we proposed a single PCR to efficiently detect spike gene mutations. We proposed a bioinformatic protocol that can process FASTQ reads into spike gene consensus sequences to accurately call spike protein variants from sequenced samples or to fairly express the cases of missing amplicons. We evaluated the in silico detection rate of primer sets that yield amplicon sizes of 400, 1200, and 2500 bp for spike gene sequencing of SARS-CoV-2 to be 59.49, 76.19, and 92.20%, respectively. The in silico detection rate of our proposed single PCR primers was 97.07%. We demonstrated the robustness of our analytical protocol against 3000 Oxford Nanopore sequencing runs of distinct datasets, thus ensuring high-integrity sequencing of spike genes for variant SARS-CoV-2 determination. Our protocol works well with the data yielded from versatile primer designs, making it easy to determine spike protein variants.


Subject(s)
COVID-19/virology , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Computational Biology , Genome, Viral , Genomics/methods , Humans , Mutation , Mutation Rate , Phylogeny , SARS-CoV-2/classification , Sequence Analysis, DNA
16.
Viruses ; 14(2)2022 02 21.
Article in English | MEDLINE | ID: covidwho-1705877

ABSTRACT

Recombination creates mosaic genomes containing regions with mixed ancestry, and the accumulation of such events over time can greatly complicate many aspects of evolutionary inference. Here, we developed a sliding window bootstrap (SWB) method to generate genomic bootstrap (GB) barcodes to highlight the regions supporting phylogenetic relationships. The method was applied to an alignment of 56 sarbecoviruses, including SARS-CoV and SARS-CoV-2, responsible for the SARS epidemic and COVID-19 pandemic, respectively. The SWB analyses were also used to construct a consensus tree showing the most reliable relationships and better interpret hidden phylogenetic signals. Our results revealed that most relationships were supported by just a few genomic regions and confirmed that three divergent lineages could be found in bats from Yunnan: SCoVrC, which groups SARS-CoV related coronaviruses from China; SCoV2rC, which includes SARS-CoV-2 related coronaviruses from Southeast Asia and Yunnan; and YunSar, which contains a few highly divergent viruses recently described in Yunnan. The GB barcodes showed evidence for ancient recombination between SCoV2rC and YunSar genomes, as well as more recent recombination events between SCoVrC and SCoV2rC genomes. The recombination and phylogeographic patterns suggest a strong host-dependent selection of the viral RNA-dependent RNA polymerase. In addition, SARS-CoV-2 appears as a mosaic genome composed of regions sharing recent ancestry with three bat SCoV2rCs from Yunnan (RmYN02, RpYN06, and RaTG13) or related to more ancient ancestors in bats from Yunnan and Southeast Asia. Finally, our results suggest that viral circular RNAs may be key molecules for the mechanism of recombination.
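
The sliding-window part of this approach can be illustrated with a short Python sketch. This is not the authors' SWB implementation: the window size, step, alignment length, and function name are assumptions chosen for illustration, and the per-window bootstrap tree inference is deliberately left out.

```python
from typing import Iterator, Tuple

def sliding_windows(alignment_length: int, window: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges that fit fully inside an alignment."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start + window <= alignment_length:
        yield start, start + window
        start += step

# Hypothetical values: a 30,000-column alignment, 2,000-column windows, 1,000-column step.
windows = list(sliding_windows(30_000, 2_000, 1_000))
print(len(windows), windows[0], windows[-1])
```

In a full analysis, each window would presumably be passed to a bootstrap phylogenetic inference, with the per-window support values collected along the genome.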


Subject(s)
DNA Barcoding, Taxonomic/methods , Disease Reservoirs/veterinary , Evolution, Molecular , Genomics/methods , Recombination, Genetic , SARS-CoV-2/genetics , Severe acute respiratory syndrome-related coronavirus/genetics , Animals , China , Chiroptera/virology , Disease Reservoirs/virology , Genome, Viral , Phylogeography
17.
Clin Infect Dis ; 74(8): 1419-1428, 2022 04 28.
Article in English | MEDLINE | ID: covidwho-1703304

ABSTRACT

BACKGROUND: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants concerning for enhanced transmission, evasion of immune responses, or associated with severe disease have motivated the global increase in genomic surveillance. In the current study, large-scale whole-genome sequencing was performed between November 2020 and the end of March 2021 to provide a phylodynamic analysis of circulating variants over time. In addition, we compared the viral genomic features of March 2020 and March 2021. METHODS: A total of 1600 complete SARS-CoV-2 genomes were analyzed. Genomic analysis was associated with laboratory diagnostic volumes and positivity rates, in addition to an analysis of the association of selected variants of concern/variants of interest with disease severity and outcomes. Our real-time surveillance features a cohort of specimens from patients who tested positive for SARS-CoV-2 after completion of vaccination. RESULTS: Our data showed genomic diversity over time that was not limited to the spike sequence. A significant increase in the B.1.1.7 lineage (alpha variant) in March 2021 as well as a transient circulation of regional variants that carried both the concerning S: E484K and S: P681H substitutions were noted. Lineage B.1.243 was significantly associated with intensive care unit admission and mortality. Genomes recovered from fully vaccinated individuals represented the predominant lineages circulating at specimen collection time, and people with those infections recovered with no hospitalizations. CONCLUSIONS: Our results emphasize the importance of genomic surveillance coupled with laboratory, clinical, and metadata analysis for a better understanding of the dynamics of viral spread and evolution.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genome, Viral , Genomics/methods , Humans , SARS-CoV-2/genetics
19.
Virology ; 568: 56-71, 2022 03.
Article in English | MEDLINE | ID: covidwho-1665518

ABSTRACT

SARS-CoV-2, the seventh coronavirus known to infect humans, can cause severe life-threatening respiratory pathologies. To better understand SARS-CoV-2 evolution, genome-wide analyses have been made, including the general characterization of its codon usage profile. Here we present a bioinformatic analysis of the evolution of SARS-CoV-2 codon usage over time using complete genomes collected since December 2019. Our results show that the SARS-CoV-2 codon usage pattern is antagonistic to, and is moving farther away from, that of the human host. Further, a selection of deoptimized codons over time, which was accompanied by a decrease in both the codon adaptation index and the effective number of codons, was observed. Altogether, these findings suggest that SARS-CoV-2 could be evolving, at least from the perspective of the synonymous codon usage, to become less pathogenic.
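
Codon usage analysis starts from counting in-frame codons in a coding sequence. A minimal Python sketch, assuming a toy sequence invented for illustration (not SARS-CoV-2 data); the function names are hypothetical, and the codon adaptation index and effective number of codons mentioned in the abstract need reference sets and are not computed here:

```python
from collections import Counter

def codon_counts(cds: str) -> Counter:
    """Count in-frame codons in a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def relative_usage(counts: Counter, synonymous: list) -> dict:
    """Fraction of each codon among a set of synonymous codons."""
    total = sum(counts[c] for c in synonymous)
    return {c: (counts[c] / total if total else 0.0) for c in synonymous}

# Toy sequence: a start codon, four leucine codons, then a stop codon.
toy = "ATGCTGCTTCTGTTATAA"
counts = codon_counts(toy)
leucine = ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"]  # the six leucine codons
print(relative_usage(counts, leucine))
```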


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Codon Usage , Codon , Evolution, Molecular , Pandemics , SARS-CoV-2/genetics , Betacoronavirus/classification , Betacoronavirus/genetics , Gene Expression Regulation, Viral , Genome, Viral , Genomics/methods , Humans , Open Reading Frames , Organ Specificity , Phylogeny
20.
Viruses ; 14(2)2022 01 23.
Article in English | MEDLINE | ID: covidwho-1651072

ABSTRACT

The COVID-19 pandemic is driven by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) that emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold standard methodology used to monitor and study this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has put additional pressure on the urgent need for streamlined bioinformatics workflows. Here, we describe a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of SARS-CoV-2 reference-based genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and the screening of intrahost variants. The pipeline is capable of processing a batch of around 100 samples in less than half an hour on a personal laptop or in less than five minutes on a server with 50 threads. The workflow presented here is available through Docker or Singularity images, allowing for implementation on laptops for small-scale analyses or on high processing capacity servers or clusters. Moreover, the low requirements for memory and CPU cores and the standardized results provided by ViralFlow highlight it as a versatile tool for SARS-CoV-2 genomic analysis.


Subject(s)
Automation, Laboratory/methods , Genome, Viral , Mutation , SARS-CoV-2/classification , SARS-CoV-2/genetics , Workflow , Computational Biology/instrumentation , Computational Biology/methods , Genomics/instrumentation , Genomics/methods , Humans , Phylogeny , Spike Glycoprotein, Coronavirus/genetics , Virus Assembly/genetics