Results 1 - 20 of 33
1.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: covidwho-1639367

ABSTRACT

Genomic epidemiology is important to study the COVID-19 pandemic, and more than two million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequences were deposited into public databases. However, the exponential increase of sequences invokes unprecedented bioinformatic challenges. Here, we present the Coronavirus GenBrowser (CGB) based on a highly efficient analysis framework and a node-picking rendering strategy. In total, 1,002,739 high-quality genomic sequences with the transmission-related metadata were analyzed and visualized. The size of the core data file is only 12.20 MB, highly efficient for clean data sharing. Quick visualization modules and rich interactive operations are provided to explore the annotated SARS-CoV-2 evolutionary tree. CGB binary nomenclature is proposed to name each internal lineage. The pre-analyzed data can be filtered according to user-defined criteria to explore the transmission of SARS-CoV-2. Different evolutionary analyses can also be easily performed, such as the detection of accelerated evolution and ongoing positive selection. Moreover, 75 genomic spots conserved in SARS-CoV-2 but non-conserved in other coronaviruses were identified, which may indicate the functional elements specifically important for SARS-CoV-2. The CGB was written in Java and JavaScript. It not only enables users who have no programming skills to analyze millions of genomic sequences, but also offers a panoramic vision of the transmission and evolution of SARS-CoV-2.


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Public Health Surveillance/methods , SARS-CoV-2/genetics , Software , Web Browser , Computational Biology/methods , DNA Mutational Analysis , Databases, Genetic , Genome, Viral , Genomics , Humans , Molecular Epidemiology/methods , Molecular Sequence Annotation , Mutation
2.
Int J Mol Sci ; 23(2)2022 Jan 14.
Article in English | MEDLINE | ID: covidwho-1633064

ABSTRACT

Peripheral blood mononuclear cells (PBMCs) belong to the innate and adaptive immune system and are highly sensitive and responsive to changes in their systemic environment. In this study, we focused on the time course of transcriptional changes in freshly isolated human PBMCs 4, 8, 24 and 48 h after onset of stimulation with the active vitamin D metabolite 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3). Taking all four time points together, 662 target genes were identified and segregated either by time of differential gene expression into 179 primary and 483 secondary targets or by driver of expression change into 293 direct and 369 indirect targets. The latter classification revealed that more than 50% of target genes were driven primarily by the cells' response to ex vivo exposure rather than by the nuclear hormone and largely explained its down-regulatory effect. Functional analysis indicated vitamin D's role in the suppression of the inflammatory and adaptive immune response by down-regulating ten major histocompatibility complex class II genes, five alarmins of the S100 calcium binding protein A family and by affecting six chemokines of the C-X-C motif ligand family. Taken together, studying time-resolved responses allows the effects of vitamin D on the immune system to be better contextualized.


Subject(s)
Adaptive Immunity/genetics , Gene Expression Profiling , Gene Expression Regulation , Inflammation Mediators/metabolism , Transcriptome , Vitamin D/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Regulation/drug effects , Humans , Inflammation/etiology , Inflammation/metabolism , Inflammation/pathology , Leukocytes, Mononuclear/drug effects , Leukocytes, Mononuclear/immunology , Leukocytes, Mononuclear/metabolism , Molecular Sequence Annotation , Vitamin D/analogs & derivatives , Vitamin D/pharmacology
3.
Viruses ; 13(12)2021 12 03.
Article in English | MEDLINE | ID: covidwho-1554806

ABSTRACT

SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on the use of a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. We analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world using our method and identified the comprehensive set of known proteins with 98.5% set membership accuracy and 99.1% accuracy in length prediction, compared to proteome references, including Replicase polyprotein 1ab (with its transcriptional slippage site). Compared to other published tools, such as Prokka (base) and VAPiD, we yielded a 6.4- and 1.8-fold increase in protein annotations. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant-diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we successfully identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy as well as demonstrating high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.


Subject(s)
COVID-19/virology , Genome, Viral , Molecular Sequence Annotation , SARS-CoV-2/genetics , Amino Acid Sequence , Base Sequence , Computational Biology , Humans , Mutation , Protein Binding , Protein Domains , Spike Glycoprotein, Coronavirus/genetics
4.
Nucleic Acids Res ; 50(D1): D632-D639, 2022 01 07.
Article in English | MEDLINE | ID: covidwho-1506219

ABSTRACT

Network medicine has proven useful for dissecting genetic organization of complex human diseases. We have previously published HumanNet, an integrated network of human genes for disease studies. Since the release of the last version of HumanNet, many large-scale protein-protein interaction datasets have accumulated in public depositories. Additionally, the numbers of research papers and functional annotations for gene-phenotype associations have increased significantly. Therefore, updating HumanNet is a timely task for further improvement of network-based research into diseases. Here, we present HumanNet v3 (https://www.inetbio.org/humannet/, covering 99.8% of human protein coding genes) constructed by means of the expanded data with improved network inference algorithms. HumanNet v3 supports a three-tier model: HumanNet-PI (a protein-protein physical interaction network), HumanNet-FN (a functional gene network), and HumanNet-XC (a functional network extended by co-citation). Users can select a suitable tier of HumanNet for their study purpose. We showed that on disease gene predictions, HumanNet v3 outperforms both the previous HumanNet version and other integrated human gene networks. Furthermore, we demonstrated that HumanNet provides a feasible approach for selecting host genes likely to be associated with COVID-19.


Subject(s)
Algorithms , COVID-19/genetics , Communicable Diseases/genetics , Databases, Genetic , Gene Regulatory Networks , Software , COVID-19/virology , Communicable Diseases/classification , Gene Ontology , Humans , Internet , Molecular Sequence Annotation , Protein Interaction Mapping , SARS-CoV-2/pathogenicity
5.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: covidwho-1462428

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence Annotation
6.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387963

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics
7.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387962

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
8.
Nucleic Acids Res ; 49(D1): D92-D96, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387961

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 9.9 trillion base pairs from over 2.1 billion nucleotide sequences for 478 000 formally described species. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. Recent updates include new resources for data from the SARS-CoV-2 virus, updates to the NCBI Submission Portal and associated submission wizards for dengue and SARS-CoV-2 viruses, new taxonomy queries for viruses and prokaryotes, and simplified submission processes for EST and GSS sequences.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Animals , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , Pandemics
9.
FEBS Open Bio ; 11(9): 2441-2452, 2021 09.
Article in English | MEDLINE | ID: covidwho-1380363

ABSTRACT

Whole genome and exome sequencing (WGS/WES) are the most popular next-generation sequencing (NGS) methodologies and are at present often used to detect rare and common genetic variants of clinical significance. We emphasize that automated sequence data processing, management, and visualization should be an indispensable component of modern WGS and WES data analysis for sequence assembly, variant detection (SNPs, SVs), imputation, and resolution of haplotypes. In this manuscript, we present a newly developed findable, accessible, interoperable, and reusable (FAIR) bioinformatics-genomics pipeline Java based Whole Genome/Exome Sequence Data Processing Pipeline (JWES) for efficient variant discovery and interpretation, and big data modeling and visualization. JWES is a cross-platform, user-friendly, product line application, that entails three modules: (a) data processing, (b) storage, and (c) visualization. The data processing module performs a series of different tasks for variant calling, the data storage module efficiently manages high-volume gene-variant data, and the data visualization module supports variant data interpretation with Circos graphs. The performance of JWES was tested and validated in-house with different experiments, using Microsoft Windows, macOS Big Sur, and UNIX operating systems. JWES is an open-source and freely available pipeline, allowing scientists to take full advantage of all the computing resources available, without requiring much computer science knowledge. We have successfully applied JWES for processing, management, and gene-variant discovery, annotation, prediction, and genotyping of WGS and WES data to analyze variable complex disorders. In summary, we report the performance of JWES with some reproducible case studies, using open access and in-house generated, high-quality datasets.


Subject(s)
Computational Biology/methods , Exome , Genome , Genomics/methods , Sequence Analysis, DNA/methods , Software , Data Management , Databases, Genetic , Genetic Variation , Humans , Molecular Sequence Annotation , Reproducibility of Results , Whole Exome Sequencing , Whole Genome Sequencing , Workflow
10.
Brief Bioinform ; 22(2): 845-854, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1343663

ABSTRACT

Humans have coexisted with pathogenic microorganisms throughout their history of evolution. We have never halted the exploration of pathogenic microorganisms. With the improvement of genome-sequencing technology and the continuous reduction of sequencing costs, an increasing number of complete genome sequences of pathogenic microorganisms have become available. Genome annotation of this massive sequence information has become a daunting task in biological research. This paper summarizes the approaches to the genome annotation of pathogenic microorganisms and the available popular genome annotation tools for prokaryotes, eukaryotes and viruses. Furthermore, real-world comparisons of different annotation tools using 12 genomes from prokaryotes, eukaryotes and viruses were conducted. Current challenges and problems were also discussed.


Subject(s)
Genome, Bacterial , Genome, Viral , Molecular Sequence Annotation , Virulence/genetics , Eukaryota/genetics , Humans
11.
Brief Bioinform ; 22(2): 1267-1278, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1343631

ABSTRACT

Accessory proteins play important roles in the interaction between coronaviruses and their hosts. Accordingly, a comprehensive study of the compositional diversity and evolutionary patterns of accessory proteins is critical to understanding the host adaptation and epidemic variation of coronaviruses. Here, we developed a standardized genome annotation tool for coronavirus (CoroAnnoter) by combining open reading frame prediction, transcription regulatory sequence recognition and homologous alignment. Using CoroAnnoter, we annotated 39 representative coronavirus strains to form a compositional profile for all of the accessory proteins. The number of accessory proteins varied widely, from 1 to 10, across different coronaviruses, with SARS-CoV-2 and SARS-CoV having the most (9 and 10, respectively). The variation between SARS-CoV and SARS-CoV-2 accessory proteins could be traced back to related coronaviruses in other hosts. The genomic distribution of accessory proteins had significant intra-genus conservation and inter-genus diversity and could be grouped into 1, 4, 2 and 1 types for alpha-, beta-, gamma-, and delta-coronaviruses, respectively. Evolutionary analysis suggested that accessory proteins located before the N-terminus of proteins E and M (E-M) are more conserved, while those located after these proteins are more diverse. Furthermore, comparison of virus-host interaction networks of SARS-CoV-2 and SARS-CoV accessory proteins showed that they share multiple antiviral signaling pathways, those involved in the apoptotic process, viral life cycle and response to oxidative stress. In summary, our study provides a tool for coronavirus genome annotation and builds a comprehensive profile for coronavirus accessory proteins covering their composition, classification, evolutionary pattern and host interaction.


Subject(s)
Biological Evolution , COVID-19/virology , SARS-CoV-2/metabolism , Viral Proteins/genetics , Viral Proteins/metabolism , Genes, Viral , Humans , Molecular Sequence Annotation , Open Reading Frames , Protein Interaction Maps , SARS-CoV-2/genetics
12.
J Cell Mol Med ; 25(16): 7825-7839, 2021 08.
Article in English | MEDLINE | ID: covidwho-1280337

ABSTRACT

The new coronavirus pandemic started in China in 2019. The intensity of the disease can range from mild to severe, leading to death in many cases. Despite extensive research in this area, the exact molecular nature of the virus is not fully recognized; however, according to pieces of evidence, one of the mechanisms of virus pathogenesis is through the function of viral miRNAs. So, we hypothesized that SARS-CoV-2 pathogenesis may be due to its miRNAs targeting important host genes that are involved in the respiratory system, immune pathways and vitamin D pathways, thus possibly contributing to disease progression and virus survival. Potential miRNA precursors and mature miRNA were predicted and confirmed based on the virus genome. The next step was to predict and identify their target genes and perform functional enrichment analysis to recognize the biological processes connected with these genes in the three pathways mentioned above through several comprehensive databases. Finally, cis-acting regulatory elements in 5' regulatory regions were analysed, and the analysis of available RNAseq data determined the expression level of genes. We revealed that thirty-nine mature miRNAs could theoretically derive from the SARS-CoV-2 genome. Functional enrichment analysis elucidated three highlighted pathways involved in SARS-CoV-2 pathogenesis: vitamin D, immune system and respiratory system. Our findings highlighted genes' involvement in three crucial molecular pathways and may help develop new therapeutic targets related to SARS-CoV-2.


Subject(s)
COVID-19/immunology , Host-Pathogen Interactions/physiology , MicroRNAs , SARS-CoV-2/genetics , Vitamin D/metabolism , COVID-19/genetics , COVID-19/virology , Gene Expression Regulation , Humans , Immune System/virology , Molecular Sequence Annotation , Promoter Regions, Genetic , RNA, Viral , Respiratory System/virology , SARS-CoV-2/pathogenicity
13.
Nat Genet ; 53(6): 809-816, 2021 06.
Article in English | MEDLINE | ID: covidwho-1223103

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/genetics , Software , Algorithms , Computational Biology/standards , Databases, Genetic , Genome, Viral , Humans , Molecular Sequence Annotation , Mutation , Web Browser
14.
Nucleic Acids Res ; 49(D1): D589-D599, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1117395

ABSTRACT

PAGER-CoV (http://discovery.informatics.uab.edu/PAGER-CoV/) is a new web-based database that can help biomedical researchers interpret coronavirus-related functional genomic study results in the context of curated knowledge of host viral infection, inflammatory response, organ damage, and tissue repair. The new database consists of 11 835 PAGs (Pathways, Annotated gene-lists, or Gene signatures) from 33 public data sources. Through the web user interface, users can search by a query gene or a query term and retrieve significantly matched PAGs with all the curated information. Users can navigate from a PAG of interest to other related PAGs through either shared PAG-to-PAG co-membership relationships or PAG-to-PAG regulatory relationships, totaling 19 996 993. Users can also retrieve enriched PAGs from an input list of COVID-19 functional study result genes, customize the search data sources, and export all results for subsequent offline data analysis. In a case study, we performed a gene set enrichment analysis (GSEA) of a COVID-19 RNA-seq data set from the Gene Expression Omnibus database. Compared with the results using the standard PAGER database, PAGER-CoV allows for more sensitive matching of known immune-related gene signatures. We expect PAGER-CoV to be invaluable for biomedical researchers to find molecular biology mechanisms and tailored therapeutics to treat COVID-19 patients.


Subject(s)
Algorithms , COVID-19/prevention & control , Computational Biology/methods , Coronavirus/genetics , Databases, Genetic , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/virology , Coronavirus/metabolism , Data Curation/methods , Epidemics , Gene Regulatory Networks , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , User-Computer Interface
15.
Mol Cell Biochem ; 476(5): 2203-2217, 2021 May.
Article in English | MEDLINE | ID: covidwho-1074462

ABSTRACT

The novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2) causes mild to severe respiratory illness. Early symptoms may include fever, dry cough, sore throat, and difficulty in breathing, which may lead to death in severe cases. Compared to previous outbreaks like SARS-CoV and Middle East Respiratory Syndrome (MERS), the SARS-CoV2 disease (COVID-19) outbreak has been far more distressing due to its high rate of infection, despite a low infection fatality rate (IFR) of 1.4% around the world. The World Health Organization (WHO) declared COVID-19 a pandemic on March 11, 2020. In January 2020, the whole genome of SARS-CoV2 was sequenced, which made it easier for researchers to develop diagnostic kits and to carry out drug repurposing to alleviate the pandemic situation in the world. It is now important to understand why this virus has a high rate of infectivity and whether any factor at the genome level facilitates its infection globally. In this study, we have extensively analyzed the whole genomes of different coronaviruses infecting humans and animals in different geographical locations around the world. The main aim of the study is to identify the similarity and the mutational adaptation of the coronaviruses from different hosts and geographical locations to SARS-CoV2 and provide a better strategy to understand the mutational rate for specific target-based drug designing. This study covers every annotation in a comparative manner, including SNPs, repeat analysis with categorization of the short-sequence repeats and long-sequence repeats, different UTRs, transcription factors, and the predicted mature peptides with their specific lengths and positions on the genomes. The extensive analysis of SNPs revealed that Wuhan SARS-CoV2 and Indian SARS-CoV2 have only eight SNPs. Collectively, phylogenetic analysis, repeat analysis, and polymorphism analysis revealed genomic conservation within SARS-CoV2 and a few other coronaviruses, with a very low chance of mutation, and a large distance and many mutations relative to the few other species.


Subject(s)
COVID-19/genetics , Genome, Viral , Middle East Respiratory Syndrome Coronavirus/genetics , Molecular Sequence Annotation , Phylogeny , RNA, Viral/genetics , SARS-CoV-2/genetics , COVID-19/diagnosis , Genome-Wide Association Study , Humans
16.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1048363

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
17.
Viruses ; 13(1)2020 12 30.
Article in English | MEDLINE | ID: covidwho-1004764

ABSTRACT

In 2019, a novel coronavirus, SARS-CoV-2/nCoV-19, emerged in Wuhan, China, and has been responsible for the current COVID-19 pandemic. The evolutionary origins of the virus remain elusive and understanding its complex mutational signatures could guide vaccine design and development. As part of the international "CoronaHack" in April 2020, we employed a collection of contemporary methodologies to compare the genomic sequences of coronaviruses isolated from human (SARS-CoV-2; n = 163), bat (bat-CoV; n = 215) and pangolin (pangolin-CoV; n = 7) available in public repositories. We have also noted the pangolin-CoV isolate MP789 to bear a stronger resemblance to SARS-CoV-2 than other pangolin-CoV. Following de novo gene annotation prediction, analyses of gene-gene similarity network, codon usage bias and variant discovery were undertaken. Strong host-associated divergences were noted in ORF3a, ORF6, ORF7a, ORF8 and S, and in codon usage bias profiles. Last, we have characterised several high impact variants (in-frame insertion/deletion or stop gain) in bat-CoV and pangolin-CoV populations, some of which are found in the same amino acid position and may be highlighting loci of potential functional relevance.


Subject(s)
Biodiversity , COVID-19/virology , Chiroptera/virology , Coronavirus/genetics , Pangolins/virology , SARS-CoV-2/genetics , Animals , Coronavirus/classification , Evolution, Molecular , Gene Regulatory Networks , Genome, Viral , Genomics , Host Specificity , Humans , Molecular Sequence Annotation , Phylogeny , Sequence Alignment
18.
Zool Res ; 41(6): 705-708, 2020 Nov 18.
Article in English | MEDLINE | ID: covidwho-982981

ABSTRACT

Since the first reported severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in December 2019, coronavirus disease 2019 (COVID-19) has become a global pandemic, spreading to more than 200 countries and regions worldwide. With continued research progress and virus detection, SARS-CoV-2 genomes and sequencing data have been reported and accumulated at an unprecedented rate. To meet the need for fast analysis of these genome sequences, the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB) has established an online coronavirus analysis platform, which includes de novo assembly, BLAST alignment, genome annotation, variant identification, and variant annotation modules. The online analysis platform can be freely accessed at the 2019 Novel Coronavirus Resource (2019nCoVR) (https://bigd.big.ac.cn/ncov/online/tools).


Subject(s)
Betacoronavirus/genetics , Computational Biology/methods , Coronavirus Infections/diagnosis , Genome, Viral/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Pneumonia, Viral/diagnosis , Animals , Betacoronavirus/classification , Betacoronavirus/physiology , COVID-19 , China , Computational Biology/organization & administration , Coronavirus Infections/virology , Genetic Variation , Humans , Internet , Molecular Sequence Annotation , Pandemics , Pneumonia, Viral/virology , SARS-CoV-2
19.
J Proteome Res ; 19(11): 4553-4566, 2020 11 06.
Article in English | MEDLINE | ID: covidwho-974862

ABSTRACT

While the COVID-19 pandemic is causing important loss of life, knowledge of the effects of the causative SARS-CoV-2 virus on human cells is currently limited. Investigating protein-protein interactions (PPIs) between viral and host proteins can provide a better understanding of the mechanisms exploited by the virus and enable the identification of potential drug targets. We therefore performed an in-depth computational analysis of the interactome of SARS-CoV-2 and human proteins in infected HEK 293 cells published by Gordon et al. (Nature 2020, 583, 459-468) to reveal processes that are potentially affected by the virus and putative protein binding sites. Specifically, we performed a set of network-based functional and sequence motif enrichment analyses on SARS-CoV-2-interacting human proteins and on PPI networks generated by supplementing viral-host PPIs with known interactions. Using a novel implementation of our GoNet algorithm, we identified 329 Gene Ontology terms for which the SARS-CoV-2-interacting human proteins are significantly clustered in PPI networks. Furthermore, we present a novel protein sequence motif discovery approach, LESMoN-Pro, that identified 9 amino acid motifs for which the associated proteins are clustered in PPI networks. Together, these results provide insights into the processes and sequence motifs that are putatively implicated in SARS-CoV-2 infection and could lead to potential therapeutic targets.


Subject(s)
Betacoronavirus , Coronavirus Infections , Host-Pathogen Interactions/genetics , Pandemics , Pneumonia, Viral , Protein Interaction Maps , Algorithms , Amino Acid Motifs , Betacoronavirus/chemistry , Betacoronavirus/metabolism , Betacoronavirus/pathogenicity , COVID-19 , Cluster Analysis , Coronavirus Infections/metabolism , Coronavirus Infections/virology , Gene Ontology , HEK293 Cells , Humans , Molecular Sequence Annotation , Pneumonia, Viral/metabolism , Pneumonia, Viral/virology , Protein Binding , Protein Interaction Maps/genetics , Protein Interaction Maps/physiology , Proteins/chemistry , Proteins/classification , Proteins/genetics , Proteins/metabolism , SARS-CoV-2 , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
20.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-955778

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics