ABSTRACT
Almost twenty years after its initial release, the Eukaryotic Linear Motif (ELM) resource remains an invaluable source of information for the study of motif-mediated protein-protein interactions. ELM provides a comprehensive, regularly updated and well-organised repository of manually curated, experimentally validated short linear motifs (SLiMs). An increasing number of SLiM-mediated interactions are discovered each year and keeping the resource up-to-date continues to be a great challenge. In the current update, 30 novel motif classes have been added and five existing classes have undergone major revisions. The update includes 411 new motif instances mostly focused on cell-cycle regulation, control of the actin cytoskeleton, membrane remodelling and vesicle trafficking pathways, liquid-liquid phase separation and integrin signalling. Many of the newly annotated motif-mediated interactions are targets of pathogenic motif mimicry by viral, bacterial or eukaryotic pathogens, providing invaluable insights into the molecular mechanisms underlying infectious diseases. The current ELM release includes 317 motif classes incorporating 3934 individual motif instances manually curated from 3867 scientific publications. ELM is available at: http://elm.eu.org.
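ELM describes each motif class with a regular expression over the amino-acid alphabet. The sketch below shows one way such a pattern might be scanned against a protein sequence in Python. The pattern `R.L` (an "RxL"-like core) and the test sequence are hypothetical placeholders, not a real ELM class definition. Wrapping the pattern in a zero-width lookahead lets overlapping instances be reported.

```python
import re
from typing import List, Tuple

def find_motif_instances(sequence: str, pattern: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for every match of an ELM-style regex.

    The lookahead wrapper makes re.finditer report overlapping matches.
    Positions are 1-based and inclusive, as is usual for protein annotation.
    """
    hits = []
    compiled = re.compile(f"(?=({pattern}))")
    for m in compiled.finditer(sequence):
        text = m.group(1)
        start = m.start(1)
        hits.append((start + 1, start + len(text), text))
    return hits

# Hypothetical "RxL"-like pattern, for illustration only.
example = find_motif_instances("MARALRKLSRQLE", "R.L")
print(example)
```

A real scan would also weigh context that ELM records alongside the regex, such as structural accessibility and taxonomic range, because short patterns like this one match frequently by chance.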
Subject(s)
Communicable Diseases/genetics , Databases, Protein , Host-Pathogen Interactions/genetics , Protein Interaction Domains and Motifs , Software , Actin Cytoskeleton/chemistry , Actin Cytoskeleton/metabolism , Animals , Binding Sites , Cell Cycle/genetics , Cell Membrane/chemistry , Cell Membrane/metabolism , Communicable Diseases/metabolism , Communicable Diseases/virology , Cyclins/chemistry , Cyclins/genetics , Cyclins/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Eukaryotic Cells/virology , Gene Expression Regulation , Humans , Integrins/chemistry , Integrins/genetics , Integrins/metabolism , Mice , Molecular Sequence Annotation , Protein Binding , Rats , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Signal Transduction , Transport Vesicles/chemistry , Transport Vesicles/metabolism , Viruses/genetics , Viruses/metabolism
ABSTRACT
Viral infection involves a large number of protein-protein interactions (PPIs) between human and viral proteins. These PPIs range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of the host transcription machinery. However, few interspecies PPIs have been identified, because experimental methods, including mass spectrometry, are time-consuming and expensive, and molecular dynamics simulation is limited to proteins whose 3D structures have been solved. Sequence-based machine learning methods are expected to overcome these problems. We developed LSTM-PHV, a long short-term memory (LSTM) model combined with word2vec, to predict PPIs between human and virus using amino acid sequences alone. LSTM-PHV effectively learned from training data with a highly imbalanced ratio of positive to negative samples and achieved AUCs of 0.976 and 0.973 and accuracies of 0.984 and 0.985 on the training and independent datasets, respectively. In predicting PPIs between human and unknown or new viruses, LSTM-PHV greatly outperformed existing state-of-the-art PPI predictors. Interestingly, learning sequence contexts as words alone is sufficient for PPI prediction. Uniform manifold approximation and projection (UMAP) showed that LSTM-PHV clearly distinguished the positive PPI samples from the negative ones. The LSTM-PHV web server and supporting data are freely available at http://kurata35.bio.kyutech.ac.jp/LSTM-PHV.
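LSTM-PHV treats sequence contexts as words for word2vec. One common way to turn a protein sequence into such words is to slide a window of k residues along it. The sketch below assumes overlapping k-mers with stride 1 and k = 4 as an illustrative default. It does not claim to reproduce the tokenization LSTM-PHV actually uses.

```python
from typing import List

def sequence_to_kmers(sequence: str, k: int = 4) -> List[str]:
    """Split an amino-acid sequence into overlapping k-mer 'words' (stride 1).

    Sequences shorter than k yield an empty list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_corpus(sequences: List[str], k: int = 4) -> List[List[str]]:
    """One 'sentence' of k-mer words per protein, the shape word2vec trainers consume."""
    return [sequence_to_kmers(s, k) for s in sequences]

print(sequence_to_kmers("MKTAYIAK", k=4))
```

A corpus in this shape could be passed to a word2vec implementation that accepts a list of token lists, such as gensim's `Word2Vec`. The learned embeddings would then feed an LSTM classifier over each human-virus protein pair.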
Subject(s)
Computational Biology/methods , Host-Pathogen Interactions , Protein Interaction Mapping/methods , Software , Viral Proteins/metabolism , Virus Diseases/metabolism , Virus Diseases/virology , Algorithms , Amino Acid Sequence , Benchmarking , Databases, Protein , Deep Learning , Humans , Protein Interaction Domains and Motifs , Protein Interaction Maps , Reproducibility of Results , Web Browser
ABSTRACT
SARS-CoV-2 is the causative agent of COVID-19, the respiratory disease behind the 2020 pandemic. With many scientific and humanitarian efforts ongoing to develop diagnostic tests, vaccines, and treatments for COVID-19, and to prevent the spread of SARS-CoV-2, mass spectrometry research, including proteomics, is helping to elucidate the biology of this viral infection. Proteomics studies are starting to lead to an understanding of the roles of viral and host proteins during SARS-CoV-2 infection, their protein-protein interactions, and post-translational modifications. This is beginning to provide insights into potential therapeutic targets or diagnostic strategies that can be used to reduce the long-term burden of the pandemic. However, the extraordinary situation caused by the global pandemic is also highlighting the need to improve mass spectrometry data and workflow sharing. We therefore describe freely available data and computational resources that can facilitate and assist the mass spectrometry-based analysis of SARS-CoV-2. We exemplify this by reanalyzing a virus-host interactome data set to detect protein-protein interactions and identify host proteins that could potentially be used as targets for drug repurposing.
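To illustrate what detecting protein-protein interactions in an affinity-purification mass spectrometry data set can involve, the sketch below applies a deliberately simple filter. A host protein is kept as a candidate interactor of a viral bait if its mean spectral count in bait pulldowns exceeds its mean in control pulldowns by a fold-change threshold. This is a simplified stand-in for illustration, not the reanalysis workflow used in the article. Dedicated scoring tools such as SAINT or MiST model such data far more carefully. The pseudocount and the threshold of 4 are assumptions.

```python
from statistics import mean
from typing import Dict, List

def candidate_interactors(
    bait_counts: Dict[str, List[float]],
    control_counts: Dict[str, List[float]],
    min_fold_change: float = 4.0,
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return {protein: fold_change} for proteins enriched in bait over control pulldowns.

    Spectral counts are averaged over replicates. Proteins never seen in the
    controls are treated as having a control count of zero, and the pseudocount
    avoids division by zero in that case.
    """
    hits = {}
    for protein, counts in bait_counts.items():
        control = control_counts.get(protein, [0.0])
        fold = (mean(counts) + pseudocount) / (mean(control) + pseudocount)
        if fold >= min_fold_change:
            hits[protein] = fold
    return hits
```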
Subject(s)
COVID-19/virology , Information Dissemination/methods , Mass Spectrometry/methods , SARS-CoV-2/chemistry , COVID-19/epidemiology , COVID-19 Testing/methods , COVID-19 Testing/statistics & numerical data , Computational Biology , Databases, Protein/statistics & numerical data , Drug Repositioning , Host Microbial Interactions/physiology , Humans , Mass Spectrometry/statistics & numerical data , Pandemics , Protein Interaction Domains and Motifs , Protein Interaction Maps , Protein Processing, Post-Translational , Proteomics/methods , Proteomics/statistics & numerical data , SARS-CoV-2/pathogenicity , SARS-CoV-2/physiology , Viral Proteins/chemistry , Viral Proteins/physiology , COVID-19 Drug Treatment
ABSTRACT
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
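InterPro combines equivalent signatures from its member databases into a single entry. The sketch below models that integration step as a lookup from member-database signatures to InterPro entries and groups one protein's signature matches accordingly. The signature names, entry identifiers and mapping are hypothetical. Real mappings come from InterPro releases, and InterProScan performs the actual matching.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping: member-database signature -> integrated InterPro entry.
SIGNATURE_TO_ENTRY: Dict[str, Optional[str]] = {
    "pfam:sigA": "IPR_EXAMPLE_1",
    "prosite:sigB": "IPR_EXAMPLE_1",  # equivalent to pfam:sigA, so same entry
    "smart:sigC": "IPR_EXAMPLE_2",
    "pfam:sigD": None,                # signature not integrated into any entry
}

def integrate_matches(
    matches: List[Tuple[str, int, int]],
    mapping: Dict[str, Optional[str]] = SIGNATURE_TO_ENTRY,
) -> Dict[str, List[Tuple[str, int, int]]]:
    """Group (signature, start, end) matches by InterPro entry.

    Signatures mapped to None, or absent from the mapping, are collected
    under the key 'unintegrated'.
    """
    grouped: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for signature, start, end in matches:
        entry = mapping.get(signature) or "unintegrated"
        grouped[entry].append((signature, start, end))
    return dict(grouped)
```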
Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , COVID-19/metabolism , Internet , Molecular Sequence Annotation , Protein Domains , Protein Interaction Maps , SARS-CoV-2/metabolism , Sequence Alignment
ABSTRACT
Viruses remain a major challenge in the fight against disease, and many pandemics caused by various viruses have occurred worldwide over the years. Most recently, the global outbreak of COVID-19 has had a catastrophic impact on human health and the world economy. Alongside vaccine development, antiviral drug treatment has become an essential means of overcoming pandemics. How to quickly find effective drugs that can control the course of a pandemic remains an important unresolved question in medical research. Accelerating drug development requires targeting the key proteins involved in the progression of the pandemic, screening active molecules against them, and developing reliable methods to identify and characterize target proteins based on the active ingredients of drugs. This article discusses key target proteins and their biological mechanisms in the progression of COVID-19 and other major epidemics. Building on these foundations, we propose a model that comprises identifying potential core targets, screening potential active molecules against those core targets, and verifying the active molecules. This article also summarizes related innovative technologies and methods. We hope to provide a reference for the screening of drugs related to pandemics and the development of new drugs.
Subject(s)
Drug Development/methods , Drug Evaluation, Preclinical/methods , Pandemics , Proteomics/methods , Acquired Immunodeficiency Syndrome/drug therapy , COVID-19 , Chemistry Techniques, Analytical , Coronavirus Infections/drug therapy , Databases, Protein , Humans , Plague/drug therapy , Pneumonia, Viral/drug therapy
ABSTRACT
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
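ARBA adds annotations to unreviewed entries through association rules: when an entry satisfies all of a rule's conditions, the rule's annotations are transferred to it. The sketch below shows that "if all conditions hold, then annotate" logic in Python. The `Rule` type, the condition strings and the annotation strings are hypothetical and do not reproduce ARBA's actual rule format or content.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class Rule:
    """A hypothetical association rule: all `conditions` imply all `annotations`."""
    conditions: FrozenSet[str]
    annotations: FrozenSet[str]

def apply_rules(entry_features: Set[str], rules: List[Rule]) -> Set[str]:
    """Return the union of annotations from every rule whose conditions are all present."""
    predicted: Set[str] = set()
    for rule in rules:
        if rule.conditions <= entry_features:  # subset test: every condition satisfied
            predicted |= rule.annotations
    return predicted
```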
Subject(s)
Computational Biology/methods , Data Curation/methods , Databases, Protein , Knowledge Bases , Proteome/metabolism , Proteomics/methods , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Humans , Internet , Molecular Sequence Annotation/methods , Pandemics , Proteome/genetics , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , User-Computer Interface , Viral Proteins/genetics , Viral Proteins/metabolism
ABSTRACT
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the US data center for the global PDB archive and a founding member of the Worldwide Protein Data Bank partnership, serves tens of thousands of data depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without restrictions to millions of RCSB.org users around the world, including >660 000 educators, students and members of the curious public using PDB101.RCSB.org. PDB data depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy, 3D electron microscopy and micro-electron diffraction. PDB data consumers accessing our web portals include researchers, educators and students studying fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. During the past 2 years, the research-focused RCSB PDB web portal (RCSB.org) has undergone a complete redesign, enabling improved searching with full Boolean operator logic and more facile access to PDB data integrated with >40 external biodata resources. New features and resources are described in detail using examples that showcase recently released structures of SARS-CoV-2 proteins and host cell proteins relevant to understanding and addressing the COVID-19 global pandemic.
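The redesigned RCSB.org search supports full Boolean logic. Programmatic access goes through the RCSB Search API, which takes a JSON query built from "terminal" nodes combined by "group" nodes with `and`/`or` operators. The sketch below builds, but does not send, one such query. It is based on our reading of the Search API query language. The services, the attribute name `exptl.method`, the `exact_match` operator and the endpoint mentioned afterwards are assumptions that should be checked against the current API documentation.

```python
import json
from typing import Any, Dict, List

def terminal_full_text(value: str) -> Dict[str, Any]:
    """Free-text terminal node."""
    return {"type": "terminal", "service": "full_text", "parameters": {"value": value}}

def terminal_attribute(attribute: str, operator: str, value: Any) -> Dict[str, Any]:
    """Attribute terminal node, e.g. on the experimental method."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }

def group(logical_operator: str, nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine nodes with 'and' / 'or'; groups nest to express arbitrary Boolean logic."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": nodes}

# full text "SARS-CoV-2 main protease" AND (X-ray diffraction OR electron microscopy)
query = {
    "query": group("and", [
        terminal_full_text("SARS-CoV-2 main protease"),
        group("or", [
            terminal_attribute("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
            terminal_attribute("exptl.method", "exact_match", "ELECTRON MICROSCOPY"),
        ]),
    ]),
    "return_type": "entry",
}
print(json.dumps(query, indent=2))
```

The resulting JSON would then be sent to the Search API endpoint (assumed here to be `https://search.rcsb.org/rcsbsearch/v2/query`), which returns matching PDB entry identifiers.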
Subject(s)
Computational Biology/methods , Databases, Protein , Macromolecular Substances/chemistry , Protein Conformation , Proteins/chemistry , Bioengineering/methods , Biomedical Research/methods , Biotechnology/methods , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Humans , Macromolecular Substances/metabolism , Pandemics , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Software , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
ABSTRACT
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
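Pfam-B is built from an MMseqs2 clustering. As an illustration, the sketch below reads the two-column representative/member TSV that `mmseqs easy-cluster` writes (`<prefix>_cluster.tsv`, in which each representative also lists itself as a member). It then keeps clusters in which no sequence is already matched by a Pfam family. The set of Pfam-matched sequences and the minimum cluster size are assumptions, and this is not the actual Pfam-B build pipeline.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, TextIO

def read_clusters(handle: TextIO) -> Dict[str, List[str]]:
    """Parse MMseqs2 cluster TSV rows (representative<TAB>member) into {rep: [members]}."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative, member = row[0], row[1]
        clusters[representative].append(member)
    return dict(clusters)

def novel_clusters(
    clusters: Dict[str, List[str]],
    pfam_matched: Set[str],
    min_size: int = 2,
) -> Dict[str, List[str]]:
    """Keep clusters of at least `min_size` sequences, none of which match a Pfam family."""
    return {
        rep: members
        for rep, members in clusters.items()
        if len(members) >= min_size and not any(m in pfam_matched for m in members)
    }

example_tsv = io.StringIO("seqA\tseqA\nseqA\tseqB\nseqC\tseqC\nseqC\tseqD\nseqE\tseqE\n")
clusters = read_clusters(example_tsv)
print(novel_clusters(clusters, pfam_matched={"seqD"}))
```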
Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Proteome/classification , Proteome/genetics , Repetitive Sequences, Amino Acid/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods
ABSTRACT
Interleukin 6 (IL-6) is a pro-inflammatory cytokine that stimulates acute phase responses, hematopoiesis and specific immune reactions. Recently, IL-6 was found to play a vital role in the progression of COVID-19 and to be linked to its high mortality rate. To help the scientific community fight COVID-19, we have developed a method for predicting IL-6-inducing peptides/epitopes. The models were trained and tested on 365 experimentally validated IL-6-inducing and 2991 non-inducing peptides extracted from the Immune Epitope Database. Initially, 9149 features were computed for each peptide using Pfeature, and these were reduced to 186 features using the SVC-L1 technique. The features were ranked by their classification ability, and the top 10 were used to develop prediction models. A wide range of machine learning techniques was used to develop models. The Random Forest-based model achieved maximum AUROCs of 0.84 and 0.83 on the training and independent validation datasets, respectively. Using our best models, we also identified IL-6-inducing peptides in different SARS-CoV-2 proteins to support vaccine design against COVID-19. A web server named IL-6Pred and a standalone package have been developed for predicting, designing and screening IL-6-inducing peptides (https://webs.iiitd.edu.in/raghava/il6pred/).
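Pfeature computes many sequence-derived descriptors. One of the simplest is amino-acid composition (AAC), the share of each of the 20 standard residues in a peptide. The sketch below computes AAC as percentages in plain Python to illustrate this kind of feature. It is not Pfeature's API. Ignoring non-standard residue letters and reporting percentages rather than fractions are assumptions.

```python
from typing import Dict

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(peptide: str) -> Dict[str, float]:
    """Return the percentage of each standard amino acid; non-standard letters are ignored."""
    residues = [aa for aa in peptide.upper() if aa in STANDARD_AMINO_ACIDS]
    if not residues:
        raise ValueError("peptide contains no standard amino acids")
    total = len(residues)
    return {aa: 100.0 * residues.count(aa) / total for aa in STANDARD_AMINO_ACIDS}
```

In a pipeline like the one described, feature vectors of this kind would be filtered (for example with an L1-penalised linear SVC) and ranked before a classifier such as a random forest is trained on them.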
Subject(s)
COVID-19/physiopathology , Computer Simulation , Interleukin-6/biosynthesis , Peptides/metabolism , COVID-19/virology , Databases, Protein , Datasets as Topic , Humans , Interleukin-6/physiology , Machine Learning , SARS-CoV-2/isolation & purification
ABSTRACT
SARS-CoV-2, the etiologic agent of COVID-19, exemplifies the general threat to global health posed by coronaviruses. The urgent need for effective vaccines and therapies is leading to a rapid rise in the number of high resolution structures of SARS-CoV-2 proteins that collectively reveal a map of virus vulnerabilities. To assist structure-based design of vaccines and therapeutics against SARS-CoV-2 and other coronaviruses, we have developed CoV3D, a database and resource for coronavirus protein structures, which is updated on a weekly basis. CoV3D provides users with comprehensive sets of structures of coronavirus proteins and their complexes with antibodies, receptors, and small molecules. Integrated molecular viewers allow users to visualize structures of the spike glycoprotein, which is the major target of neutralizing antibodies and vaccine design efforts, as well as sets of spike-antibody complexes, spike sequence variability, and known polymorphisms. In order to aid structure-based design and analysis of the spike glycoprotein, CoV3D permits visualization and download of spike structures with modeled N-glycosylation at known glycan sites, and contains structure-based classification of spike conformations, generated by unsupervised clustering. CoV3D can serve the research community as a centralized reference and resource for spike and other coronavirus protein structures, and is available at: https://cov3d.ibbr.umd.edu.
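CoV3D models N-glycans at known glycan sites of the spike. N-linked glycosylation typically occurs at the sequon N-X-S/T, where X is any residue except proline. The sketch below lists candidate sequon positions in a sequence. It only detects the sequence pattern. Whether a given site is actually glycosylated has to come from experimental data, as with the known sites used in CoV3D.

```python
import re
from typing import List

# N, then any residue except P, then S or T; the lookahead allows overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]
```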
Subject(s)
Computational Biology , Coronavirus/metabolism , Databases, Protein , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Sequence , Antibodies, Neutralizing/immunology , Antibodies, Neutralizing/metabolism , Antibodies, Viral/immunology , Antibodies, Viral/metabolism , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Epidemics , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , SARS-CoV-2/chemistry , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics
ABSTRACT
Two illustrations integrate current knowledge about severe acute respiratory syndrome (SARS) coronaviruses and their life cycle. They have been widely used in education and outreach through free distribution as part of a coronavirus-related resource at Protein Data Bank (PDB)-101, the education portal of the RCSB PDB. Scientific sources for creation of the illustrations and examples of dissemination and response are presented.
Subject(s)
Betacoronavirus/growth & development , Biomedical Research/education , Coronavirus Infections/prevention & control , Databases, Protein , Medicine in the Arts , Pandemics/prevention & control , Pneumonia, Viral/prevention & control , Animals , Betacoronavirus/physiology , Biomedical Research/methods , COVID-19 , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Data Display , Humans , Information Dissemination/methods , Life Cycle Stages , Pneumonia, Viral/epidemiology , Pneumonia, Viral/virology , Respiratory Mucosa/virology , SARS-CoV-2
ABSTRACT
Deep learning is an important branch of artificial intelligence that has been successfully applied to medicine and two-dimensional ligand design. Generating three-dimensional (3D) ligands in the 3D pocket of a protein target is an interesting and challenging problem for deep-learning-based drug design. Here, we introduce the MolAICal software, which generates 3D drugs in the 3D pockets of protein targets by combining the merits of deep learning models and classical algorithms. MolAICal contains two main modules for 3D drug design. The first module employs a genetic algorithm, a deep learning model trained on FDA-approved drug fragments, and Vinardo score fitting based on the PDBbind database. The second module uses a deep learning generative model trained on drug-like molecules from the ZINC database, with molecular docking performed automatically by AutoDock Vina. In addition, Lipinski's rule of five, pan-assay interference compounds (PAINS), synthetic accessibility (SA) and other user-defined rules are used in MolAICal to filter out unwanted ligands. To demonstrate the drug design modules of MolAICal, the membrane protein glucagon receptor and the non-membrane protein SARS-CoV-2 main protease were chosen as drug targets. The results show that MolAICal can generate diverse and novel ligands with good binding scores and appropriate XLOGP values. We believe that MolAICal can exploit the advantages of deep learning models and classical programming to design 3D drugs in protein pockets. MolAICal is free for any nonprofit purpose and accessible at https://molaical.github.io.
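Lipinski's rule of five flags likely poor oral absorption when a molecule exceeds any of these limits: molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. A common convention tolerates one violation. The sketch below applies that filter to precomputed descriptors. It assumes the descriptors have already been calculated, for example with a cheminformatics toolkit. The one-violation tolerance is also an assumption, and this is not MolAICal's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Descriptors:
    name: str
    mol_weight: float  # Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int
    h_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits a molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    return lipinski_violations(d) <= max_violations

def filter_ligands(candidates: List[Descriptors], max_violations: int = 1) -> List[Descriptors]:
    """Keep generated ligands that stay within the allowed number of violations."""
    return [d for d in candidates if passes_rule_of_five(d, max_violations)]
```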
Subject(s)
Algorithms , Artificial Intelligence , Drug Design , Proteins/chemistry , Software , Databases, Protein , Quantitative Structure-Activity Relationship
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of COVID-19, has the potential to elicit autoimmunity because of mimicry of human molecular chaperones by viral proteins. We compared viral proteins with human molecular chaperones, many of which are heat shock proteins, to determine whether they share amino acid sequence segments with immunogenic and antigenic potential that could elicit cross-reactive antibodies and effector immune cells capable of damaging or destroying human cells through autoimmunity. We identified the chaperones that could putatively participate in molecular mimicry after SARS-CoV-2 infection, focusing on those whose localization at the endothelial cell plasma membrane has already been demonstrated. We also postulate that post-translational modifications, induced by physical (shear) and chemical (metabolic) stress caused by the risk factors hypertension and diabetes, respectively, might help determine plasma membrane localization and, in turn, autoimmune-induced endothelial damage.
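Sequence-level mimicry analyses often begin by listing short peptides shared between a pathogen protein and a human protein. The sketch below finds exact shared k-mers and their positions in each sequence. Pentapeptides (k = 5) are used as an illustrative default, which is an assumption. The comparison described in the abstract also weighed immunogenic and antigenic potential, which this exact-match sketch does not model.

```python
from typing import Dict, List

def kmer_positions(sequence: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer to its 1-based start positions in the sequence."""
    positions: Dict[str, List[int]] = {}
    for i in range(len(sequence) - k + 1):
        positions.setdefault(sequence[i:i + k], []).append(i + 1)
    return positions

def shared_kmers(viral: str, human: str, k: int = 5) -> Dict[str, Dict[str, List[int]]]:
    """Return k-mers present in both sequences, with their positions in each."""
    viral_kmers = kmer_positions(viral.upper(), k)
    human_kmers = kmer_positions(human.upper(), k)
    return {
        kmer: {"viral": viral_kmers[kmer], "human": human_kmers[kmer]}
        for kmer in sorted(viral_kmers.keys() & human_kmers.keys())
    }
```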
Subject(s)
Betacoronavirus/metabolism , Coronavirus Infections/virology , Heat-Shock Proteins , Pneumonia, Viral/virology , Viral Proteins , Amino Acid Sequence , Autoantigens , Autoimmunity , COVID-19 , Databases, Protein , Endothelial Cells/metabolism , Heat-Shock Proteins/chemistry , Heat-Shock Proteins/immunology , Humans , Immunodominant Epitopes , Molecular Mimicry , Pandemics , SARS-CoV-2 , Viral Proteins/chemistry , Viral Proteins/immunology
ABSTRACT
AIMS: Coronavirus disease 2019 (COVID-19), which is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), is a major health concern worldwide. Owing to the lack of specific medication and vaccination, drug repurposing has emerged as a promising approach, and such efforts have identified several human proteins that interact with the virus. This study aims to provide comprehensive molecular profiling of the immune cell-enriched SARS-CoV-2 interacting protein USP13. MATERIALS AND METHODS: The list of immune cell-enriched proteins interacting with SARS-CoV-2 was retrieved from The Human Protein Atlas. Genomic alterations were identified using cBioPortal. Survival analysis was performed with Kaplan-Meier Plotter. Protein expression and tumor infiltration levels were analyzed with TIMER. KEY FINDINGS: Fourteen human proteins that interact with SARS-CoV-2 were enriched in immune cells. Among these proteins, USP13 had the highest frequency of genomic alterations. Higher USP13 levels were correlated with improved survival in breast and lung cancers but with poor prognosis in ovarian and gastric cancers. Furthermore, copy number variations of USP13 significantly affected the infiltration levels of distinct immune cell subtypes in head and neck, lung, ovarian and stomach cancers. Although our results suggested a tumor suppressor role for USP13 in lung cancer, its role in other cancers seemed to be context-dependent. SIGNIFICANCE: It is critical to identify and characterize human proteins that interact with SARS-CoV-2 to better understand the disease and to develop better therapies and vaccines. Here, we provide comprehensive molecular profiling of the immune cell-enriched SARS-CoV-2 interacting protein USP13, which will be useful for future studies.
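Kaplan-Meier Plotter compares survival between patient groups, such as those with high and low expression of a gene, using Kaplan-Meier curves. The sketch below computes the Kaplan-Meier estimator itself, S(t) = product over event times t_i <= t of (1 - d_i / n_i), from (time, event) pairs. Here d_i is the number of events at t_i and n_i the number still at risk. Splitting patients by USP13 expression and testing the difference between curves (for example with a log-rank test) would be separate steps and are not shown.

```python
from typing import List, Tuple

def kaplan_meier(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return [(event_time, survival_probability)] from (time, event_observed) pairs.

    event_observed is True for a death/event and False for a censored observation.
    Censored subjects remain in the risk set up to their time but never lower S(t).
    """
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    event_times = sorted({t for t, event in samples if event})
    for t in event_times:
        at_risk = sum(1 for time, _ in samples if time >= t)
        deaths = sum(1 for time, event in samples if time == t and event)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve
```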
Subject(s)
Betacoronavirus/immunology , Coronavirus Infections/immunology , Endopeptidases/immunology , Leukocytes/immunology , Neoplasms/immunology , Pneumonia, Viral/immunology , COVID-19 , Coronavirus Infections/diagnosis , Coronavirus Infections/genetics , Coronavirus Infections/virology , DNA Copy Number Variations , Databases, Protein , Endopeptidases/genetics , Humans , Leukocytes/virology , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/virology , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/virology , Pandemics , Pneumonia, Viral/diagnosis , Pneumonia, Viral/genetics , Pneumonia, Viral/virology , Prognosis , SARS-CoV-2 , Ubiquitin-Specific Proteases
ABSTRACT
COVID-19, a disease caused by a new strain of coronavirus (SARS-CoV-2) that originated in Wuhan, China, has now spread around the world, triggering a global pandemic and leaving the public eagerly awaiting the development of a specific medicine and vaccine. In response, aggressive efforts are underway around the world to overcome COVID-19. In this study, using the structure published in the Protein Data Bank (PDB ID: 7BV2) on April 22, we conducted a detailed quantum chemical analysis, based on the fragment molecular orbital (FMO) method, of the interactions within the complex of the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) and the antiviral drug Remdesivir. In addition to the hydrogen bonding with the complementary strand and the intra-strand stacking seen in normal base pairs, Remdesivir bound to the terminus of a primer-RNA strand was further stabilized by diagonal π-π stacking with the -1A' base of the complementary strand and by an additional hydrogen bond with an intra-strand base, owing to its chemically modified functional group. Moreover, a stable OH/π interaction is formed with Thr687 of the RdRp. We quantitatively characterized the full set of interactions within the complex among Remdesivir, template-primer RNA, RdRp and cofactors, and published the results in the FMODB database.
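In FMO analyses, interactions such as those above are usually quantified as inter-fragment interaction energies (IFIEs), with fragments typically being amino-acid residues, nucleotides and the ligand. In the standard two-body expansion (FMO2), with monomer energies $E_I$ and dimer energies $E_{IJ}$, the total energy and the IFIE between fragments $I$ and $J$ take the simplified form below. This form omits the embedding-potential correction term that full FMO2 includes, and the abstract does not state which FMO order or correlation level was used. Negative $\Delta E_{IJ}$ values indicate stabilizing interactions.

```latex
E_{\mathrm{total}} \approx \sum_{I} E_I + \sum_{I>J} \left( E_{IJ} - E_I - E_J \right)

\Delta E_{IJ} \approx E_{IJ} - E_I - E_J
```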
Subject(s)
Adenosine Monophosphate/analogs & derivatives , Alanine/analogs & derivatives , Antiviral Agents/chemistry , Betacoronavirus/chemistry , RNA, Viral/chemistry , RNA-Dependent RNA Polymerase/chemistry , Viral Proteins/chemistry , Adenosine Monophosphate/chemistry , Alanine/chemistry , Amino Acid Motifs , Betacoronavirus/enzymology , Binding Sites , Databases, Protein , Hydrogen Bonding , Molecular Docking Simulation , Nucleic Acid Conformation , Protein Binding , Protein Interaction Domains and Motifs , Protein Structure, Secondary , Quantum Theory , RNA, Viral/antagonists & inhibitors , RNA-Dependent RNA Polymerase/antagonists & inhibitors , SARS-CoV-2 , Thermodynamics , Viral Proteins/antagonists & inhibitors
ABSTRACT
The outbreak of pneumonia caused by SARS-CoV-2 posed a great threat to global human health, creating an urgent need to understand the mechanism of SARS-CoV-2 infection comprehensively. Angiotensin-converting enzyme 2 (ACE2) was identified as a functional receptor for SARS-CoV-2, and its distribution may indicate which human organs are vulnerable to SARS-CoV-2 infection. Previous studies of the distribution of ACE2 mRNA in human tissues involved only limited sample sizes and did not measure ACE2 protein. Given the heterogeneity among humans, datasets covering more tissues with larger sample sizes should be analyzed. Moreover, ACE2 is both a membrane and a secreted protein, yet its expression in blood and common blood cells remains unknown. Herein, proteomic data in HIPED and antibody-based immunohistochemistry results in HPA were collected to analyze the distribution of ACE2 protein in human tissues. Bulk RNA-seq profiles from three separate public datasets, the HPA Tissue Atlas, GTEx and FANTOM5 CAGE, were also obtained to determine the expression of ACE2 in human tissues. In addition, the abundance of ACE2 in human blood and blood cells was determined by analyzing data in PeptideAtlas and the HPA Blood Atlas. We found that mRNA expression does not reflect ACE2 protein abundance, owing to strong differences between ACE2 mRNA and protein quantities within and across tissues. Our results suggest that ACE2 protein is mainly expressed in the small intestine, kidney, gallbladder and testis, whereas its abundance in brain-associated tissues and common blood cells is low. HIPED revealed enrichment of ACE2 protein in the placenta and ovary despite a low mRNA level. Further, human secretome data show that the average plasma concentration of ACE2 protein is higher in males than in females.
Our research will be beneficial for understanding the transmission routes and sex-based differences in susceptibility to SARS-CoV-2 infection.
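The conclusion that mRNA levels do not reflect ACE2 protein abundance rests on comparing the two measurements across tissues. A rank correlation such as Spearman's rho is one simple way to quantify that agreement. The sketch below computes it in plain Python by assigning average ranks to ties and then taking the Pearson correlation of the ranks. Using Spearman's rho here is an assumption for illustration, not necessarily the statistic used in the study.

```python
from math import sqrt
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

With per-tissue mRNA values in `x` and protein values in `y`, a rho near zero (or negative) would indicate the kind of mismatch the study reports.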
Subject(s)
Coronavirus Infections/metabolism , Peptidyl-Dipeptidase A/metabolism , Pneumonia, Viral/metabolism , Receptors, Virus/metabolism , Angiotensin-Converting Enzyme 2 , Betacoronavirus , COVID-19 , Databases, Protein , Female , Humans , Immunohistochemistry , Male , Mass Spectrometry , Pandemics , Proteomics , RNA, Messenger/metabolism , RNA-Seq , SARS-CoV-2 , Tissue Distribution , Transcriptome
ABSTRACT
The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has challenged the speed at which laboratories can discover the viral composition and study health outcomes. The small ∼30-kb ssRNA genome of coronaviruses makes them adept at cross-species spread while enabling a robust understanding of all of the proteins the viral genome encodes. We have employed protein modeling, molecular dynamics simulations, evolutionary mapping, and 3D printing to gain a full proteome- and dynamicome-level understanding of SARS-CoV-2. We established the Viral Integrated Structural Evolution Dynamic Database (VIStEDD at RRID:SCR_018793) to facilitate future discoveries and educational use. Here, we highlight the use of VIStEDD for nsp6, nucleocapsid (N), and spike (S) surface glycoprotein. For both nsp6 and N, we found highly conserved surface amino acids that likely drive protein-protein interactions. In characterizing viral S protein, we developed a quantitative dynamics cross-correlation matrix to gain insights into its interactions with the angiotensin I-converting enzyme 2 (ACE2)-solute carrier family 6 member 19 (SLC6A19) dimer. Using this quantitative matrix, we elucidated 47 potential functional missense variants from genomic databases within ACE2/SLC6A19/transmembrane serine protease 2 (TMPRSS2), warranting genomic enrichment analyses in SARS-CoV-2 patients. These variants had ultralow frequency but existed in males hemizygous for ACE2. Two ACE2 noncoding variants (rs4646118 and rs143185769) present in ∼9% of individuals of African descent may regulate ACE2 expression and may be associated with increased susceptibility of African Americans to SARS-CoV-2. We propose that this SARS-CoV-2 database may aid research into the ongoing pandemic.
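A dynamic cross-correlation matrix (DCCM) is commonly computed from a molecular dynamics trajectory as C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>). Here dr_i is the displacement of atom i (often a C-alpha) from its mean position, and the angle brackets average over frames. The sketch below implements that standard definition in plain Python. It assumes frames have already been aligned and atoms selected beforehand. The quantitative matrix developed for VIStEDD may differ in such details.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def dccm(trajectory: Sequence[Sequence[Vec]]) -> List[List[float]]:
    """Dynamic cross-correlation matrix from a pre-aligned trajectory.

    trajectory[f][i] is the (x, y, z) position of atom i in frame f.
    C[i][j] lies in [-1, 1]: +1 for fully correlated motion, -1 for anti-correlated.
    Atoms that never move get correlation 0 with everything.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    # Mean position of each atom over all frames.
    means = [
        tuple(sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3))
        for i in range(n_atoms)
    ]
    # Displacement of each atom from its mean position, per frame.
    disp = [
        [tuple(frame[i][d] - means[i][d] for d in range(3)) for i in range(n_atoms)]
        for frame in trajectory
    ]

    def avg_dot(i: int, j: int) -> float:
        """Frame-averaged dot product of the displacements of atoms i and j."""
        return sum(sum(a * b for a, b in zip(f[i], f[j])) for f in disp) / n_frames

    variances = [avg_dot(i, i) for i in range(n_atoms)]

    def corr(i: int, j: int) -> float:
        denom = sqrt(variances[i] * variances[j])
        return avg_dot(i, j) / denom if denom > 0 else 0.0

    return [[corr(i, j) for j in range(n_atoms)] for i in range(n_atoms)]
```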