Results 1 - 20 of 65
1.
Nucleic Acids Res ; 50(D1): D1-D10, 2022 01 07.
Article in English | MEDLINE | ID: covidwho-1607482

ABSTRACT

The 2022 Nucleic Acids Research Database Issue contains 185 papers, including 87 papers reporting on new databases and 85 updates from resources previously published in the Issue. Thirteen additional manuscripts provide updates on databases most recently published elsewhere. Seven new databases focus specifically on COVID-19 and SARS-CoV-2, including SCoV2-MD, the first of the Issue's Breakthrough Articles. Major nucleic acid databases reporting updates include MODOMICS, JASPAR and miRTarBase. The AlphaFold Protein Structure Database, described in the second Breakthrough Article, is the stand-out in the protein section, where the Human Proteoform Atlas and GproteinDb are other notable new arrivals. Updates from DisProt, FuzDB and ELM comprehensively cover disordered proteins. Under the metabolism and signalling section Reactome, ConsensusPathDB, HMDB and CAZy are major returning resources. In microbial and viral genomes taxonomy and systematics are well covered by LPSN, TYGS and GTDB. Genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Major returning pharmacology resource names include the IUPHAR/BPS guide and the Therapeutic Target Database. New plant databases include PlantGSAD for gene lists and qPTMplants for post-translational modifications. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Our latest update to the NAR online Molecular Biology Database Collection brings the total number of entries to 1645. Following last year's major cleanup, we have updated 317 entries, listing 89 new resources and trimming 80 discontinued URLs. The current release is available at http://www.oxfordjournals.org/nar/database/c/.


Subject(s)
Databases, Factual , Molecular Biology , Animals , COVID-19 , Databases, Nucleic Acid , Databases, Protein , Genome, Microbial , Genome, Viral , Humans , Mice , Plants/genetics , Protein Processing, Post-Translational , Proteome , SARS-CoV-2/genetics , Signal Transduction
2.
ACS Synth Biol ; 10(11): 3209-3235, 2021 11 19.
Article in English | MEDLINE | ID: covidwho-1504658

ABSTRACT

SARS-CoV-2 triggered a worldwide pandemic disease, COVID-19, for which an effective treatment has not yet been settled. Among the most promising targets to fight this disease is SARS-CoV-2 main protease (Mpro), which has been extensively studied in the last few months. There is an urgency for developing effective computational protocols that can help us tackle these key viral proteins. Hence, we have put together a robust and thorough pipeline of in silico protein-ligand characterization methods to address one of the biggest biological problems currently plaguing our world. These methodologies were used to characterize the interaction of SARS-CoV-2 Mpro with an α-ketoamide inhibitor and include details on how to upload, visualize, and manage the three-dimensional structure of the complex and acquire high-quality figures for scientific publications using PyMOL (Protocol 1); perform homology modeling with MODELLER (Protocol 2); perform protein-ligand docking calculations using HADDOCK (Protocol 3); run a virtual screening protocol of a small compound database of SARS-CoV-2 candidate inhibitors with AutoDock 4 and AutoDock Vina (Protocol 4); and, finally, sample the conformational space at the atomic level between SARS-CoV-2 Mpro and the α-ketoamide inhibitor with Molecular Dynamics simulations using GROMACS (Protocol 5). Guidelines for careful data analysis and interpretation are also provided for each Protocol.


Subject(s)
Antiviral Agents/chemistry , COVID-19/drug therapy , Databases, Protein , Molecular Docking Simulation , Molecular Dynamics Simulation , SARS-CoV-2/chemistry , Viral Proteins/chemistry , Antiviral Agents/therapeutic use , Humans , Ligands
3.
J Nat Prod ; 84(11): 3001-3007, 2021 11 26.
Article in English | MEDLINE | ID: covidwho-1483081

ABSTRACT

The pressing need for SARS-CoV-2 controls has led to a reassessment of strategies to identify and develop natural product inhibitors of zoonotic, highly virulent, and rapidly emerging viruses. This review article addresses how contemporary approaches involving computational chemistry, natural product (NP) and protein databases, and mass spectrometry (MS) derived target-ligand interaction analysis can be utilized to expedite the interrogation of NP structures while minimizing the time and expense of extraction, purification, and screening in Biosafety Level 3 (BSL3) laboratories. The unparalleled structural diversity and complexity of NPs is an extraordinary resource for the discovery and development of broad-spectrum inhibitors of viral genera, including Betacoronavirus, which contains MERS, SARS, SARS-CoV-2, and the common cold. There are two key technological advances that have created unique opportunities for the identification of NP prototypes with greater efficiency: (1) the application of structural databases for NPs and target proteins and (2) the application of modern MS techniques to assess protein-ligand interactions directly from NP extracts. These approaches, developed over years, now allow for the identification and isolation of unique antiviral ligands without the immediate need for BSL3 facilities. Overall, the goal is to improve the success rate of NP-based screening by focusing resources on source materials with a higher likelihood of success, while simultaneously providing opportunities for the discovery of novel ligands to selectively target proteins involved in viral infection.


Subject(s)
Antiviral Agents/pharmacology , Betacoronavirus/drug effects , Biological Products/pharmacology , Drug Discovery , Computational Biology , Databases, Chemical , Databases, Protein , Ligands , Mass Spectrometry , Protein Interaction Mapping , SARS-CoV-2/drug effects
4.
Int J Biol Macromol ; 147: 513-520, 2020 Mar 15.
Article in English | MEDLINE | ID: covidwho-1454163

ABSTRACT

Alternative splicing is a mechanism that increases the number of expressed proteins and the variety of their functions. We identified the protein domains that are most frequently lost or gained in splice variants. Proteins represented by several isoforms participate in processes such as transcription regulation and immune response. Our results showed an association of alternative splicing with branched regulatory pathways. By considering the published data on the proteins encoded by human chromosome 18, we noted that alternative products differ in several functional features, such as phosphorylation, subcellular location, ligand specificity and protein-protein interactions. Alternative variants affecting the protein kinase domain were investigated by comparing the alternative sequences with 3D structures. It was shown that relatively large insertions/deletions could be compatible with the kinase fold if they are located between the conserved secondary structures. Using the 3D data on human proteins, we showed that conformational flexibility could accommodate fold alterations in splice variants. Investigation of the structural and functional differences between splice isoforms is required to understand how to distinguish the isoforms expressed as functioning proteins from the non-realized transcripts. These studies help to fill the gap between genomic and proteomic data.


Subject(s)
Alternative Splicing , Chromosomes, Human, Pair 18 , Databases, Protein , RNA-Binding Proteins , Chromosomes, Human, Pair 18/genetics , Chromosomes, Human, Pair 18/metabolism , Humans , Protein Structure, Secondary , Proteomics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
5.
Nat Methods ; 18(10): 1181-1191, 2021 10.
Article in English | MEDLINE | ID: covidwho-1447314

ABSTRACT

Cytokines are critical for intercellular communication in human health and disease, but the investigation of cytokine signaling activity has remained challenging due to the short half-lives of cytokines and the complexity/redundancy of cytokine functions. To address these challenges, we developed the Cytokine Signaling Analyzer (CytoSig; https://cytosig.ccr.cancer.gov/ ), providing both a database of target genes modulated by cytokines and a predictive model of cytokine signaling cascades from transcriptomic profiles. We collected 20,591 transcriptome profiles for human cytokine, chemokine and growth factor responses. This atlas of transcriptional patterns induced by cytokines enabled the reliable prediction of signaling activities in distinct cell populations in infectious diseases, chronic inflammation and cancer using bulk and single-cell transcriptomic data. CytoSig revealed previously unidentified roles of many cytokines, such as BMP6 as an anti-inflammatory factor, and identified candidate therapeutic targets in human inflammatory diseases, such as CXCL8 for severe coronavirus disease 2019.


Subject(s)
COVID-19/immunology , Cytokines/metabolism , Databases, Protein , SARS-CoV-2 , COVID-19/metabolism , Cytokines/genetics , Gene Expression Regulation/immunology , Gene Expression Regulation/physiology , Humans , Signal Transduction/physiology
6.
ChemMedChem ; 16(13): 2075-2081, 2021 07 06.
Article in English | MEDLINE | ID: covidwho-1384144

ABSTRACT

Computational approaches supporting the early characterization of fragment molecular recognition mechanisms represent a valuable complement to more expensive and low-throughput experimental techniques. In this retrospective study, we have investigated the geometric accuracy with which high-throughput supervised molecular dynamics simulations (HT-SuMD) can anticipate the experimental bound state for a set of 23 fragments targeting the SARS-CoV-2 main protease. Despite the encouraging results herein reported, in line with those previously described for other MD-based posing approaches, a high number of incorrect binding modes still complicate HT-SuMD routine application. To overcome this limitation, fragment pose stability has been investigated and integrated as part of our in-silico pipeline, allowing us to prioritize only the more reliable predictions.


Subject(s)
Molecular Dynamics Simulation , Protease Inhibitors/chemistry , SARS-CoV-2/metabolism , Viral Matrix Proteins/chemistry , Binding Sites , COVID-19/pathology , COVID-19/virology , Databases, Protein , Humans , Ligands , Protease Inhibitors/metabolism , Retrospective Studies , SARS-CoV-2/isolation & purification , Viral Matrix Proteins/metabolism
7.
BMC Bioinformatics ; 22(1): 1, 2021 Jan 02.
Article in English | MEDLINE | ID: covidwho-1388726

ABSTRACT

BACKGROUND: Protein-peptide interactions play a fundamental role in a wide variety of biological processes, such as cell signaling, regulatory networks, immune responses, and enzyme inhibition. Peptides are characterized by low toxicity and small interface areas; therefore, they are good targets for therapeutic strategies, rational drug planning and protein inhibition. Approximately 10% of the ethical pharmaceutical market is protein/peptide-based. Furthermore, it is estimated that 40% of protein interactions are mediated by peptides. Despite the fast increase in the volume of biological data, particularly on sequences and structures, there remains a lack of broad and comprehensive protein-peptide databases and tools that allow the retrieval, characterization and understanding of protein-peptide recognition and consequently support peptide design. RESULTS: We introduce Propedia, a comprehensive and up-to-date database with a web interface that permits clustering, searching and visualizing of protein-peptide complexes according to varied criteria. Propedia comprises over 19,000 high-resolution structures from the Protein Data Bank including structural and sequence information from protein-peptide complexes. The main advantage of Propedia over other peptide databases is that it allows a more comprehensive analysis of similarity and redundancy. It was constructed based on a hybrid clustering algorithm that compares and groups peptides by sequences, interface structures and binding sites. Propedia is available through a graphical, user-friendly and functional interface where users can retrieve and analyze complexes and download each search data set. We performed case studies and verified the utility of Propedia scores for ranking promising interacting peptides. In a study involving predicting peptides to inhibit SARS-CoV-2 main protease, we showed that Propedia scores related to similarity between different peptide complexes with SARS-CoV-2 main protease are in agreement with molecular dynamics free energy calculation. CONCLUSIONS: Propedia is a database and tool to support structure-based rational design of peptides for special purposes. Protein-peptide interactions can be useful for predicting, classifying and scoring complexes, as well as for designing new molecules. Propedia is up-to-date as a ready-to-use webserver with a friendly and resourceful interface and is available at: https://bioinfo.dcc.ufmg.br/propedia.


Subject(s)
Database Management Systems , Databases, Protein , Peptides/chemistry , Proteins/chemistry , Algorithms , Humans
8.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387962

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
9.
Nucleic Acids Res ; 49(D1): D261-D265, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387959

ABSTRACT

ADP-ribosylation is a protein modification responsible for biological processes such as DNA repair, RNA regulation, cell cycle and biomolecular condensate formation. Dysregulation of ADP-ribosylation is implicated in cancer, neurodegeneration and viral infection. We developed ADPriboDB (adpribodb.leunglab.org) to facilitate studies in uncovering insights into the mechanisms and biological significance of ADP-ribosylation. ADPriboDB 2.0 serves as a one-stop repository comprising 48 346 entries and 9097 ADP-ribosylated proteins, of which 6708 were newly identified since the original database release. In this updated version, we provide information regarding the sites of ADP-ribosylation in 32 946 entries. The wealth of information allows us to interrogate existing databases or newly available data. For example, we found that ADP-ribosylated substrates are significantly associated with the recently identified human protein interaction networks associated with SARS-CoV-2, which encodes a conserved protein domain called macrodomain that binds and removes ADP-ribosylation. In addition, we create a new interactive tool to visualize the local context of ADP-ribosylation, such as structural and functional features as well as other post-translational modifications (e.g. phosphorylation, methylation and ubiquitination). This information provides opportunities to explore the biology of ADP-ribosylation and generate new hypotheses for experimental testing.


Subject(s)
Adenosine Diphosphate Ribose/metabolism , Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Proteins/metabolism , ADP-Ribosylation , Binding Sites , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Humans , Protein Domains , Protein Processing, Post-Translational , Proteins/chemistry , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/chemistry , Viral Proteins/metabolism
10.
Nat Commun ; 12(1): 3201, 2021 05 27.
Article in English | MEDLINE | ID: covidwho-1387343

ABSTRACT

Fragment-based drug design has introduced a bottom-up process for drug development, with improved sampling of chemical space and increased effectiveness in early drug discovery. Here, we combine the use of pharmacophores, the most general concept of representing drug-target interactions with the theory of protein hotspots, to develop a design protocol for fragment libraries. The SpotXplorer approach compiles small fragment libraries that maximize the coverage of experimentally confirmed binding pharmacophores at the most preferred hotspots. The efficiency of this approach is demonstrated with a pilot library of 96 fragment-sized compounds (SpotXplorer0) that is validated on popular target classes and emerging drug targets. Biochemical screening against a set of GPCRs and proteases retrieves compounds containing an average of 70% of known pharmacophores for these targets. More importantly, SpotXplorer0 screening identifies confirmed hits against recently established challenging targets such as the histone methyltransferase SETD2, the main protease (3CLPro) and the NSP3 macrodomain of SARS-CoV-2.


Subject(s)
Coronavirus 3C Proteases/chemistry , Coronavirus Papain-Like Proteases/chemistry , Drug Development/methods , Drug Discovery/methods , High-Throughput Screening Assays/methods , Histone-Lysine N-Methyltransferase/chemistry , Animals , Cell Survival , Chlorocebus aethiops , Computational Chemistry , Crystallography, X-Ray , Databases, Protein , Drug Design , HEK293 Cells , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Protein Binding , Receptors, G-Protein-Coupled/chemistry , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Small Molecule Libraries , Vero Cells
11.
Molecules ; 26(16), 2021 Aug 06.
Article in English | MEDLINE | ID: covidwho-1362397

ABSTRACT

Protein glycosylation that mediates interactions among viral proteins, host receptors, and immune molecules is an important consideration for predicting viral antigenicity. Viral spike proteins, the proteins responsible for host cell invasion, are especially important to examine. However, there is a lack of consensus within the field of glycoproteomics regarding identification strategy and false discovery rate (FDR) calculation that impedes our examinations. As a case study in the overlap between software, we examine recently published SARS-CoV-2 glycoprotein datasets with four glycoproteomics identification software packages using their recommended protocols: GlycReSoft, Byonic, pGlyco2, and MSFragger-Glyco. These packages use different forms of Target-Decoy Analysis (TDA) to estimate FDR and have different database-oriented search methods with varying degrees of quantification capabilities. Instead of an ideal overlap between software, we observed different sets of identifications with the intersection. When clustering by glycopeptide identifications, we see higher degrees of relatedness within software than within glycosites. Taking the consensus between results yields a conservative and non-informative conclusion as we lose identifications in the desire for caution; these non-consensus identifications are often lower abundance and, therefore, more susceptible to nuanced changes. We conclude that present glycoproteomics software packages are not directly comparable, and that methods are needed to assess their overall results and FDR estimation performance. Once such tools are developed, it will be possible to improve FDR methods and quantify complex glycoproteomes with acceptable confidence, rather than potentially misleading broad strokes.
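
For readers unfamiliar with target-decoy analysis, the basic idea behind an FDR estimate can be sketched as below. This is a generic illustration and not the procedure of any of the four tools named in the abstract: the function name, the (score, is_decoy) layout, the simple decoy/target ratio and the example scores are all assumptions made for the sketch.

```python
def tda_fdr(matches, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits).

    `matches` is a list of (score, is_decoy) pairs. Both this layout and
    the simple decoy/target ratio are illustrative assumptions; real tools
    use their own variants of target-decoy analysis.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target hits, so nothing to be wrong about
    return decoys / targets


# Hypothetical glycopeptide-spectrum matches: (score, is_decoy).
example_matches = [
    (9.1, False), (8.7, False), (8.2, True), (7.9, False),
    (7.5, False), (6.8, True), (6.1, False), (5.4, True),
]
```

Because each tool builds its decoys and applies its thresholds differently, two tools can report the same nominal FDR while accepting different identifications, which is consistent with the limited overlap the abstract describes.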


Subject(s)
Algorithms , Glycopeptides/analysis , Glycoproteins/analysis , COVID-19/metabolism , Databases, Protein , Glycopeptides/chemistry , Glycoproteins/chemistry , Glycosylation , Humans , Proteomics/methods , Proteomics/standards , SARS-CoV-2/metabolism , Software , Spike Glycoprotein, Coronavirus/analysis , Spike Glycoprotein, Coronavirus/chemistry , Tandem Mass Spectrometry/methods , Viral Fusion Proteins/analysis , Viral Fusion Proteins/chemistry
12.
Brief Bioinform ; 22(2): 936-945, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1352108

ABSTRACT

Interleukin 6 (IL-6) is a pro-inflammatory cytokine that stimulates acute phase responses, hematopoiesis and specific immune reactions. Recently, it was found that IL-6 plays a vital role in the progression of COVID-19, which is responsible for the high mortality rate. To help the scientific community fight COVID-19, we have developed a method for predicting IL-6 inducing peptides/epitopes. The models were trained and tested on 365 experimentally validated IL-6 inducing and 2991 non-inducing peptides extracted from the immune epitope database. Initially, 9149 features of each peptide were computed using Pfeature, which were reduced to 186 features using the SVC-L1 technique. These features were ranked based on their classification ability, and the top 10 features were used for developing prediction models. A wide range of machine learning techniques has been deployed to develop models. The Random Forest-based model achieves a maximum AUROC of 0.84 and 0.83 on the training and independent validation datasets, respectively. We have also identified IL-6 inducing peptides in different proteins of SARS-CoV-2, using our best models to design vaccines against COVID-19. A web server named IL-6Pred and a standalone package have been developed for predicting, designing and screening IL-6 inducing peptides (https://webs.iiitd.edu.in/raghava/il6pred/).
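
The AUROC values quoted in the abstract measure how often a model scores a true IL-6 inducing peptide above a non-inducing one. A minimal, dependency-free sketch of that metric follows; the function name and the toy labels and scores are assumptions for illustration and have nothing to do with the IL-6Pred code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = inducing peptide, 0 = non-inducing (hypothetical scores).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formulation would be preferable for datasets of the size mentioned in the abstract.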


Subject(s)
COVID-19/physiopathology , Computer Simulation , Interleukin-6/biosynthesis , Peptides/metabolism , COVID-19/virology , Databases, Protein , Datasets as Topic , Humans , Interleukin-6/physiology , Machine Learning , SARS-CoV-2/isolation & purification
13.
Brief Bioinform ; 22(2): 832-844, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1343659

ABSTRACT

Viral infectious diseases cause millions of deaths every year, and their treatment remains a huge public health challenge. Therefore, an in-depth understanding of human-virus protein-protein interactions (PPIs) as the molecular interface between a virus and its host cell is of paramount importance to obtain new insights into the pathogenesis of viral infections and development of antiviral therapeutic treatments. However, current human-virus PPI database resources are incomplete, lack annotation and usually do not provide the opportunity to computationally predict human-virus PPIs. Here, we present the Human-Virus Interaction DataBase (HVIDB, http://zzdlab.com/hvidb/) that provides comprehensively annotated human-virus PPI data as well as seamlessly integrates online PPI prediction tools. Currently, HVIDB highlights 48 643 experimentally verified human-virus PPIs covering 35 virus families, 6633 virally targeted host complexes, 3572 host dependency/restriction factors as well as 911 experimentally verified/predicted 3D complex structures of human-virus PPIs. Furthermore, our database resource provides tissue-specific expression profiles of 6790 human genes that are targeted by viruses and 129 Gene Expression Omnibus series of differentially expressed genes post-viral infections. Based on these multifaceted and annotated data, our database allows the users to easily obtain reliable information about PPIs of various human viruses and conduct an in-depth analysis of their inherent biological significance. In particular, HVIDB also integrates well-performing machine learning models to predict interactions between the human host and viral proteins that are based on (i) sequence embedding techniques, (ii) interolog mapping and (iii) domain-domain interaction inference. We anticipate that HVIDB will serve as a one-stop knowledge base to further guide hypothesis-driven experimental efforts to investigate human-virus relationships.
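
Of the three prediction strategies listed in the abstract, interolog mapping is the simplest to illustrate: a known interaction between two proteins is transferred to their orthologs. The sketch below shows the general idea only. The abstract does not describe how HVIDB implements it, and the function name, data layout and all protein identifiers are hypothetical.

```python
def interolog_predictions(known_ppis, host_orthologs, virus_orthologs):
    """Transfer known host-virus interactions to orthologous protein pairs.

    known_ppis      : iterable of (host_protein, virus_protein) pairs
    host_orthologs  : dict mapping a host protein to its orthologs
    virus_orthologs : dict mapping a virus protein to its orthologs
    All identifiers are placeholders invented for this sketch.
    """
    predicted = set()
    for host_protein, virus_protein in known_ppis:
        for host_ortholog in host_orthologs.get(host_protein, ()):
            for virus_ortholog in virus_orthologs.get(virus_protein, ()):
                predicted.add((host_ortholog, virus_ortholog))
    return predicted


# Hypothetical template interaction and ortholog tables.
template_ppis = [("HOST_A", "VIRUS_X")]
host_map = {"HOST_A": ["HUMAN_A1", "HUMAN_A2"]}
virus_map = {"VIRUS_X": ["VIRUS_Y"]}
predictions = interolog_predictions(template_ppis, host_map, virus_map)
```

In practice the quality of such predictions depends on how strictly orthology is defined, so a real pipeline would likely attach a confidence score to each transferred pair.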


Subject(s)
Databases, Protein , Protein Interaction Mapping/methods , Proteins/metabolism , Viral Proteins/metabolism , Gene Expression Profiling , Humans , Machine Learning , Protein Array Analysis , Protein Conformation , Proteins/chemistry , Proteins/genetics , Viral Proteins/chemistry , Viral Proteins/genetics
14.
Brief Bioinform ; 22(2): 1053-1064, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1343657

ABSTRACT

Discovering efficient drugs and identifying target proteins remain an unmet but urgent need for curing coronavirus disease 2019 (COVID-19). Protein structure-based docking is a widely applied approach for discovering active compounds against drug targets and for predicting potential targets of active compounds. However, this approach has an inherent deficiency caused by, e.g., the different conformations with widely varying binding pockets adopted by proteins, or the lack of true target proteins in the database. This deficiency may result in false negative results. As a complementary approach to the protein structure-based platform for COVID-19, termed D3Docking in our previous work, we developed in this study a ligand-based method, named D3Similarity, which is based on the molecular similarity evaluation between the submitted molecule(s) and those in an active compound database. The database comprises all the reported bioactive molecules against the coronaviruses, viz., severe acute respiratory syndrome coronavirus (SARS), Middle East respiratory syndrome coronavirus (MERS), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), human betacoronavirus 2c EMC/2012 (HCoV-EMC), human CoV 229E (HCoV-229E) and feline infectious peritonitis virus (FIPV), some of which have target or mechanism information but some do not. Based on the two-dimensional (2D) and three-dimensional (3D) similarity evaluation of molecular structures, virtual screening and target prediction could be performed according to similarity ranking results. With two examples, we demonstrated the reliability and efficiency of D3Similarity by using the 2D × 3D value as the score for drug discovery and target prediction against COVID-19. The database, which will be updated regularly, is available free of charge at https://www.d3pharma.com/D3Targets-2019-nCoV/D3Similarity/index.php.
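
The ranking by a 2D × 3D value can be sketched as a product of two similarity measures. The abstract does not say which 2D or 3D measures D3Similarity uses, so the sketch below assumes a Tanimoto coefficient on fingerprint bit sets for the 2D part and a precomputed shape similarity in [0, 1] for the 3D part. All names, fingerprints and values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def rank_by_combined_score(query_fp, library, shape_similarity):
    """Rank library compounds by (2D similarity) x (3D similarity).

    library          : dict mapping compound name -> fingerprint bit set
    shape_similarity : dict mapping compound name -> assumed 3D similarity
                       to the query, already computed elsewhere
    """
    scored = []
    for name, fingerprint in library.items():
        score = tanimoto(query_fp, fingerprint) * shape_similarity[name]
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query and active-compound library.
query = {1, 2, 3, 4}
actives = {"cmpd_a": {1, 2, 3, 4}, "cmpd_b": {1, 2}, "cmpd_c": {5, 6}}
shapes = {"cmpd_a": 0.5, "cmpd_b": 0.9, "cmpd_c": 1.0}
ranking = rank_by_combined_score(query, actives, shapes)
```

Multiplying the two values means a compound must resemble the query in both representations to rank highly: here `cmpd_c` has a perfect assumed shape match but no shared fingerprint bits, so it falls to the bottom.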


Subject(s)
COVID-19/drug therapy , Viral Proteins/metabolism , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Databases, Protein , Ligands , Reproducibility of Results , SARS-CoV-2/drug effects , SARS-CoV-2/isolation & purification
15.
Brief Bioinform ; 22(2): 769-780, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1343650

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a rapidly growing infectious disease, widely spread with high mortality rates. Since the release of the SARS-CoV-2 genome sequence in March 2020, there has been an international focus on developing target-based drug discovery, which also requires knowledge of the 3D structure of the proteome. Where there are no experimentally solved structures, our group has created 3D models with coverage of 97.5% and characterized them using state-of-the-art computational approaches. Models of protomers and oligomers, together with predictions of substrate and allosteric binding sites, protein-ligand docking, SARS-CoV-2 protein interactions with human proteins, impacts of mutations, and mapped solved experimental structures are freely available for download. These are implemented in SARS CoV-2 3D, a comprehensive and user-friendly database, available at https://sars3d.com/. This provides essential information for drug discovery, both to evaluate targets and design new potential therapeutics.


Subject(s)
Antiviral Agents/pharmacology , COVID-19/virology , Databases, Protein , Drug Delivery Systems , Proteome , SARS-CoV-2/drug effects , Humans , SARS-CoV-2/isolation & purification
16.
Acta Crystallogr D Struct Biol ; 77(Pt 8): 1040-1049, 2021 Aug 01.
Article in English | MEDLINE | ID: covidwho-1341166

ABSTRACT

The β-link is a composite protein motif consisting of a G1β β-bulge and a type II β-turn, and is generally found at the end of two adjacent strands of antiparallel β-sheet. The 1,2-positions of the β-bulge are also the 3,4-positions of the β-turn, with the result that the N-terminal portion of the polypeptide chain is orientated at right angles to the β-sheet. Here, it is reported that the β-link is frequently found in certain protein folds of the SCOPe structural classification at specific locations where it connects a β-sheet to another area of a protein. It is found at locations where it connects one β-sheet to another in the β-sandwich and related structures, and in small (four-, five- or six-stranded) β-barrels, where it connects two β-strands through the polypeptide chain that crosses an open end of the barrel. It is not found in larger (eight-stranded or more) β-barrels that are straightforward β-meanders. In some cases it initiates a connection between a single β-sheet and an α-helix. The β-link also provides a framework for catalysis in serine proteases, where the catalytic serine is part of a conserved β-link, and in cysteine proteases, including Mpro of human SARS-CoV-2, in which two residues of the active site are located in a conserved β-link.


Subject(s)
Protein Structure, Secondary , Serine Proteases/chemistry , Amino Acid Motifs , Animals , Catalytic Domain , Coronavirus 3C Proteases/chemistry , Coronavirus 3C Proteases/metabolism , Cysteine Proteases/chemistry , Cysteine Proteases/metabolism , Databases, Protein , Humans , Hydrogen Bonding , Models, Molecular , SARS-CoV-2/chemistry , SARS-CoV-2/enzymology , Serine Proteases/metabolism , Structural Homology, Protein
17.
Commun Biol ; 4(1): 934, 2021 08 03.
Article in English | MEDLINE | ID: covidwho-1341013

ABSTRACT

We describe an analytical method for the identification, mapping and relative quantitation of glycopeptides from SARS-CoV-2 Spike protein. The method may be executed using a LC-TOF mass spectrometer, requires no specialized knowledge of glycan analysis and exploits the differential resolving power of reverse phase HPLC. While this separation technique resolves peptides with high efficiency, glycans are resolved poorly, if at all. Consequently, glycopeptides consisting of the same peptide bearing different glycan structures will all possess very similar retention times and co-elute. Rather than a disadvantage, we show that shared retention time can be used to map multiple glycan species to the same peptide and location. In combination with MSMS and pseudo MS3, we have constructed a detailed mass-retention time database for Spike glycopeptides. This database allows any accurate mass LC-MS laboratory to reliably identify and quantify Spike glycopeptides from a single overnight elastase digest in less than 90 minutes.
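
Using a mass-retention time database of this kind amounts to a lookup with two tolerances. A minimal sketch follows, assuming a ppm mass tolerance and an absolute retention-time window; the tolerances, field names, peptide and glycan labels, and all numeric values are placeholders invented for illustration, not values from the published database.

```python
def match_features(observed, database, ppm_tol=10.0, rt_tol=0.5):
    """Return database entries whose mass and retention time both match.

    observed : list of (mass, retention_time) features from an LC-MS run
    database : list of dicts with keys "peptide", "glycan", "mass", "rt"
    ppm_tol  : assumed mass tolerance in parts per million
    rt_tol   : assumed retention-time tolerance in minutes
    """
    hits = []
    for obs_mass, obs_rt in observed:
        for entry in database:
            ppm_error = abs(obs_mass - entry["mass"]) / entry["mass"] * 1e6
            if ppm_error <= ppm_tol and abs(obs_rt - entry["rt"]) <= rt_tol:
                hits.append((obs_mass, obs_rt, entry["peptide"], entry["glycan"]))
    return hits


# Placeholder entries: one peptide carrying two glycans that co-elute.
toy_database = [
    {"peptide": "PEPTIDE_A", "glycan": "glycan_1", "mass": 3000.0, "rt": 12.0},
    {"peptide": "PEPTIDE_A", "glycan": "glycan_2", "mass": 3162.0, "rt": 12.0},
]
toy_features = [(3000.02, 12.1), (3162.5, 12.1)]
toy_hits = match_features(toy_features, toy_database)
```

Because glycoforms of one peptide share a retention time, the retention-time window narrows the candidates to a single peptide and the accurate mass then distinguishes the glycan, which mirrors the mapping strategy the abstract describes.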


Subject(s)
Glycopeptides/chemistry , Mass Spectrometry/methods , Spike Glycoprotein, Coronavirus/chemistry , Databases, Protein , Time Factors
18.
Sci Rep ; 11(1): 15107, 2021 07 23.
Article in English | MEDLINE | ID: covidwho-1322506

ABSTRACT

The COVID-19 outbreak has placed intense pressure on healthcare systems, creating an urgent demand for effective diagnostic, prognostic and therapeutic procedures. Here, we employed Automated Machine Learning (AutoML) to analyze three publicly available high-throughput COVID-19 datasets, including proteomic, metabolomic and transcriptomic measurements. Pathway analysis of the selected features was also performed. Analysis of a combined proteomic and metabolomic dataset led to 10 equivalent signatures of two features each, with AUC 0.840 (CI 0.723-0.941) in discriminating severe from non-severe COVID-19 patients. A transcriptomic dataset led to two equivalent signatures of eight features each, with AUC 0.914 (CI 0.865-0.955) in distinguishing COVID-19 patients from those with a different acute respiratory illness. Another transcriptomic dataset led to two equivalent signatures of nine features each, with AUC 0.967 (CI 0.899-0.996) in distinguishing COVID-19 patients from virus-free individuals. Signature predictive performance remained high upon validation. Multiple new features emerged, and pathway analysis revealed biological relevance through their implication in the Viral mRNA Translation, Interferon gamma signaling and Innate Immune System pathways. In conclusion, AutoML analysis led to multiple biosignatures of high predictive performance, with few features and a large choice of alternative predictors. These favorable characteristics make them well suited to the development of cost-effective assays that contribute to better disease management.
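For readers unfamiliar with how an AUC and an accompanying interval such as those quoted above can be obtained, the following is a minimal sketch using a percentile bootstrap. It is not the procedure used in the study (the abstract does not describe it); the function names, the number of resamples and the example labels and scores are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed settings)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lost one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up labels (1 = case, 0 = control) and classifier scores.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.45, 0.70, 0.90, 0.65]
point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(point, low, high)
```

With so few samples the interval is wide; the sketch only shows the mechanics of resampling and taking percentiles.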


Subject(s)
COVID-19/diagnosis , COVID-19/metabolism , Immunity, Innate/immunology , Machine Learning , SARS-CoV-2/metabolism , Biomarkers/blood , COVID-19/genetics , COVID-19/pathology , Computer Simulation , Databases, Factual , Databases, Genetic , Databases, Protein , Gene Expression Profiling , Humans , Immunity, Innate/genetics , Interferon-gamma/blood , Metabolomics , Prognosis , Proteomics , ROC Curve , SARS-CoV-2/genetics , Severity of Illness Index , Signal Transduction/genetics , Signal Transduction/immunology , Software
19.
Proteins ; 89(11): 1541-1556, 2021 11.
Article in English | MEDLINE | ID: covidwho-1303290

ABSTRACT

The growing number of three-dimensional protein structures and enhanced computing power have significantly facilitated our understanding of protein sequence/structure/function relationships. A challenge in structural genomics is to predict the function of uncharacterized proteins. Protein function deconvolution based on global sequence or structural homology is impracticable when a protein is related to no other protein of known function, and in such cases functional relationships can be established by detecting local ligand binding site similarity. Here, we introduce a sequence order-independent comparison algorithm, PocketShape, for structural proteome-wide exploration of protein functional sites by fully considering the geometry of the backbones, the orientation of the sidechains, and the physicochemical properties of the pocket-lining residues. PocketShape is efficient in distinguishing similar from dissimilar ligand binding site pairs, retrieving 99.3% of the similar pairs while rejecting 100% of the dissimilar pairs on a dataset containing 1538 binding site pairs. The method successfully classifies 83 enzyme structures with diverse functions into 12 clusters, in close agreement with their actual Structural Classification of Proteins classification. PocketShape also outperforms other methods in protein profiling based on experimental data. Potential new applications for the representative SARS-CoV-2 drugs Remdesivir and 11a are predicted. The high accuracy and time efficiency of PocketShape make it a promising complementary tool for proteome-wide protein function inference and drug repurposing studies.
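The two percentages quoted above correspond to a retrieval rate for similar pairs and a rejection rate for dissimilar pairs at some decision threshold. A minimal sketch of that bookkeeping follows; it does not implement PocketShape, and the scores, the threshold and the function name are hypothetical.

```python
from typing import Sequence, Tuple


def retrieval_rates(scores_similar: Sequence[float],
                    scores_dissimilar: Sequence[float],
                    threshold: float) -> Tuple[float, float]:
    """Return (fraction of similar pairs retrieved, fraction of dissimilar pairs rejected).

    A pair is called "similar" when its score is at or above the threshold.
    """
    retrieved = sum(s >= threshold for s in scores_similar)
    rejected = sum(s < threshold for s in scores_dissimilar)
    return retrieved / len(scores_similar), rejected / len(scores_dissimilar)


# Made-up similarity scores for known similar and dissimilar site pairs.
similar = [0.91, 0.84, 0.77, 0.45]
dissimilar = [0.30, 0.22, 0.41]
sens, spec = retrieval_rates(similar, dissimilar, threshold=0.5)
print(sens, spec)
```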


Subject(s)
Algorithms , Antiviral Agents/pharmacology , Drug Repositioning/methods , Proteins/metabolism , Adenosine Monophosphate/analogs & derivatives , Adenosine Monophosphate/chemistry , Adenosine Monophosphate/metabolism , Adenosine Monophosphate/pharmacology , Alanine/analogs & derivatives , Alanine/chemistry , Alanine/metabolism , Alanine/pharmacology , Antiviral Agents/chemistry , Binding Sites , Coronavirus 3C Proteases/chemistry , Coronavirus 3C Proteases/metabolism , Databases, Protein , GTP Phosphohydrolases/chemistry , GTP Phosphohydrolases/metabolism , Phosphoglycerate Mutase/chemistry , Phosphoglycerate Mutase/metabolism , Proteins/chemistry , Proteins/classification , ROC Curve , SARS-CoV-2/drug effects
20.
Acta Crystallogr D Struct Biol ; 77(Pt 6): 727-745, 2021 Jun 01.
Article in English | MEDLINE | ID: covidwho-1254969

ABSTRACT

Covalent linkages between constituent blocks of macromolecules and ligands have been subject to inconsistent treatment during the model-building, refinement and deposition process. This may stem from a number of sources, including difficulties with initially detecting the covalent linkage, identifying the correct chemistry, obtaining an appropriate restraint dictionary and ensuring its correct application. The analysis presented herein assesses the extent of problems involving covalent linkages in the Protein Data Bank (PDB). Not only will this facilitate the remediation of existing models, but also, more importantly, it will inform and thus improve the quality of future linkages. By considering linkages of known type in the CCP4 Monomer Library (CCP4-ML), failure to model a covalent linkage is identified to result in inaccurate (systematically longer) interatomic distances. Scanning the PDB for proximal atom pairs that do not have a corresponding type in the CCP4-ML reveals a large number of commonly occurring types of unannotated potential linkages; in general, these may or may not be covalently linked. Manual consideration of the most commonly occurring cases identifies a number of genuine classes of covalent linkages. The recent expansion of the CCP4-ML is discussed, which has involved the addition of over 16 000 and the replacement of over 11 000 component dictionaries using AceDRG. As part of this effort, the CCP4-ML has also been extended using AceDRG link dictionaries for the aforementioned linkage types identified in this analysis. This will facilitate the identification of such linkage types in future modelling efforts, whilst concurrently easing the process involved in their application. The need for a universal standard for maintaining link records corresponding to covalent linkages, and references to the associated dictionaries used during modelling and refinement, following deposition to the PDB is emphasized. 
The importance of correctly modelling covalent linkages is demonstrated using a case study, which involves the covalent linkage of an inhibitor to the main protease in various viral species, including SARS-CoV-2. This example demonstrates the importance of properly modelling covalent linkages using a comprehensive restraint dictionary, as opposed to just using a single interatomic distance restraint or failing to model the covalent linkage at all.
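The scan for proximal atom pairs lacking a corresponding link annotation, described in the abstract above, can be sketched roughly as follows. This is not the authors' implementation and uses no CCP4 or AceDRG API; the `Atom` type, the 2.0 Å cutoff, the residue labels and the coordinates are assumptions chosen for illustration.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Atom(NamedTuple):
    residue: str                      # component identifier, e.g. "CYS145"
    name: str                         # atom name, e.g. "SG"
    xyz: Tuple[float, float, float]   # coordinates in angstroms


AtomKey = Tuple[str, str]


def unannotated_close_pairs(atoms: List[Atom],
                            annotated: Set[FrozenSet[AtomKey]],
                            cutoff: float = 2.0
                            ) -> List[Tuple[Atom, Atom, float]]:
    """Find inter-residue atom pairs within cutoff that have no link annotation.

    The cutoff is an assumed value; the source does not state one.
    """
    hits = []
    for a, b in combinations(atoms, 2):
        if a.residue == b.residue:
            continue  # only consider contacts between different components
        key = frozenset({(a.residue, a.name), (b.residue, b.name)})
        dist = math.dist(a.xyz, b.xyz)
        if dist <= cutoff and key not in annotated:
            hits.append((a, b, dist))
    return hits


# Made-up coordinates: a ligand carbon placed 1.8 angstroms from a cysteine sulfur.
atoms = [
    Atom("CYS145", "SG", (0.0, 0.0, 0.0)),
    Atom("LIG401", "C1", (1.8, 0.0, 0.0)),
    Atom("HIS41", "NE2", (5.0, 0.0, 0.0)),
]
hits = unannotated_close_pairs(atoms, annotated=set())
for a, b, dist in hits:
    print(a.residue, a.name, b.residue, b.name, round(dist, 2))
```

Passing the pair in `annotated` suppresses the hit, which mirrors the distinction between linkages that already have a dictionary entry and those that do not. A pairwise loop is quadratic in the number of atoms, so a real scan would need a spatial index.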


Subject(s)
Models, Structural , Crystallography, X-Ray , Databases, Protein , Ligands , SARS-CoV-2/chemistry , Viral Proteins/chemistry