Search | VHL Regional Portal

SARS-CoV-2 spike protein predicted to form complexes with host receptor protein orthologues from a broad range of mammals.

Lam, S D; Bordin, N; Waman, V P; Scholes, H M; Ashford, P; Sen, N; van Dorp, L; Rauer, C; Dawson, N L; Pang, C S M; Abbasian, M; Sillitoe, I; Edwards, S J L; Fraternali, F; Lees, J G; Santini, J M; Orengo, C A.

Sci Rep ; 10(1): 16471, 2020 10 05.

Article in English | MEDLINE | ID: mdl-33020502

ABSTRACT

SARS-CoV-2 has a zoonotic origin and was transmitted to humans via an undetermined intermediate host, leading to infections in humans and other mammals. To enter host cells, the viral spike protein (S-protein) binds to its receptor, ACE2, and is then processed by TMPRSS2. Whilst receptor binding contributes to the viral host range, S-protein:ACE2 complexes from other animals have not been investigated widely. To predict infection risks, we modelled S-protein:ACE2 complexes from 215 vertebrate species, calculated changes in the energy of the complex caused by mutations in each species, relative to human ACE2, and correlated these changes with COVID-19 infection data. We also analysed structural interactions to better understand the key residues contributing to affinity. We predict that mutations are more detrimental in ACE2 than TMPRSS2. Finally, we demonstrate phylogenetically that human SARS-CoV-2 strains have been isolated in animals. Our results suggest that SARS-CoV-2 can infect a broad range of mammals, but few fish, birds or reptiles. Susceptible animals could serve as reservoirs of the virus, necessitating careful ongoing animal management and surveillance.

Subject(s)

Peptidyl-Dipeptidase A/chemistry , Phylogeny , Spike Glycoprotein, Coronavirus/chemistry , Angiotensin-Converting Enzyme 2 , Animals , Betacoronavirus/classification , Betacoronavirus/genetics , Humans , Mammals , Molecular Docking Simulation , Mutation , Peptidyl-Dipeptidase A/classification , Peptidyl-Dipeptidase A/genetics , Peptidyl-Dipeptidase A/metabolism , Protein Binding , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/metabolism

cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly.

Lewis, T E; Sillitoe, I; Lees, J G.

Bioinformatics ; 35(10): 1766-1767, 2019 05 15.

Article in English | MEDLINE | ID: mdl-30295745

ABSTRACT

MOTIVATION: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools are increasingly facing very large data-sets for which they require prohibitive amounts of CPU-time and memory. RESULTS: We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ~1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets. AVAILABILITY AND IMPLEMENTATION: CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
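
The problem stated in the abstract (choosing the best-scoring subset of candidate matches with no overlap) can be illustrated with a textbook weighted-interval-scheduling dynamic program. The following is a minimal Python sketch under stated assumptions, not the CRH implementation: the `Hit` type, the `resolve_hits` function and the scores are hypothetical, each hit is a single continuous segment with inclusive residue coordinates, and discontinuous domains, which the abstract says CRH must handle, are ignored.

```python
from bisect import bisect_right
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical candidate domain match on a query protein."""
    label: str
    start: int    # first residue covered (inclusive)
    stop: int     # last residue covered (inclusive)
    score: float  # higher is better


def resolve_hits(hits):
    """Return (chosen_hits, total_score) for the best non-overlapping subset."""
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]

    # best[i] is the best total score achievable using only the first i hits.
    best = [0.0] * (len(ordered) + 1)
    take = [False] * len(ordered)
    prev = [0] * len(ordered)

    for i, hit in enumerate(ordered):
        # Count earlier hits that end strictly before this hit starts.
        prev[i] = bisect_right(stops, hit.start - 1, 0, i)
        with_hit = best[prev[i]] + hit.score
        if with_hit > best[i]:
            best[i + 1] = with_hit
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover the chosen hits.
    chosen = []
    i = len(ordered)
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return chosen, best[-1]
```

Sorting by end position and using a binary search to find compatible predecessors gives O(n log n) time for n hits in this simplified setting; the abstract does not describe CRH's actual data structures, so this only conveys the general idea.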

Subject(s)

Software , Algorithms , Documentation , Proteins

The CATH database: an extended protein family resource for structural and functional genomics.

Pearl, F M G; Bennett, C F; Bray, J E; Harrison, A P; Martin, N; Shepherd, A; Sillitoe, I; Thornton, J; Orengo, C A.

Nucleic Acids Res ; 31(1): 452-5, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12520050

ABSTRACT

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB) contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind CATHEDRAL is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domain boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.

Subject(s)

Databases, Protein , Protein Structure, Tertiary , Proteins/classification , Animals , Automation , Genomics , Protein Folding , Protein Structure, Secondary , Proteins/chemistry , Proteins/physiology , Sequence Homology, Amino Acid , Structural Homology, Protein

Review: what can structural classifications reveal about protein evolution?

Orengo, C A; Sillitoe, I; Reeves, G; Pearl, F M.

J Struct Biol ; 134(2-3): 145-65, 2001.

Article in English | MEDLINE | ID: mdl-11551176

ABSTRACT

In this article we present a review of the methods used for comparing and classifying protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications and we consider some of the problems confronting any general analyses of structural evolution in protein families. We also review some more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes and thereby reveal interesting trends in fold usage and recurrence.

Subject(s)

Evolution, Molecular , Protein Folding , Proteins/chemistry , Proteins/classification , Animals , Proteins/genetics , Structure-Activity Relationship

A rapid classification protocol for the CATH Domain Database to support structural genomics.

Pearl, F M; Martin, N; Bray, J E; Buchan, D W; Harrison, A P; Lee, D; Reeves, G A; Shepherd, A J; Sillitoe, I; Todd, A E; Thornton, J M; Orengo, C A.

Nucleic Acids Res ; 29(1): 223-7, 2001 Jan 01.

Article in English | MEDLINE | ID: mdl-11125098

ABSTRACT

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.

Subject(s)

Databases, Factual , Proteins/chemistry , Computational Biology , Genomics , Internet , Protein Structure, Tertiary , Proteins/classification , Proteins/genetics , Sequence Alignment , Software , Structure-Activity Relationship

Assigning genomic sequences to CATH.

Pearl, F M; Lee, D; Bray, J E; Sillitoe, I; Todd, A E; Harrison, A P; Thornton, J M; Orengo, C A.

Nucleic Acids Res ; 28(1): 277-82, 2000 Jan 01.

Article in English | MEDLINE | ID: mdl-10592246

ABSTRACT

We report the latest release (version 1.6) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 18 577 domains into evolutionary families and structural groupings. We have identified 1028 homologous superfamilies in which the proteins have both structural, and sequence or functional similarity. These can be further clustered into 672 fold groups and 35 distinct architectures. Recent developments of the database include the generation of 3D templates for recognising structural relatives in each fold group, which has led to significant improvements in the speed and accuracy of updating the database and also means that less manual validation is required. We also report the establishment of the CATH-PFDB (Protein Family Database), which associates 1D sequences with the 3D homologous superfamilies. Sequences showing identifiable homology to entries in CATH have been extracted from GenBank using PSI-BLAST. A CATH-PSIBLAST server has been established, which allows you to scan a new sequence against the database. The CATH Dictionary of Homologous Superfamilies (DHS), which contains validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies, has been updated to include annotations associated with sequence relatives identified in GenBank. The DHS is a powerful tool for considering the variation of functional properties within a given CATH superfamily and in deciding what functional properties may be reliably inherited by a newly identified relative.

Subject(s)

Databases, Factual , Genome , Proteins/genetics , Amino Acid Sequence , Molecular Sequence Data , Proteins/chemistry , Sequence Homology, Amino Acid

Analysis and assessment of ab initio three-dimensional prediction, secondary structure, and contacts prediction.

Orengo, C A; Bray, J E; Hubbard, T; LoConte, L; Sillitoe, I.

Proteins ; Suppl 3: 149-70, 1999.

Article in English | MEDLINE | ID: mdl-10526364

ABSTRACT

CASP3 saw a substantial increase in the volume of ab initio 3D prediction data, with 507 datasets for fifteen selected targets and sixty-one groups participating. As with CASP2, methods ranged from computationally intensive strategies that attempt to recreate the physical and chemical forces involved in protein folding to the more recent knowledge-based approaches. These exploit information from the structure databases, extracting potentially similar fragments and/or distance constraints derived from multiple sequence alignments. The knowledge-based approaches generally gave more consistently successful predictions across the range of targets, particularly that of the Baker group (Bystroff and Baker, J Mol Biol 1998;281:565-577; Simons et al. Proteins Suppl 1999;3:171-176), which used a fragment library. In the secondary structure prediction category, the most successful approaches built on the concepts used in PHD (Rost et al. Comput Appl Biosci 1994;10:53-60), an accepted standard in this field. Like PHD, they exploit neural networks but have different strategies for incorporating multiple sequence data or position-dependent weight matrices for training the networks. Analysis of the contact data, for which only six groups participated, suggested that as yet this data provides a rather weak signal. However, in combination with other types of prediction data it can sometimes be a useful constraint for identifying the correct structure.

Subject(s)

Protein Structure, Secondary , Proteins/chemistry , Animals , Bacterial Proteins/chemistry , Models, Molecular , Peptide Fragments/chemistry , Protein Folding
