Search | VHL Regional Portal

SCOPe: improvements to the structural classification of proteins - extended database to facilitate variant interpretation and machine learning.

Chandonia, John-Marc; Guan, Lindsey; Lin, Shiangyi; Yu, Changhua; Fox, Naomi K; Brenner, Steven E.

Nucleic Acids Res ; 50(D1): D553-D559, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34850923

ABSTRACT

The Structural Classification of Proteins-extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants we mapped to structures classified in SCOPe. In order to improve the utility of SCOPe to automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).

Subject(s)

Computational Biology; Databases, Protein; Proteins/classification; Algorithms; Databases, Chemical; Gene Expression Regulation/genetics; Machine Learning; Proteins/genetics

SCOPe: classification of large macromolecular structures in the structural classification of proteins-extended database.

Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E.

Nucleic Acids Res ; 47(D1): D475-D481, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30500919

ABSTRACT

The SCOPe (Structural Classification of Proteins-extended, https://scop.berkeley.edu) database hierarchically classifies domains from the majority of proteins of known structure according to their structural and evolutionary relationships. SCOPe also incorporates and updates the ASTRAL compendium, which provides multiple databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. Protein structures are classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.07, we have focused our manual curation efforts on larger protein structures, including the spliceosome, proteasome and RNA polymerase I, as well as many other Pfam families that had not previously been classified. Domains from these large protein complexes are distinctive in several ways: novel non-globular folds are more common, and domains from previously observed protein families often have N- or C-terminal extensions that were disordered or not present in previous structures. The current monthly release update, SCOPe 2.07-2018-10-18, classifies 90 992 PDB entries (about two thirds of PDB entries).

Subject(s)

Databases, Protein; Protein Domains; Multiprotein Complexes/chemistry; Proteasome Endopeptidase Complex/chemistry; Spliceosomes/chemistry

hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update.

Wang, Meng; Callenberg, Keith M; Dalgleish, Raymond; Fedtsov, Alexandre; Fox, Naomi K; Freeman, Peter J; Jacobs, Kevin B; Kaleta, Piotr; McMurry, Andrew J; Prlic, Andreas; Rajaraman, Veena; Hart, Reece K.

Hum Mutat ; 39(12): 1803-1813, 2018 12.

Article in English | MEDLINE | ID: mdl-30129167

ABSTRACT

The Human Genome Variation Society (HGVS) nomenclature guidelines encourage the accurate and standard description of DNA, RNA, and protein sequence variants in public variant databases and the scientific literature. Inconsistent application of the HGVS guidelines can lead to misinterpretation of variants in clinical settings. Reliable software tools are essential to ensure consistent application of the HGVS guidelines when reporting and interpreting variants. We present the hgvs Python package, a comprehensive tool for manipulating sequence variants according to the HGVS nomenclature guidelines. Distinguishing features of the hgvs package include: (1) parsing, formatting, validating, and normalizing variants on genome, transcript, and protein sequences; (2) projecting variants between aligned sequences, including those with gapped alignments; (3) flexible installation using remote or local data (fully local installations eliminate network dependencies); (4) extensive automated tests; and (5) open source development by a community from eight organizations worldwide. This report summarizes recent and significant updates to the hgvs package since its original release in 2014, and presents results of extensive validation using clinically relevant variants from ClinVar and HGMD.

Subject(s)

Computational Biology/methods; Databases, Genetic; Genetic Variation; Genome, Human; Guidelines as Topic; Humans; Societies, Medical; Software

SCOPe: Manual Curation and Artifact Removal in the Structural Classification of Proteins - extended Database.

Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E.

J Mol Biol ; 429(3): 348-355, 2017 02 03.

Article in English | MEDLINE | ID: mdl-27914894

ABSTRACT

SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP.

Subject(s)

Databases, Protein; Mutation; Proteins/classification; Artifacts; Cloning, Molecular; Computational Biology; Protein Structure, Tertiary; Proteins/chemistry

The value of protein structure classification information-Surveying the scientific literature.

Fox, Naomi K; Brenner, Steven E; Chandonia, John-Marc.

Proteins ; 83(11): 2025-38, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26313554

ABSTRACT

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

Subject(s)

Proteins/chemistry; Proteins/classification; Algorithms; Computational Biology; Databases, Protein; Protein Conformation

SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures.

Fox, Naomi K; Brenner, Steven E; Chandonia, John-Marc.

Nucleic Acids Res ; 42(Database issue): D304-9, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24304899

ABSTRACT

Structural Classification of Proteins-extended (SCOPe, http://scop.berkeley.edu) is a database of protein structural relationships that extends the SCOP database. SCOP is a manually curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. Development of the SCOP 1.x series concluded with SCOP 1.75. The ASTRAL compendium provides several databases and tools to aid in the analysis of the protein structures classified in SCOP, particularly through the use of their sequences. SCOPe extends version 1.75 of the SCOP database, using automated curation methods to classify many structures released since SCOP 1.75. We have rigorously benchmarked our automated methods to ensure that they are as accurate as manual curation, though there are many proteins to which our methods cannot be applied. SCOPe is also partially manually curated to correct some errors in SCOP. SCOPe aims to be backward compatible with SCOP, providing the same parseable files and a history of changes between all stable SCOP and SCOPe releases. SCOPe also incorporates and updates the ASTRAL database. The latest release of SCOPe, 2.03, contains 59 514 Protein Data Bank (PDB) entries, increasing the number of structures classified in SCOP by 55% and including more than 65% of the protein structures in the PDB.

Subject(s)

Databases, Protein; Protein Structure, Tertiary; Internet; Proteins/classification; Systems Integration
