Improving the accuracy of protein secondary structure prediction using structural alignment.

Montgomerie, Scott; Sundararaj, Shan; Gallin, Warren J; Wishart, David S.

BMC Bioinformatics ; 7: 301, 2006 Jun 14.

Article in English | MEDLINE | ID: mdl-16774686

ABSTRACT

BACKGROUND: The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.

RESULTS: We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4-5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%.

CONCLUSION: By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus. For high throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
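
The ideas named in the abstract (the Q3 score, mapping a homologue's structure onto the query, and a consensus over several predictors) can be illustrated with a minimal Python sketch. This is not the PROTEUS implementation: the function names `q3_score`, `map_homologue` and `consensus` are hypothetical, the alignment is assumed to be given as two gapped strings, and a plain majority vote stands in for the "jury-of-experts" system, whose details the abstract does not describe.

```python
from collections import Counter
from typing import Sequence


def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("Q3 is undefined for an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def map_homologue(aligned_query: str, aligned_template_ss: str) -> str:
    """Transfer a homologue's secondary structure onto the query.

    Both inputs are rows of one pairwise alignment ('-' marks a gap).
    Returns one label per query residue; '?' marks residues that the
    homologue does not cover.
    """
    mapped = []
    for residue, label in zip(aligned_query, aligned_template_ss):
        if residue == "-":
            continue  # homologue position with no query residue
        mapped.append("?" if label == "-" else label)
    return "".join(mapped)


def consensus(mapped: str, predictions: Sequence[str]) -> str:
    """Keep mapped labels; fill uncovered positions by majority vote
    (an assumed stand-in for the jury-of-experts step)."""
    result = []
    for i, label in enumerate(mapped):
        if label != "?":
            result.append(label)
        else:
            votes = Counter(p[i] for p in predictions)
            result.append(votes.most_common(1)[0][0])
    return "".join(result)


# Toy data, invented purely for illustration.
mapped = map_homologue("MKV-LAAG", "HHHH--EE")
final = consensus(mapped, ["HHHCCEE", "HHCCEEE", "HHHCCCE"])
print(mapped, final, q3_score(final, "HHHHCEE"))
```

In this sketch the homologue covers five of the seven query residues, and the two uncovered positions are filled from the sequence-based predictions, which mirrors the abstract's statement that structural mapping predicts "at least a portion" of the query's secondary structure.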

Subject(s)

Protein Structure, Secondary; Proteins/chemistry; Proteomics/methods; Sequence Alignment/methods; Structural Homology, Protein; Algorithms; Computer Simulation; Databases, Protein; Internet; Software; User-Computer Interface

The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli.

Sundararaj, Shan; Guo, Anchi; Habibi-Nazhad, Bahram; Rouani, Melania; Stothard, Paul; Ellison, Michael; Wishart, David S.

Nucleic Acids Res ; 32(Database issue): D293-5, 2004 Jan 01.

Article in English | MEDLINE | ID: mdl-14681416

ABSTRACT

The CyberCell Database (CCDB: http://redpoll.pharmacy.ualberta.ca/CCDB) is a comprehensive, web-accessible database designed to support and coordinate international efforts in modeling an Escherichia coli cell on a computer. The CCDB brings together both observed and derived quantitative data from numerous independent sources covering many aspects of the genomic, proteomic and metabolomic character of E. coli (strain K12). The database is self-updating but also supports 'community' annotation, and provides an extensive array of viewing, querying and search options including a powerful, easy-to-use relational data extraction system.

Subject(s)

Databases, Factual; Escherichia coli/cytology; Escherichia coli/metabolism; Genomics; Information Storage and Retrieval; Proteomics; Computational Biology; Databases, Genetic; Escherichia coli/chemistry; Escherichia coli/genetics; Internet
