Redundancy-weighting the PDB for detailed secondary structure prediction using deep-learning models.

Sidi, Tomer; Keasar, Chen.

Bioinformatics ; 36(12): 3733-3738, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32186698

ABSTRACT

MOTIVATION: The Protein Data Bank (PDB), the ultimate source for data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use nonredundant (NR) subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting (RW), down-weights redundant entries rather than discarding them. This approach may be particularly helpful for machine-learning (ML) methods that use the PDB as their source for data. Methods for secondary structure prediction (SSP) have greatly improved over the years, with recent studies achieving above 70% accuracy for eight-class (DSSP) prediction. As these methods typically incorporate ML techniques, training on RW datasets might improve accuracy, as well as pave the way toward larger and more informative secondary structure classes.

RESULTS: This study compares the SSP performances of deep-learning models trained on either RW or NR datasets. We show that training on RW sets consistently results in better prediction of 3- (HCE), 8- (DSSP) and 13-class (STR2) secondary structures.

AVAILABILITY AND IMPLEMENTATION: The ML models, the datasets used for their derivation and testing, and a stand-alone SSP program for DSSP and STR2 predictions are freely available under the LGPL license at http://meshi1.cs.bgu.ac.il/rw.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
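The idea of down-weighting redundant entries instead of discarding them can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so this example assumes a simple one: each entry gets weight 1/(size of its similarity cluster), so every cluster contributes equally in total. The function names, cluster labels, and toy data are hypothetical and are not taken from the authors' software.

```python
from collections import Counter


def redundancy_weights(cluster_ids):
    """Assumed scheme: weight each entry by 1 / (size of its cluster),
    so that the weights within every cluster sum to 1."""
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


def weighted_accuracy(predicted, actual, weights):
    """Fraction of the total weight carried by correctly predicted entries."""
    total = sum(weights)
    correct = sum(w for p, a, w in zip(predicted, actual, weights) if p == a)
    return correct / total


# Hypothetical toy data: five entries, four of them in one redundant cluster.
clusters = ["A", "A", "A", "A", "B"]
weights = redundancy_weights(clusters)

# Three-class (H/C/E) labels; only the entry from cluster "B" is mispredicted.
predicted = ["H", "H", "H", "H", "E"]
actual = ["H", "H", "H", "H", "C"]

print(weights)
print(weighted_accuracy(predicted, actual, weights))
```

In this toy case the unweighted accuracy would be 4/5, while the weighted accuracy is 1/2, because the four redundant entries together count as much as the single entry from the other cluster. The same per-entry weights could, in principle, be passed to a training loss as sample weights.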

Subject(s)

Deep Learning , Computational Biology , Databases, Protein , Machine Learning , Protein Structure, Secondary

Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.

Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia.

IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1515-1523, 2019.

Article in English | MEDLINE | ID: mdl-28113636

ABSTRACT

The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. Selection of the best-quality decoys is both challenging and essential, as end users can handle only a few. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single-number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning-based scoring functions to predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one, without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.
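How a scoring function turns several features into one number and then drives decoy selection can be sketched as follows. This is a simplified stand-in, not the paper's method: the paper uses support vector machines and ensemble learning, whereas this example assumes a plain weighted sum. The feature names, weights, and decoy values are all hypothetical.

```python
def score(features, weights):
    """Combine per-decoy features into a single quality estimate
    (assumed here to be a simple weighted sum)."""
    return sum(weights[name] * value for name, value in features.items())


def select_top(decoys, weights, k=1):
    """Return the names of the k decoys with the highest scores."""
    ranked = sorted(
        decoys,
        key=lambda d: score(d["features"], weights),
        reverse=True,
    )
    return [d["name"] for d in ranked[:k]]


# Hypothetical feature weights and decoys, for illustration only.
feature_weights = {"compactness": 0.5, "hbond_fraction": 0.5}
decoys = [
    {"name": "decoy_1", "features": {"compactness": 0.2, "hbond_fraction": 0.4}},
    {"name": "decoy_2", "features": {"compactness": 0.8, "hbond_fraction": 0.6}},
    {"name": "decoy_3", "features": {"compactness": 0.5, "hbond_fraction": 0.5}},
]

print(select_top(decoys, feature_weights, k=2))
```

A learned model would replace the fixed weights with parameters fitted to decoys of known quality; the selection step stays the same.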

Subject(s)

Computational Biology/methods , Proteins , Support Vector Machine , Algorithms , Databases, Protein , Machine Learning , Proteins/chemistry , Proteins/classification , Proteins/metabolism

An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12.

Keasar, Chen; McGuffin, Liam J; Wallner, Björn; Chopra, Gaurav; Adhikari, Badri; Bhattacharya, Debswapna; Blake, Lauren; Bortot, Leandro Oliveira; Cao, Renzhi; Dhanasekaran, B K; Dimas, Itzhel; Faccioli, Rodrigo Antonio; Faraggi, Eshel; Ganzynkowicz, Robert; Ghosh, Sambit; Ghosh, Soma; Gieldon, Artur; Golon, Lukasz; He, Yi; Heo, Lim; Hou, Jie; Khan, Main; Khatib, Firas; Khoury, George A; Kieslich, Chris; Kim, David E; Krupa, Pawel; Lee, Gyu Rie; Li, Hongbo; Li, Jilong; Lipska, Agnieszka; Liwo, Adam; Maghrabi, Ali Hassan A; Mirdita, Milot; Mirzaei, Shokoufeh; Mozolewska, Magdalena A; Onel, Melis; Ovchinnikov, Sergey; Shah, Anand; Shah, Utkarsh; Sidi, Tomer; Sieradzan, Adam K; Slusarz, Magdalena; Slusarz, Rafal; Smadbeck, James; Tamamis, Phanourios; Trieber, Nicholas; Wirecki, Tomasz; Yin, Yanping; Zhang, Yang.

Sci Rep ; 8(1): 9939, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29967418

ABSTRACT

Every two years, groups worldwide participate in the Critical Assessment of Protein Structure Prediction (CASP) experiment to blindly test the strengths and weaknesses of their computational methods. CASP has significantly advanced the field, but many hurdles remain, which may require new ideas and collaborations. In 2012, a web-based effort called WeFold was initiated to promote collaboration within the CASP community and attract researchers from other fields to contribute new ideas to CASP. Members of the WeFold coopetition (cooperation and competition) participated in CASP as individual teams, but also shared components of their methods to create hybrid pipelines and actively contributed to this effort. We assert that the scale and diversity of integrative prediction pipelines could not have been achieved by any individual lab or even by any collaboration among a few partners. The models contributed by the participating groups and generated by the pipelines are publicly available at the WeFold website, providing a wealth of data that remains to be tapped. Here, we analyze the results of the 2014 and 2016 pipelines, showing improvements according to the CASP assessment as well as areas that require further adjustments and research.

Subject(s)

Caspase 12/metabolism , Caspases/metabolism , Computational Biology/methods , Models, Molecular , Software , Caspase 12/chemistry , Caspases/chemistry , Humans , Protein Conformation

Methods for estimation of model accuracy in CASP12.

Elofsson, Arne; Joo, Keehyoung; Keasar, Chen; Lee, Jooyoung; Maghrabi, Ali H A; Manavalan, Balachandran; McGuffin, Liam J; Ménendez Hurtado, David; Mirabello, Claudio; Pilstål, Robert; Sidi, Tomer; Uziela, Karolis; Wallner, Björn.

Proteins ; 86 Suppl 1: 361-373, 2018 03.

Article in English | MEDLINE | ID: mdl-28975666

ABSTRACT

Methods to reliably estimate the quality of 3D models of proteins are essential drivers for the wide adoption and serious acceptance of protein structure predictions by life scientists. In this article, the most successful groups in CASP12 describe their latest methods for estimates of model accuracy (EMA). We show that pure single-model accuracy estimation methods have made clear progress since CASP11; the 3 top methods (MESHI, ProQ3, SVMQA) all perform better than the top method of CASP11 (ProQ2). Although the pure single-model accuracy estimation methods outperform quasi-single (ModFOLD6 variations) and consensus methods (Pcons, ModFOLDclust2, Pcomb-domain, and Wallner) in model selection, they are still not as good as those methods in absolute model quality estimation and predictions of local quality. Finally, we show that when using contact-based model quality measures (CAD, lDDT), the single-model quality methods perform relatively better.

Subject(s)

Computational Biology/methods , Models, Molecular , Protein Conformation , Proteins/chemistry , Databases, Protein , Humans , Sequence Alignment , Sequence Analysis, Protein
