Search | VHL Regional Portal

1.

Improved prediction of MHC-peptide binding using protein language models.

Hashemi, Nasser; Hao, Boran; Ignatov, Mikhail; Paschalidis, Ioannis Ch; Vakili, Pirooz; Vajda, Sandor; Kozakov, Dima.

Front Bioinform ; 3: 1207380, 2023.

Article in English | MEDLINE | ID: mdl-37663788

ABSTRACT

Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating the process of this presentation is essential for regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in the above process and has motivated the introduction of many computational approaches to address this problem. NetMHCpan, a pan-specific model for predicting binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent successful results of Deep Learning (DL) methods, especially Natural Language Processing (NLP)-based pretrained models in various applications, including protein structure determination, motivated us to explore their use in this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.

2.

Improved cluster ranking in protein-protein docking using a regression approach.

Sotudian, Shahabeddin; Desta, Israel T; Hashemi, Nasser; Zarbafian, Shahrooz; Kozakov, Dima; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch.

Comput Struct Biotechnol J ; 19: 2269-2278, 2021.

Article in English | MEDLINE | ID: mdl-33995918

ABSTRACT

We develop a Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) method to rank clusters of similar protein complex conformations generated by an underlying docking program. The method leverages robust regression to predict the relative quality difference between any pair of clusters and combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality. We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvement by 24-100% in ranking acceptable or better quality clusters first, and by 15-100% in ranking medium or better quality clusters first. We compare the RRPCC-ClusPro combination to a number of alternatives, and show that very different machine learning approaches to scoring docked structures yield similar success rates. Finally, we discuss the current limitations on sampling and scoring, looking ahead to further improvements. Interestingly, some features important for improved scoring are internal energy terms that occur only due to the local energy minimization applied in the refinement stage following rigid body docking.
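The abstract above does not say how the pairwise assessments are combined into a ranked list. As a minimal sketch, assuming a simple row-sum (Borda-style) aggregation and a hypothetical matrix of predicted quality differences, the idea could look like this; the function name and the numbers are illustrative only and are not taken from the paper.

```python
def rank_from_pairwise(diff):
    """Rank items from a matrix of predicted pairwise quality differences.

    diff[i][j] > 0 means item i is predicted to be better than item j.
    Aggregating by row sum is an assumption made for this sketch.
    """
    n = len(diff)
    scores = [sum(diff[i][j] for j in range(n) if j != i) for i in range(n)]
    # Sort indices from highest to lowest aggregate score.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Hypothetical predicted differences for three clusters (antisymmetric matrix).
predicted = [
    [0.0, -0.4, 0.3],
    [0.4, 0.0, 0.9],
    [-0.3, -0.9, 0.0],
]
ranking = rank_from_pairwise(predicted)
print(ranking)
```

In this toy case cluster 1 wins both of its comparisons and is ranked first, followed by cluster 0 and then cluster 2.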

3.

Focused grid-based resampling for protein docking and mapping.

Mamonov, Artem B; Moghadasi, Mohammad; Mirzaei, Hanieh; Zarbafian, Shahrooz; Grove, Laurie E; Bohnuud, Tanggis; Vakili, Pirooz; Paschalidis, Ioannis Ch; Vajda, Sandor; Kozakov, Dima.

J Comput Chem ; 37(11): 961-70, 2016 Apr 30.

Article in English | MEDLINE | ID: mdl-26837000

ABSTRACT

The fast Fourier transform (FFT) sampling algorithm has been used with success in application to protein-protein docking and for protein mapping, the latter docking a variety of small organic molecules for the identification of binding hot spots on the target protein. Here we explore the local rather than global usage of the FFT sampling approach in docking applications. If the global FFT based search yields a near-native cluster of docked structures for a protein complex, then focused resampling of the cluster generally leads to a substantial increase in the number of conformations close to the native structure. In protein mapping, focused resampling of the selected hot spot regions generally reveals further hot spots that, while not as strong as the primary hot spots, also contribute to ligand binding. The detection of additional ligand binding regions is shown by the improved overlap between hot spots and bound ligands.

Subject(s)

Fourier Analysis , Molecular Docking Simulation , Proteins/chemistry , Algorithms , Ligands , Protein Conformation

4.

Energy Minimization on Manifolds for Docking Flexible Molecules.

Mirzaei, Hanieh; Zarbafian, Shahrooz; Villar, Elizabeth; Mottarella, Scott; Beglov, Dmitri; Vajda, Sandor; Paschalidis, Ioannis Ch; Vakili, Pirooz; Kozakov, Dima.

J Chem Theory Comput ; 11(3): 1063-76, 2015 Mar 10.

Article in English | MEDLINE | ID: mdl-26478722

ABSTRACT

In this paper, we extend a recently introduced rigid body minimization algorithm, defined on manifolds, to the problem of minimizing the energy of interacting flexible molecules. The goal is to integrate moving the ligand in six dimensional rotational/translational space with internal rotations around rotatable bonds within the two molecules. We show that adding rotational degrees of freedom to the rigid moves of the ligand results in an overall optimization search space that is a manifold to which our manifold optimization approach can be extended. The effectiveness of the method is shown for three different docking problems of increasing complexity. First, we minimize the energy of fragment-size ligands with a single rotatable bond as part of a protein mapping method developed for the identification of binding hot spots. Second, we consider energy minimization for docking a flexible ligand to a rigid protein receptor, an approach frequently used in existing methods. In the third problem, we account for flexibility in both the ligand and the receptor. Results show that minimization using the manifold optimization algorithm is substantially more efficient than minimization using a traditional all-atom optimization algorithm while producing solutions of comparable quality. In addition to the specific problems considered, the method is general enough to be used in a large class of applications such as docking multidomain proteins with flexible hinges. The code is available under open source license (at http://cluspro.bu.edu/Code/Code_Rigtree.tar) and with minimal effort can be incorporated into any molecular modeling package.

Subject(s)

Proteins/chemistry , Small Molecule Libraries/chemistry , Algorithms , Ligands , Molecular Docking Simulation , Pliability , Rotation

5.

The impact of side-chain packing on protein docking refinement.

Moghadasi, Mohammad; Mirzaei, Hanieh; Mamonov, Artem; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch; Kozakov, Dima.

J Chem Inf Model ; 55(4): 872-81, 2015 Apr 27.

Article in English | MEDLINE | ID: mdl-25714358

ABSTRACT

We study the impact of optimizing the side-chain positions in the interface region between two proteins during the process of binding. Mathematically, the problem is similar to side-chain prediction, which has been extensively explored in the process of protein structure prediction. The protein-protein docking application, however, has a number of characteristics that necessitate different algorithmic and implementation choices. In this work, we implement a distributed approximate algorithm that can be implemented on multiprocessor architectures and enables a trade-off between accuracy and running speed. We report computational results on benchmarks of enzyme-inhibitor and other types of complexes, establishing that the side-chain flexibility our algorithm introduces substantially improves the performance of docking protocols. Furthermore, we establish that the inclusion of unbound side-chain conformers in the side-chain positioning problem is critical in these performance improvements. The code is available to the community under open source license.

Subject(s)

Molecular Docking Simulation , Proteins/chemistry , Proteins/metabolism , Algorithms , Thermodynamics , Time Factors

6.

Efficient Maintenance and Update of Nonbonded Lists in Macromolecular Simulations.

Chowdhury, Rezaul; Beglov, Dmitri; Moghadasi, Mohammad; Paschalidis, Ioannis Ch; Vakili, Pirooz; Vajda, Sandor; Bajaj, Chandrajit; Kozakov, Dima.

J Chem Theory Comput ; 10(10): 4449-4454, 2014 Oct 14.

Article in English | MEDLINE | ID: mdl-25328494

ABSTRACT

Molecular mechanics and dynamics simulations use distance-based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the "nonbonded neighborhood lists" or nblists, in order to reduce the overhead of finding atom pairs that are within the distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require a significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, the octree is a cache-friendly data structure, and hence, it is less prone to cache miss slowdowns on modern memory hierarchies than nblists. Octrees use almost 2 orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that the octree implementation is approximately 1.5 times faster in practical use case scenarios as compared to nblists.
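To make the memory argument concrete, the following is a minimal sketch of an explicit nblist built by brute force. It is not the octree structure proposed in the paper. The random coordinates, box size, and cutoffs are hypothetical values chosen for illustration. At roughly uniform density the number of stored pairs grows about cubically with the cutoff, which is the growth the abstract describes as superlinear.

```python
import random


def build_nblist(coords, cutoff):
    """Return all index pairs (i, j), i < j, whose distance is below `cutoff`."""
    cutoff_sq = cutoff * cutoff
    pairs = []
    for i in range(len(coords)):
        xi, yi, zi = coords[i]
        for j in range(i + 1, len(coords)):
            xj, yj, zj = coords[j]
            d_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d_sq < cutoff_sq:
                pairs.append((i, j))
    return pairs


# Hypothetical system: 400 points placed uniformly at random in a 30 x 30 x 30 box.
rng = random.Random(0)
atoms = [(rng.uniform(0, 30), rng.uniform(0, 30), rng.uniform(0, 30)) for _ in range(400)]

# The explicit list has to be rebuilt (and stored) separately for every cutoff.
sizes = {cutoff: len(build_nblist(atoms, cutoff)) for cutoff in (4.0, 8.0, 12.0)}
for cutoff, size in sizes.items():
    print(cutoff, size)
```

The need to store one list per cutoff is what a cutoff-independent spatial structure, such as the octree described above, avoids.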

7.

Encounter complexes and dimensionality reduction in protein-protein association.

Kozakov, Dima; Li, Keyong; Hall, David R; Beglov, Dmitri; Zheng, Jiefu; Vakili, Pirooz; Schueler-Furman, Ora; Paschalidis, Ioannis Ch; Clore, G Marius; Vajda, Sandor.

Elife ; 3: e01370, 2014 Apr 08.

Article in English | MEDLINE | ID: mdl-24714491

ABSTRACT

An outstanding challenge has been to understand the mechanism whereby proteins associate. We report here the results of exhaustively sampling the conformational space in protein-protein association using a physics-based energy function. The agreement between experimental intermolecular paramagnetic relaxation enhancement (PRE) data and the PRE profiles calculated from the docked structures shows that the method captures both specific and non-specific encounter complexes. To explore the energy landscape in the vicinity of the native structure, the nonlinear manifold describing the relative orientation of two solid bodies is projected onto a Euclidean space in which the shape of low energy regions is studied by principal component analysis. Results show that the energy surface is canyon-like, with a smooth funnel within a two dimensional subspace capturing over 75% of the total motion. Thus, proteins tend to associate along preferred pathways, similar to sliding of a protein along DNA in the process of protein-DNA recognition. DOI: http://dx.doi.org/10.7554/eLife.01370.001.

Subject(s)

Proteins/chemistry , Proteins/metabolism , Molecular Docking Simulation , Protein Binding , Protein Conformation , Thermodynamics

8.

Optimization on the space of rigid and flexible motions: an alternative manifold optimization approach.

Vakili, Pirooz; Mirzaei, Hanieh; Zarbafian, Shahrooz; Paschalidis, Ioannis Ch; Kozakov, Dima; Vajda, Sandor.

Proc IEEE Conf Decis Control ; 2014: 5825-5830, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25774073

ABSTRACT

In this paper we consider the problem of minimization of a cost function that depends on the location and poses of one or more rigid bodies, or bodies that consist of rigid parts hinged together. We present a unified setting for formulating this problem as an optimization on an appropriately defined manifold for which efficient manifold optimizations can be developed. This setting is based on a Lie group representation of the rigid movements of a body that is different from what is commonly used for this purpose. We illustrate this approach by using the steepest descent algorithm on the manifold of the search space and specify conditions for its convergence.

9.

A Subspace Semi-Definite programming-based Underestimation (SSDU) method for stochastic global optimization in protein docking.

Nan, Feng; Moghadasi, Mohammad; Vakili, Pirooz; Vajda, Sandor; Kozakov, Dima; Paschalidis, Ioannis Ch.

Proc IEEE Conf Decis Control ; 2014: 4623-4628, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25914440

ABSTRACT

We propose a new stochastic global optimization method targeting protein docking problems. The method is based on finding a general convex polynomial underestimator to the binding energy function in a permissive subspace that possesses a funnel-like structure. We use Principal Component Analysis (PCA) to determine such permissive subspaces. The problem of finding the general convex polynomial underestimator is reduced into the problem of ensuring that a certain polynomial is a Sum-of-Squares (SOS), which can be done via semi-definite programming. The underestimator is then used to bias sampling of the energy function in order to recover a deep minimum. We show that the proposed method significantly improves the quality of docked conformations compared to existing methods.

10.

Flexible Refinement of Protein-Ligand Docking on Manifolds.

Mirzaei, Hanieh; Villar, Elizabeth; Mottarella, Scott; Beglov, Dmitri; Paschalidis, Ioannis Ch; Vajda, Sandor; Kozakov, Dima; Vakili, Pirooz.

Proc IEEE Conf Decis Control ; : 1392-1397, 2013.

Article in English | MEDLINE | ID: mdl-24830567

ABSTRACT

Our work is motivated by energy minimization of biological macromolecules, an essential step in computational docking. By allowing some ligand flexibility, we generalize a recently introduced novel representation of rigid body minimization as an optimization on the [Formula: see text] manifold, rather than on the commonly used Special Euclidean group SE(3). We show that the resulting flexible docking can also be formulated as an optimization on a Lie group that is the direct product of simpler Lie groups for which geodesics and exponential maps can be easily obtained. Our computational results for a local optimization algorithm developed based on this formulation show that it is about an order of magnitude faster than the state-of-the-art local minimization algorithms for computational protein-small molecule docking.

11.

A New Distributed Algorithm for Side-Chain Positioning in the Process of Protein Docking.

Moghadasi, Mohammad; Kozakov, Dima; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch.

Proc IEEE Conf Decis Control ; : 739-744, 2013.

Article in English | MEDLINE | ID: mdl-24844567

ABSTRACT

Side-chain positioning (SCP) is an important component of computational protein docking methods. Existing SCP methods and available software have been designed for protein folding applications where side-chain positioning is also important. As a result they do not take into account significant special structure that SCP for docking exhibits. We propose a new algorithm which poses SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. We develop an approximate algorithm which solves a relaxation of the MWIS and then rounds the solution to obtain a high-quality feasible solution to the problem. The algorithm is fully distributed and can be executed on a large network of processing nodes requiring only local information and message-passing between neighboring nodes. Motivated by the special structure in docking, we establish optimality guarantees for a certain class of graphs. Our results on a benchmark set of enzyme-inhibitor protein complexes show that our predictions are close to the native structure and are comparable to the ones obtained by a state-of-the-art method. The results are substantially improved if rotamers from unbound protein structures are included in the search. We also establish that the use of our SCP algorithm substantially improves docking results.
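For readers unfamiliar with the Maximum Weighted Independent Set problem mentioned above, here is a minimal sketch that solves a tiny instance exactly by exhaustive search. This is not the relaxation-and-rounding algorithm of the paper. The graph construction (one node per residue/rotamer pair, edges between rotamers of the same residue and between clashing rotamers) and all weights are assumptions made purely for illustration.

```python
from itertools import combinations


def max_weight_independent_set(weights, edges):
    """Exact MWIS by exhaustive search; only practical for very small graphs."""
    nodes = list(weights)
    edge_set = {frozenset(e) for e in edges}
    best_set, best_weight = (), 0.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # A subset is independent if no two of its members share an edge.
            if any(frozenset(p) in edge_set for p in combinations(subset, 2)):
                continue
            total = sum(weights[v] for v in subset)
            if total > best_weight:
                best_set, best_weight = subset, total
    return set(best_set), best_weight


# Hypothetical toy instance: nodes are (residue, rotamer index) pairs.
weights = {("A", 0): 2.0, ("A", 1): 3.0, ("B", 0): 2.5, ("B", 1): 1.0}
edges = [
    (("A", 0), ("A", 1)),  # at most one rotamer per residue
    (("B", 0), ("B", 1)),  # at most one rotamer per residue
    (("A", 1), ("B", 0)),  # assumed steric clash between these two rotamers
]
chosen, total = max_weight_independent_set(weights, edges)
print(chosen, total)
```

The two heaviest nodes clash, so the best feasible choice pairs rotamer 0 of residue A with rotamer 0 of residue B. Exhaustive search scales exponentially, which is why approximate and distributed schemes are of interest for realistic interfaces.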

12.

Rigid Body Energy Minimization on Manifolds for Molecular Docking.

Mirzaei, Hanieh; Beglov, Dmitri; Paschalidis, Ioannis Ch; Vajda, Sandor; Vakili, Pirooz; Kozakov, Dima.

J Chem Theory Comput ; 8(11): 4374-4380, 2012 Nov 13.

Article in English | MEDLINE | ID: mdl-23382659

ABSTRACT

Virtually all docking methods include some local continuous minimization of an energy/scoring function in order to remove steric clashes and obtain more reliable energy values. In this paper, we describe an efficient rigid-body optimization algorithm that, compared to the most widely used algorithms, converges approximately an order of magnitude faster to conformations with equal or slightly lower energy. The space of rigid body transformations is a nonlinear manifold, namely, a space which locally resembles a Euclidean space. We use a canonical parametrization of the manifold, called the exponential parametrization, to map the Euclidean tangent space of the manifold onto the manifold itself. Thus, we locally transform the rigid body optimization to an optimization over a Euclidean space where basic optimization algorithms are applicable. Compared to commonly used methods, this formulation substantially reduces the dimension of the search space. As a result, it requires far fewer costly function and gradient evaluations and leads to a more efficient algorithm. We have selected the LBFGS quasi-Newton method for local optimization since it uses only gradient information to obtain second order information about the energy function and avoids the far more costly direct Hessian evaluations. Two applications, one in protein-protein docking and the other in protein-small molecule interactions, are presented as part of macromolecular docking protocols. The code is available to the community under open source license, and with minimal effort can be incorporated into any molecular modeling package.
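As an illustration of an exponential parametrization, the sketch below maps a 3-vector in the tangent space to a rotation matrix using Rodrigues' formula and pairs it with a plain translation, giving a 6-parameter Euclidean description of a rigid move. This is one simple choice made for illustration; the exact parametrization and manifold used in the paper may differ, and the function names are hypothetical.

```python
import math


def exp_so3(w):
    """Map a rotation vector w (unit axis times angle) to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(comp * comp for comp in w))
    if theta < 1e-12:
        # For a (numerically) zero rotation vector, return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (comp / theta for comp in w)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew-symmetric matrix of the axis
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    sin_t = math.sin(theta)
    one_minus_cos = 1.0 - math.cos(theta)
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return [
        [(1.0 if i == j else 0.0) + sin_t * K[i][j] + one_minus_cos * K2[i][j] for j in range(3)]
        for i in range(3)
    ]


def apply_rigid(params, point):
    """Apply the rigid move encoded by 6 parameters: 3 for rotation, 3 for translation."""
    R = exp_so3(params[:3])
    t = params[3:]
    return [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]


# Rotate by 90 degrees about the z axis, then shift by one unit along x.
moved = apply_rigid([0.0, 0.0, math.pi / 2, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(moved)
```

Because the six parameters live in an ordinary Euclidean space, a standard optimizer such as a quasi-Newton method can update them directly, and the result is always a valid rigid transformation.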

13.

A Message Passing Approach to Side Chain Positioning with Applications in Protein Docking Refinement.

Moghadasi, Mohammad; Kozakov, Dima; Mamonov, Artem B; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch.

Proc IEEE Conf Decis Control ; : 2310-2315, 2012.

Article in English | MEDLINE | ID: mdl-23515575

ABSTRACT

We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP.

14.

A New Approach to Rigid Body Minimization with Application to Molecular Docking.

Mirzaei, Hanieh; Kozakov, Dima; Beglov, Dmitri; Paschalidis, Ioannis Ch; Vajda, Sandor; Vakili, Pirooz.

Proc IEEE Conf Decis Control ; : 2983-2988, 2012 Dec.

Article in English | MEDLINE | ID: mdl-24763338

ABSTRACT

Our work is motivated by energy minimization in the space of rigid affine transformations of macromolecules, an essential step in computational protein-protein docking. We introduce a novel representation of rigid body motion that leads to a natural formulation of the energy minimization problem as an optimization on the [Formula: see text] manifold, rather than the commonly used SE(3). The new representation avoids the complications associated with optimization on the SE(3) manifold and provides additional flexibility for optimization not available in that formulation. The approach is applicable to general rigid body minimization problems. Our computational results for a local optimization algorithm developed based on the new approach show that it is about an order of magnitude faster than state-of-the-art local minimization algorithms for computational protein-protein docking.

15.

Achieving reliability and high accuracy in automated protein docking: ClusPro, PIPER, SDU, and stability analysis in CAPRI rounds 13-19.

Kozakov, Dima; Hall, David R; Beglov, Dmitri; Brenke, Ryan; Comeau, Stephen R; Shen, Yang; Li, Keyong; Zheng, Jiefu; Vakili, Pirooz; Paschalidis, Ioannis Ch; Vajda, Sandor.

Proteins ; 78(15): 3124-30, 2010 Nov 15.

Article in English | MEDLINE | ID: mdl-20818657

ABSTRACT

Our approach to protein-protein docking includes three main steps. First, we run PIPER, a rigid body docking program based on the Fast Fourier Transform (FFT) correlation approach, extended to use pairwise interaction potentials. Second, the 1000 best energy conformations are clustered, and the 30 largest clusters are retained for refinement. Third, the stability of the clusters is analyzed by short Monte Carlo simulations, and the structures are refined by the medium-range optimization method SDU. The first two steps of this approach are implemented in the ClusPro 2.0 protein-protein docking server. Despite being fully automated, the last step is computationally too expensive to be included in the server. When comparing the models obtained in CAPRI rounds 13-19 by ClusPro, by the refinement of the ClusPro predictions and by all predictor groups, we arrived at three conclusions. First, for the first time in CAPRI history, our automated ClusPro server was able to compete with the best human predictor groups. Second, selecting the top ranked models, our current protocol reliably generates high-quality structures of protein-protein complexes from the structures of separately crystallized proteins, even in the absence of biological information, provided that there is limited backbone conformational change. Third, despite occasional successes, homology modeling requires further improvement to achieve reliable docking results.

Subject(s)

Computational Biology/methods , Models, Chemical , Proteins/chemistry , Software , Algorithms , Cluster Analysis , Molecular Dynamics Simulation , Monte Carlo Method , Protein Binding , Protein Conformation , Protein Multimerization , Proteins/metabolism

16.

Protein docking by the underestimation of free energy funnels in the space of encounter complexes.

Shen, Yang; Paschalidis, Ioannis Ch; Vakili, Pirooz; Vajda, Sandor.

PLoS Comput Biol ; 4(10): e1000191, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18846200

ABSTRACT

Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods.

Subject(s)

Protein Interaction Mapping/statistics & numerical data , Proteins/chemistry , Algorithms , Computational Biology , Models, Chemical , Multiprotein Complexes/chemistry , Protein Binding , Protein Interaction Domains and Motifs , Software , Thermodynamics

17.

SDU: A Semidefinite Programming-Based Underestimation Method for Stochastic Global Optimization in Protein Docking.

Paschalidis, Ioannis Ch; Shen, Yang; Vakili, Pirooz; Vajda, Sandor.

IEEE Trans Automat Contr ; 52(4): 664-676, 2007 Apr 01.

Article in English | MEDLINE | ID: mdl-19759849

ABSTRACT

This paper introduces a new stochastic global optimization method targeting protein-protein docking problems, an important class of problems in computational structural biology. The method is based on finding general convex quadratic underestimators to the binding energy function that is funnel-like. Finding the optimum underestimator requires solving a semidefinite programming problem, hence the name semidefinite programming-based underestimation (SDU). The underestimator is used to bias sampling in the search region. It is established that under appropriate conditions SDU locates the global energy minimum with probability approaching one as the sample size grows. A detailed comparison of SDU with a related method of convex global underestimator (CGU), and computational results for protein-protein docking problems are provided.

18.

Protein-protein docking with reduced potentials by exploiting multi-dimensional energy funnels.

Paschalidis, Ioannis Ch; Shen, Yang; Vakili, Pirooz; Vajda, Sandor.

Conf Proc IEEE Eng Med Biol Soc ; 2006: 5330-3, 2006.

Article in English | MEDLINE | ID: mdl-17946298

ABSTRACT

We propose a new computational approach for protein docking exploiting energy funnels in the 6-dimensional space of translations and rotations of the ligand with respect to the receptor. Our approach consists of a series of translational and orientational moves of the ligand towards the receptor. Each move is performed using a global optimization method we have developed, the semi-definite underestimation (SDU) method, which can exploit a funnel-like energy function. We compared our approach with Monte Carlo on a set of 10 protein complexes using two residue-level potentials. To achieve the same level of performance (produce a near-native ≤3 Å RMSD complex) our approach reduces energy evaluations by more than a factor of two, on average.

Subject(s)

Computational Biology/instrumentation , Computational Biology/methods , Protein Interaction Mapping , Proteins/chemistry , Algorithms , Cell Physiological Phenomena , Computer Simulation , Databases, Protein , Ligands , Models, Molecular , Models, Theoretical , Monte Carlo Method , Protein Binding , Protein Conformation
