Search | VHL Regional Portal

Variable Neighborhood Search with Cost Function Networks To Solve Large Computational Protein Design Problems.

Charpentier, Antoine; Mignon, David; Barbe, Sophie; Cortes, Juan; Schiex, Thomas; Simonson, Thomas; Allouche, David.

J Chem Inf Model ; 59(1): 127-136, 2019 01 28.

Article in English | MEDLINE | ID: mdl-30380857

ABSTRACT

Computational protein design (CPD) aims to predict amino acid sequences that fold to specific structures and perform desired functions. CPD depends on a rotamer library, an energy function, and an algorithm to search the sequence/conformation space. Variable neighborhood search (VNS) with cost function networks is a powerful framework that can provide tight upper bounds on the global minimum energy. We propose a new CPD heuristic based on VNS in which a subset of the solution space (a "neighborhood") is explored, whose size is gradually increased with a dedicated probabilistic heuristic. The algorithm was tested on 99 protein designs with fixed backbones involving nine proteins from the SH2, SH3, and PDZ families. The number of mutating positions was 20, 30, or all of the amino acids, while the rest of the protein explored side-chain rotamers. VNS was more successful than Monte Carlo (MC), replica-exchange MC, and a heuristic steepest-descent energy minimization, providing solutions with equal or lower best energies in most cases. For complete protein redesign, it gave solutions that were 2.5 to 11.2 kcal/mol lower in energy than those obtained with the other approaches. VNS is implemented in the toulbar2 software. It could be very helpful for large and/or complex design problems.
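The variable neighborhood search idea described in this abstract can be sketched on a toy pairwise-decomposed energy. This is only an illustrative simplification, not the toulbar2 implementation: the neighborhood is chosen uniformly at random and solved by brute-force enumeration instead of by cost function network solving, and its size grows by one after each failed round rather than following the paper's dedicated probabilistic heuristic. The names `total_energy`, `vns`, `unary`, `pairwise`, and the toy energies are all hypothetical.

```python
import itertools
import random

def total_energy(assign, unary, pairwise):
    """Pairwise-decomposed energy: unary terms plus pair terms."""
    energy = sum(unary[i][r] for i, r in enumerate(assign))
    for (i, j), table in pairwise.items():
        energy += table[assign[i]][assign[j]]
    return energy

def vns(unary, pairwise, k_min=1, k_max=3, rounds=200, seed=0):
    """Toy VNS: re-optimize k random positions, grow k when stuck."""
    rng = random.Random(seed)
    n = len(unary)
    best = [0] * n
    best_e = total_energy(best, unary, pairwise)
    k = k_min
    for _ in range(rounds):
        # Neighborhood: k positions freed, all others kept at their best value.
        positions = rng.sample(range(n), min(k, n))
        cand = list(best)
        improved = False
        domains = (range(len(unary[p])) for p in positions)
        for combo in itertools.product(*domains):
            for p, r in zip(positions, combo):
                cand[p] = r
            e = total_energy(cand, unary, pairwise)
            if e < best_e - 1e-12:
                best, best_e, improved = list(cand), e, True
        # Reset to the smallest neighborhood on success, enlarge it on failure.
        k = k_min if improved else min(k + 1, k_max)
    return best, best_e

# Hypothetical toy instance: 3 positions, 3 rotamers each.
unary = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [0.3, 0.2, 0.0]]
pairwise = {
    (0, 1): [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, -3.0]],
    (1, 2): [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
}
best, best_e = vns(unary, pairwise)
```

On realistic design problems the neighborhoods are far too large for enumeration, which is why the paper relies on a cost function network solver for that step.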

Subject(s)

Computational Biology , Protein Engineering , Proteins/chemistry , Algorithms , Models, Molecular , Monte Carlo Method , Protein Conformation , Software

Deterministic Search Methods for Computational Protein Design.

Traoré, Seydou; Allouche, David; André, Isabelle; Schiex, Thomas; Barbe, Sophie.

Methods Mol Biol ; 1529: 107-123, 2017.

Article in English | MEDLINE | ID: mdl-27914047

ABSTRACT

One main challenge in Computational Protein Design (CPD) lies in the exploration of the amino-acid sequence space, while considering, to some extent, side chain flexibility. The exorbitant size of the search space calls for the development of efficient exact deterministic search methods enabling identification of low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or an ensemble of guaranteed near-optimal solutions. In contrast to stochastic local search methods that are not guaranteed to find the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality in finite but exponential worst-case time. After a brief overview of these two classes of methods, we discuss the grounds and merits of four deterministic methods that have been applied to solve CPD problems. These approaches are based either on the Dead-End-Elimination theorem combined with the A* algorithm (DEE/A*), on Cost Function Networks algorithms (CFN), on Integer Linear Programming solvers (ILP) or on Markov Random Fields solvers (MRF). The way two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models starting from a pairwise decomposed energy matrix is detailed in this review.
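As an illustration of the Dead-End-Elimination step mentioned in this abstract, the following minimal sketch applies the Goldstein elimination criterion to a toy pairwise energy matrix: rotamer r at position i is pruned when some competitor t at the same position is strictly better whatever the rotamers chosen elsewhere. The data layout (`unary`, `pairwise`), the function names, and the toy energies are assumptions made for the example, and are not taken from the review or from any CPD package.

```python
def pair_energy(pairwise, i, r, j, s):
    """Pair term between rotamer r at i and rotamer s at j (0 if no table)."""
    if (i, j) in pairwise:
        return pairwise[(i, j)][r][s]
    if (j, i) in pairwise:
        return pairwise[(j, i)][s][r]
    return 0.0

def goldstein_dee(unary, pairwise):
    """Return, per position, the set of rotamers surviving Goldstein DEE."""
    n = len(unary)
    alive = [set(range(len(u))) for u in unary]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Best case for r relative to t over all surviving contexts.
                    gap = unary[i][r] - unary[i][t]
                    for j in range(n):
                        if j == i:
                            continue
                        gap += min(
                            pair_energy(pairwise, i, r, j, s)
                            - pair_energy(pairwise, i, t, j, s)
                            for s in alive[j]
                        )
                    if gap > 0:
                        # r is always worse than t, so it cannot be in the GMEC.
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

# Hypothetical toy instance: 2 positions, 2 rotamers each.
unary = [[0.0, 5.0], [0.0, 0.0]]
pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
alive = goldstein_dee(unary, pairwise)
```

In a DEE/A* pipeline the surviving rotamers would then be passed to a search step; here the pruning alone is shown.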

Subject(s)

Computational Biology/methods , Protein Engineering/methods , Proteins , Algorithms , Computer Simulation , Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Software , Structure-Activity Relationship

Fast search algorithms for computational protein design.

Traoré, Seydou; Roberts, Kyle E; Allouche, David; Donald, Bruce R; André, Isabelle; Schiex, Thomas; Barbe, Sophie.

J Comput Chem ; 37(12): 1048-58, 2016 May 05.

Article in English | MEDLINE | ID: mdl-26833706

ABSTRACT

One of the main challenges in computational protein design (CPD) is the huge size of the protein sequence and conformational space that has to be computationally explored. Recently, we showed that state-of-the-art combinatorial optimization technologies based on Cost Function Network (CFN) processing allow speeding up provable rigid backbone protein design methods by several orders of magnitude. Building on this, we improved and injected CFN technology into the well-established CPD package Osprey to allow all Osprey CPD algorithms to benefit from associated speedups. Because Osprey fundamentally relies on the ability of A* to produce conformations in increasing order of energy, we defined new A* strategies combining CFN lower bounds with a new side-chain positioning-based branching scheme. Beyond the speedups obtained in the new A*-CFN combination, this novel branching scheme enables a much faster enumeration of suboptimal sequences, far beyond what is reachable without it. Together with the immediate and important speedups provided by CFN technology, these developments directly benefit all the algorithms that previously relied on the DEE/A* combination inside Osprey and make it possible to solve larger CPD problems with provable algorithms.

Subject(s)

Algorithms , Computational Biology , Proteins/chemistry , Amino Acid Sequence , Drug Design , Protein Conformation

Guaranteed Discrete Energy Optimization on Large Protein Design Problems.

Simoncini, David; Allouche, David; de Givry, Simon; Delmas, Céline; Barbe, Sophie; Schiex, Thomas.

J Chem Theory Comput ; 11(12): 5980-9, 2015 Dec 08.

Article in English | MEDLINE | ID: mdl-26610100

ABSTRACT

In Computational Protein Design (CPD), assuming a rigid backbone and amino-acid rotamer library, the problem of finding a sequence with an optimal conformation is NP-hard. In this paper, using Dunbrack's rotamer library and the Talaris2014 decomposable energy function, we use an exact deterministic method combining branch and bound, arc consistency, and tree-decomposition to provably identify the global minimum energy sequence-conformation on full-redesign problems, defining search spaces of size up to 10^234. This is achieved on a single core of a standard computing server, requiring a maximum of 66 GB RAM. A variant of the algorithm is able to exhaustively enumerate all sequence-conformations within an energy threshold of the optimum. These proven optimal solutions are then used to evaluate the frequencies and amplitudes, in energy and sequence, at which an existing CPD-dedicated simulated annealing implementation may miss the optimum on these full redesign problems. The probability of finding an optimum drops close to 0 very quickly. In the worst case, despite 1,000 repeats, the annealing algorithm remained more than 1 Rosetta unit away from the optimum, leading to design sequences that could differ from the optimal sequence by more than 30% of their amino acids.
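The branch-and-bound component of such an exact method can be sketched as a depth-first search that prunes any partial assignment whose lower bound already reaches the best energy found. This is a deliberately naive sketch under stated assumptions: the bound below is a plain sum of independent minima, not the arc-consistency bounds or the tree decomposition used in the paper, and the names `branch_and_bound`, `unary`, `pairwise`, and the toy energies are hypothetical.

```python
import math

def branch_and_bound(unary, pairwise):
    """Exact minimization of a pairwise energy by depth-first branch and bound."""
    n = len(unary)
    best = {"energy": math.inf, "assign": None}

    def lower_bound(assign):
        # Positions 0..d-1 are assigned; the rest are still free.
        d = len(assign)
        lb = sum(unary[i][assign[i]] for i in range(d))
        lb += sum(min(unary[i]) for i in range(d, n))
        for (i, j), table in pairwise.items():
            if i < d and j < d:
                lb += table[assign[i]][assign[j]]
            elif i < d:
                lb += min(table[assign[i]])
            elif j < d:
                lb += min(row[assign[j]] for row in table)
            else:
                lb += min(min(row) for row in table)
        return lb

    def search(assign):
        lb = lower_bound(assign)
        if lb >= best["energy"]:
            return  # prune: this branch cannot improve on the incumbent
        if len(assign) == n:
            # On a complete assignment the bound equals the exact energy.
            best["energy"], best["assign"] = lb, list(assign)
            return
        for r in range(len(unary[len(assign)])):
            assign.append(r)
            search(assign)
            assign.pop()

    search([])
    return best["assign"], best["energy"]

# Hypothetical toy instance: 3 positions, 3 rotamers each.
unary = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [0.3, 0.2, 0.0]]
pairwise = {
    (0, 1): [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, -3.0]],
    (1, 2): [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
}
gmec, gmec_energy = branch_and_bound(unary, pairwise)
```

Tighter bounds prune far more of the tree, which is what makes full-redesign instances tractable in practice; the sketch only shows the pruning logic itself.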

Subject(s)

Algorithms , Proteins/chemistry , Computational Biology , Proteins/metabolism , Thermodynamics

A new framework for computational protein design through cost function network optimization.

Traoré, Seydou; Allouche, David; André, Isabelle; de Givry, Simon; Katsirelos, George; Schiex, Thomas; Barbe, Sophie.

Bioinformatics ; 29(17): 2129-36, 2013 Sep 01.

Article in English | MEDLINE | ID: mdl-23842814

ABSTRACT

MOTIVATION: The main challenge for structure-based computational protein design (CPD) remains the combinatorial nature of the search space. Even in its simplest fixed-backbone formulation, CPD encompasses a computationally difficult NP-hard problem that prevents the exact exploration of complex systems defining large sequence-conformation spaces. RESULTS: We present here a CPD framework, based on cost function network (CFN) solving, a recent exact combinatorial optimization technique, to efficiently handle highly complex combinatorial spaces encountered in various protein design problems. We show that the CFN-based approach is able to solve to optimality a variety of complex designs that could often not be solved using a usual CPD-dedicated tool or state-of-the-art exact operations research tools. Beyond the identification of the optimal solution, the global minimum-energy conformation, the CFN-based method is also able to quickly enumerate large ensembles of suboptimal solutions of interest to rationally build experimental enzyme mutant libraries. AVAILABILITY: The combined pipeline used to generate energetic models (based on a patched version of the open source solver Osprey 2.0), the conversion to CFN models (based on Perl scripts) and CFN solving (based on the open source solver toulbar2) are all available at http://genoweb.toulouse.inra.fr/~tschiex/CPD

Subject(s)

Protein Conformation , Protein Engineering/methods , Algorithms , Models, Molecular , Proteins/chemistry , Sequence Analysis, Protein , Software

Gene regulatory network reconstruction using Bayesian networks, the Dantzig Selector, the Lasso and their meta-analysis.

Vignes, Matthieu; Vandel, Jimmy; Allouche, David; Ramadan-Alban, Nidal; Cierco-Ayrolles, Christine; Schiex, Thomas; Mangin, Brigitte; de Givry, Simon.

PLoS One ; 6(12): e29165, 2011.

Article in English | MEDLINE | ID: mdl-22216195

ABSTRACT

Modern technologies, and especially next-generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth "Dialogue for Reverse Engineering Assessments and Methods" (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on "Systems Genetics" proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods ranging from Bayesian networks to penalised linear regressions to analyse such data, and proposed a simple yet very powerful meta-analysis, which combines these inference methods. We present results of the Challenge as well as more in-depth analysis of predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the 16 teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics.

Subject(s)

Bayes Theorem , Gene Regulatory Networks , Genomics , Mutation

BioMAJ: a flexible framework for databanks synchronization and processing.

Filangi, Olivier; Beausse, Yoann; Assi, Anthony; Legrand, Ludovic; Larré, Jean-Marc; Martin, Véronique; Collin, Olivier; Caron, Christophe; Leroy, Hugues; Allouche, David.

Bioinformatics ; 24(16): 1823-5, 2008 Aug 15.

Article in English | MEDLINE | ID: mdl-18593718

ABSTRACT

Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application enables automation of the data update cycle process and supervision of the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data treatment consisting of indexation or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can be easily extended by personal homemade processing scripts. Source history can be kept via HTML reports containing statements of locally managed databanks. AVAILABILITY: http://biomaj.genouest.org. BioMAJ is free open software. It is freely available under the CECILL version 2 license.

Subject(s)

Algorithms , Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Programming Languages , Software , User-Computer Interface , Computational Biology/methods
