1.
Protein Sci ; 33(8): e5109, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38989563

ABSTRACT

Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride toward achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.


Subject(s)
Angiotensin-Converting Enzyme 2 , Mutation , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/metabolism , Angiotensin-Converting Enzyme 2/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , SARS-CoV-2/genetics , SARS-CoV-2/chemistry , SARS-CoV-2/metabolism , Humans , Binding Sites , COVID-19/virology , COVID-19/genetics , Protein Binding , Artificial Intelligence
2.
Leuk Res ; 136: 107437, 2024 01.
Article in English | MEDLINE | ID: mdl-38215555

ABSTRACT

We designed artificial intelligence-based prediction models (AIPM) using 52 diagnostic variables from 3687 patients included in the DATAML registry treated with intensive chemotherapy (IC, N = 3030) or azacitidine (AZA, N = 657) for acute myeloid leukemia (AML). A neural network called multilayer perceptron (MLP) achieved a prediction accuracy for overall survival (OS) of 68.5% and 62.1% in the IC and AZA cohorts, respectively. The Boruta algorithm could select the most important variables for prediction without decreasing accuracy. Thirteen features were retained with this algorithm in the IC cohort: age, cytogenetic risk, white blood cell count, LDH, platelet count, albumin, MPO expression, mean corpuscular volume, CD117 expression, NPM1 mutation, AML status (de novo or secondary), multilineage dysplasia and ASXL1 mutation; and 7 variables in the AZA cohort: blood blasts, serum ferritin, CD56, LDH, hemoglobin, CD13 and disseminated intravascular coagulation (DIC). We believe that AIPM could help hematologists to deal with the huge amount of data available at diagnosis, enabling them to have an OS estimation and guide their treatment choice. Our registry-based AIPM could offer a large real-life dataset with original and exhaustive features and select a low number of diagnostic features with an equivalent accuracy of prediction, more appropriate to routine practice.


Subject(s)
Antimetabolites, Antineoplastic , Leukemia, Myeloid, Acute , Humans , Antimetabolites, Antineoplastic/therapeutic use , Artificial Intelligence , Treatment Outcome , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Azacitidine/therapeutic use , Registries
3.
Res Sq ; 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-36482980

ABSTRACT

Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride towards achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.

4.
J Am Chem Soc ; 143(39): 15998-16006, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34559526

ABSTRACT

The extant complex proteins must have evolved from ancient short and simple ancestors. The double-ψ β-barrel (DPBB) is one of the oldest protein folds and conserved in various fundamental enzymes, such as the core domain of RNA polymerase. Here, by reverse engineering a modern DPBB domain, we reconstructed its plausible evolutionary pathway started by "interlacing homodimerization" of a half-size peptide, followed by gene duplication and fusion. Furthermore, by simplifying the amino acid repertoire of the peptide, we successfully created the DPBB fold with only seven amino acid types (Ala, Asp, Glu, Gly, Lys, Arg, and Val), which can be coded by only GNN and ARR (R = A or G) codons in the modern translation system. Thus, the DPBB fold could have been materialized by the early translation system and genetic code.


Subject(s)
Amino Acids/chemistry , Amino Acids/classification , DNA-Directed RNA Polymerases/chemistry , DNA-Directed RNA Polymerases/metabolism , Amino Acid Sequence , Models, Molecular , Protein Conformation , Protein Domains , Protein Folding
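
Note on the codon claim in the abstract above: the statement that GNN and ARR codons suffice for the seven amino acid types can be checked directly against the standard genetic code. The following is a minimal sketch, not taken from the paper; the helper names `expand` and `encoded` are hypothetical, and the table is the standard code written in the conventional UCAG ordering.

```python
from itertools import product

# Standard genetic code (RNA alphabet), listed in the conventional UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def expand(pattern):
    """Expand a degenerate codon pattern (N = any base, R = A or G)."""
    choices = {"N": "UCAG", "R": "AG"}
    return ["".join(c) for c in product(*(choices.get(ch, ch) for ch in pattern))]

def encoded(patterns):
    """Return the set of one-letter amino acids encoded by the given patterns."""
    return {CODON_TABLE[c] for p in patterns for c in expand(p)}

seven = encoded(["GNN", "ARR"])
print(sorted(seven))  # one-letter codes for Ala, Asp, Glu, Gly, Lys, Arg, Val
```

GNN contributes Val, Ala, Asp, Glu and Gly, while ARR contributes Lys and Arg, so no stop codon and no eighth amino acid appears in the set.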
5.
Proteins ; 89(11): 1522-1529, 2021 11.
Article in English | MEDLINE | ID: mdl-34228826

ABSTRACT

Structure-based computational protein design (CPD) refers to the problem of finding a sequence of amino acids which folds into a specific desired protein structure, and possibly fulfills some targeted biochemical properties. Recent studies point out the particularly rugged CPD energy landscape, suggesting that local search optimization methods should be designed and tuned to easily escape local minima attraction basins. In this article, we analyze the performance and search dynamics of an iterated local search (ILS) algorithm enhanced with partition crossover. Our algorithm, PILS, quickly finds local minima and escapes their basins of attraction by solution perturbation. Additionally, the partition crossover operator exploits the structure of the residue interaction graph in order to efficiently mix solutions and find new unexplored basins. Our results on a benchmark of 30 proteins of various topology and size show that PILS consistently finds lower energy solutions compared to Rosetta fixbb and a classic ILS, and that the corresponding sequences are mostly closer to the native.


Subject(s)
Algorithms , Amino Acids/chemistry , Protein Engineering/methods , Proteins/chemistry , Software , Amino Acid Sequence , Benchmarking , Computational Biology , Protein Conformation , Protein Folding , Thermodynamics
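
Note on the search scheme in the abstract above: the sketch below illustrates a classic iterated local search (descent to a local minimum, random perturbation, descent again) on a toy pairwise energy model. It is only an illustration under stated assumptions: it does not implement PILS or partition crossover, the random energy tables stand in for a real CPD energy function, and all function names are hypothetical.

```python
import random

def make_toy_problem(n_pos, n_types, rng):
    """Random unary and pairwise energy tables (an assumption, not a real force field)."""
    unary = [[rng.uniform(-1, 1) for _ in range(n_types)] for _ in range(n_pos)]
    pair = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_types)] for _ in range(n_types)]
            for i in range(n_pos) for j in range(i + 1, n_pos)}
    return unary, pair

def energy(seq, unary, pair):
    e = sum(unary[i][a] for i, a in enumerate(seq))
    return e + sum(table[seq[i]][seq[j]] for (i, j), table in pair.items())

def local_search(seq, unary, pair, n_types):
    """Single-position descent until no single substitution lowers the energy."""
    seq = list(seq)
    best = energy(seq, unary, pair)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq)):
            for a in range(n_types):
                if a == seq[i]:
                    continue
                old = seq[i]
                seq[i] = a
                e = energy(seq, unary, pair)
                if e < best - 1e-12:
                    best, improved = e, True   # keep the substitution
                else:
                    seq[i] = old               # revert it
    return seq, best

def iterated_local_search(unary, pair, n_types, n_iter, n_kick, rng):
    n_pos = len(unary)
    start = [rng.randrange(n_types) for _ in range(n_pos)]
    cur, cur_e = local_search(start, unary, pair, n_types)
    for _ in range(n_iter):
        cand = list(cur)
        for i in rng.sample(range(n_pos), n_kick):   # perturbation: re-draw a few positions
            cand[i] = rng.randrange(n_types)
        cand, cand_e = local_search(cand, unary, pair, n_types)
        if cand_e < cur_e:                           # accept only improvements
            cur, cur_e = cand, cand_e
    return cur, cur_e

rng = random.Random(0)
unary, pair = make_toy_problem(6, 4, rng)
best_seq, best_e = iterated_local_search(unary, pair, 4, n_iter=30, n_kick=2, rng=rng)
print(best_seq, round(best_e, 3))
```

The perturbation size `n_kick` controls how far each restart moves from the current basin; in this sketch it is a fixed, arbitrary choice.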
6.
Bioinformatics ; 36(1): 122-130, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31199465

ABSTRACT

MOTIVATION: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach considers a single rigid backbone as a target, which ignores backbone flexibility. Multistate design (MSD) instead allows several backbone states to be considered simultaneously, defining challenging computational problems. RESULTS: We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences of positive multistate full protein redesign problems and exhaustively enumerate suboptimal sequences close to the MSD optimum. Applied to nuclear magnetic resonance and back-rubbed X-ray structures, we observe that the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitude and can solve MSD problems with sizes previously unreachable with guaranteed algorithms. AVAILABILITY AND IMPLEMENTATION: https://forgemia.inra.fr/thomas.schiex/pompd as documented Open Source. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Engineering , Proteins , Algorithms , Amino Acid Sequence , Computational Biology , Protein Conformation , Protein Engineering/methods , Proteins/chemistry , Software
7.
IUCrJ ; 6(Pt 1): 46-55, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-30713702

ABSTRACT

β-Propeller proteins form one of the largest families of protein structures, with a pseudo-symmetrical fold made up of subdomains called blades. They are not only abundant but are also involved in a wide variety of cellular processes, often by acting as a platform for the assembly of protein complexes. WD40 proteins are a subfamily of propeller proteins with no intrinsic enzymatic activity, but their stable, modular architecture and versatile surface have allowed evolution to adapt them to many vital roles. By computationally reverse-engineering the duplication, fusion and diversification events in the evolutionary history of a WD40 protein, a perfectly symmetrical homologue called Tako8 was made. If two or four blades of Tako8 are expressed as single polypeptides, they do not self-assemble to complete the eight-bladed architecture, which may be owing to the closely spaced negative charges inside the ring. A different computational approach was employed to redesign Tako8 to create Ika8, a fourfold-symmetrical protein in which neighbouring blades carry compensating charges. Ika2 and Ika4, carrying two or four blades per subunit, respectively, were found to assemble spontaneously into a complete eight-bladed ring in solution. These artificial eight-bladed rings may find applications in bionanotechnology and as models to study the folding and evolution of WD40 proteins.

8.
Bioinformatics ; 35(14): 2418-2426, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30496341

ABSTRACT

MOTIVATION: Structure-based Computational Protein Design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions, however, remain imperfect, and injecting relevant information from known structures into the design process should lead to improved designs. RESULTS: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%. AVAILABILITY AND IMPLEMENTATION: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Algorithms , Amino Acid Sequence , Computational Biology , Databases, Protein , Protein Conformation , Proteins
9.
Proteins ; 85(5): 852-858, 2017 05.
Article in English | MEDLINE | ID: mdl-28066917

ABSTRACT

Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population-based meta-heuristics typically enable the possibility to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment-based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster-based variation (EdaRose_c) and an energy-based one (EdaRose_en). We analyze the search dynamics of our new Rosetta protocols and show that EdaRose_c is able to provide predictions with lower Cα RMSD to the native structure than EdaRose_en and Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/. Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852-858. © 2016 Wiley Periodicals, Inc.


Subject(s)
Algorithms , Proteins/chemistry , Proteomics/statistics & numerical data , Software , Benchmarking , Cluster Analysis , Internet , Protein Conformation , Proteomics/methods , Thermodynamics
10.
F1000Res ; 6: 1722, 2017.
Article in English | MEDLINE | ID: mdl-29399321

ABSTRACT

Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
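
Note on the query model in the abstract above: the sketch below shows the general idea of filtering fragments by a regular expression over one-letter codes, keeping those within a distance threshold, and ranking hits by distance. It is not Fragger's implementation or interface. The in-memory database layout and the function names are hypothetical, gaps are not handled, and a superposition-free distance-matrix RMSD is swapped in as the distance measure to keep the example dependency-free.

```python
import math
import re

def drmsd(a, b):
    """RMS difference between the two intra-fragment distance matrices."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))

def query(db, coords, threshold, seq_regex=None):
    """Return (distance, name) hits within the threshold, closest first."""
    pattern = re.compile(seq_regex) if seq_regex else None
    hits = []
    for name, seq, frag in db:
        if len(frag) != len(coords):
            continue                      # only same-length fragments are comparable here
        if pattern and not pattern.fullmatch(seq):
            continue                      # sequence constraint not satisfied
        d = drmsd(coords, frag)
        if d <= threshold:
            hits.append((d, name))
    return sorted(hits)

# Toy three-residue fragments given as one point per residue (made-up coordinates).
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
shifted = [(1.0, 1.0, 1.0), (4.8, 1.0, 1.0), (8.6, 1.0, 1.0)]
db = [("f1", "AGA", straight), ("f2", "AGV", bent), ("f3", "KGA", shifted)]

print(query(db, straight, threshold=0.5, seq_regex="[AV]G[AV]"))
```

Here `f2` fails the distance threshold and `f3` fails the sequence constraint, so only `f1` is reported; dropping the regular expression lets the translated copy `f3` match as well.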

11.
Methods Mol Biol ; 1529: 309-322, 2017.
Article in English | MEDLINE | ID: mdl-27914059

ABSTRACT

Monomeric proteins with a number of identical repeats creating symmetrical structures are potentially very valuable building blocks with a variety of bionanotechnological applications. As such proteins do not occur naturally, the emerging field of computational protein design serves as an excellent tool to create them from nonsymmetrical templates. Existing pseudo-symmetrical proteins are believed to have evolved from oligomeric precursors by duplication and fusion of identical repeats. Here we describe a computational workflow to reverse-engineer this evolutionary process in order to create stable proteins consisting of identical sequence repeats.


Subject(s)
Computational Biology/methods , Evolution, Molecular , Models, Molecular , Protein Conformation , Protein Engineering/methods , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Conserved Sequence , Databases, Genetic , Software
12.
J Chem Theory Comput ; 11(12): 5980-9, 2015 Dec 08.
Article in English | MEDLINE | ID: mdl-26610100

ABSTRACT

In Computational Protein Design (CPD), assuming a rigid backbone and amino-acid rotamer library, the problem of finding a sequence with an optimal conformation is NP-hard. In this paper, using Dunbrack's rotamer library and Talaris2014 decomposable energy function, we use an exact deterministic method combining branch and bound, arc consistency, and tree-decomposition to provably identify the global minimum energy sequence-conformation on full-redesign problems, defining search spaces of size up to 10^234. This is achieved on a single core of a standard computing server, requiring a maximum of 66 GB of RAM. A variant of the algorithm is able to exhaustively enumerate all sequence-conformations within an energy threshold of the optimum. These proven optimal solutions are then used to evaluate the frequencies and amplitudes, in energy and sequence, at which an existing CPD-dedicated simulated annealing implementation may miss the optimum on these full redesign problems. The probability of finding an optimum drops close to 0 very quickly. In the worst case, despite 1,000 repeats, the annealing algorithm remained more than 1 Rosetta unit away from the optimum, leading to design sequences that could differ from the optimal sequence by more than 30% of their amino acids.


Subject(s)
Algorithms , Proteins/chemistry , Computational Biology , Proteins/metabolism , Thermodynamics
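
Note on the exact search in the abstract above: the sketch below shows plain depth-first branch and bound on a toy pairwise-decomposable energy, using a naive lower bound to prune. It is an illustration only. It does not use arc consistency or tree decomposition, the random energy tables are an assumption standing in for a rotamer library and energy function, and all names are hypothetical.

```python
import itertools
import random

def make_problem(n_pos, n_rot, seed):
    """Random unary and pairwise tables for a toy rigid-backbone design problem."""
    rng = random.Random(seed)
    unary = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_pos)]
    pair = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
            for i in range(n_pos) for j in range(i + 1, n_pos)}
    return unary, pair

def total_energy(assign, unary, pair):
    return (sum(unary[i][a] for i, a in enumerate(assign))
            + sum(t[assign[i]][assign[j]] for (i, j), t in pair.items()))

def branch_and_bound(unary, pair):
    n, k = len(unary), len(unary[0])
    # Smallest entry of each pair table, used when both positions are still unassigned.
    pair_min = {ij: min(min(row) for row in t) for ij, t in pair.items()}
    best = {"e": float("inf"), "assign": None}

    def lower_bound(assign):
        d = len(assign)
        lb = sum(unary[i][assign[i]] for i in range(d))
        lb += sum(pair[i, j][assign[i]][assign[j]] for i in range(d) for j in range(i + 1, d))
        for i in range(d, n):
            # Best case for position i given the choices already made ...
            lb += min(unary[i][a] + sum(pair[j, i][assign[j]][a] for j in range(d))
                      for a in range(k))
            # ... plus the best case for its pairs with later unassigned positions.
            lb += sum(pair_min[i, j] for j in range(i + 1, n))
        return lb

    def visit(assign):
        bound = lower_bound(assign)
        if bound >= best["e"]:
            return                         # this subtree cannot beat the incumbent
        if len(assign) == n:
            best["e"], best["assign"] = bound, list(assign)  # bound is exact at a leaf
            return
        for a in range(k):
            visit(assign + [a])

    visit([])
    return best["assign"], best["e"]

unary, pair = make_problem(5, 3, seed=1)
assign, e = branch_and_bound(unary, pair)
brute = min(total_energy(list(s), unary, pair) for s in itertools.product(range(3), repeat=5))
print(assign, round(e, 6), round(brute, 6))
```

On a problem this small the result can be compared against exhaustive enumeration, which is what the last lines do; the bound only matters once enumeration becomes infeasible.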
13.
Mol Inform ; 34(2-3): 97-104, 2015 02.
Article in English | MEDLINE | ID: mdl-27490032

ABSTRACT

Protein structure prediction directly from sequences is a very challenging problem in computational biology. One of the most successful approaches employs stochastic conformational sampling to search an empirically derived energy function landscape for the global energy minimum state. Due to the errors in the empirically derived energy function, the lowest energy conformation may not be the best model. We have evaluated the use of energy calculated by the fragment molecular orbital method (FMO energy) to assess the quality of predicted models and its ability to identify the best model among an ensemble of predicted models. The fragment molecular orbital method implemented in GAMESS was used to calculate the FMO energy of predicted models. When tested on eight protein targets, we found that the model ranking based on FMO energies is better than that based on empirically derived energies when there is sufficient diversity among these models. This model diversity can be estimated prior to the FMO energy calculations. Our result demonstrates that the FMO energy calculated by the fragment molecular orbital method is a practical and promising measure for the assessment of protein model quality and the selection of the best protein model among many generated.


Subject(s)
Models, Molecular , Proteins/chemistry , Thermodynamics
14.
Proc Natl Acad Sci U S A ; 111(42): 15102-7, 2014 Oct 21.
Article in English | MEDLINE | ID: mdl-25288768

ABSTRACT

The modular structure of many protein families, such as β-propeller proteins, strongly implies that duplication played an important role in their evolution, leading to highly symmetrical intermediate forms. Previous attempts to create perfectly symmetrical propeller proteins have failed, however. We have therefore developed a new and rapid computational approach to design such proteins. As a test case, we have created a sixfold symmetrical β-propeller protein and experimentally validated the structure using X-ray crystallography. Each blade consists of 42 residues. Proteins carrying 2-10 identical blades were also expressed and purified. Two or three tandem blades assemble to recreate the highly stable sixfold symmetrical architecture, consistent with the duplication and fusion theory. The other proteins produce different monodisperse complexes, up to 42 blades (180 kDa) in size, which self-assemble according to simple symmetry rules. Our procedure is suitable for creating nano-building blocks from different protein templates of desired symmetry.


Subject(s)
Mycobacterium tuberculosis/enzymology , Protein Engineering , Protein Structure, Secondary , Proteins/chemistry , Amino Acid Sequence , Biophysics , Circular Dichroism , Crystallography, X-Ray , Light , Models, Molecular , Models, Theoretical , Molecular Sequence Data , Nanotechnology , Scattering, Radiation , Sequence Homology, Amino Acid , Software , Spectrometry, Mass, Electrospray Ionization , Ultracentrifugation
15.
PLoS One ; 8(7): e68954, 2013.
Article in English | MEDLINE | ID: mdl-23935913

ABSTRACT

Fragment assembly is a powerful method of protein structure prediction that builds protein models from a pool of candidate fragments taken from known structures. Stochastic sampling is subsequently used to refine the models. The structures are first represented as coarse-grained models and then as all-atom models for computational efficiency. Many models have to be generated independently due to the stochastic nature of the sampling methods used to search for the global minimum in a complex energy landscape. In this paper we present EdaFold(AA), a fragment-based approach which shares information between the generated models and steers the search towards native-like regions. A distribution over fragments is estimated from a pool of low energy all-atom models. This iteratively-refined distribution is used to guide the selection of fragments during the building of models for subsequent rounds of structure prediction. The use of an estimation of distribution algorithm enabled EdaFold(AA) to reach lower energy levels and to generate a higher percentage of near-native models. EdaFold(AA) uses an all-atom energy function and produces models with atomic resolution. We observed an improvement in energy-driven blind selection of models on a benchmark of EdaFold(AA) in comparison with the Rosetta AbInitioRelax protocol.


Subject(s)
Algorithms , Models, Chemical , Peptide Fragments/chemistry , Software , Computer Simulation , Protein Conformation , Thermodynamics
16.
Acta Crystallogr D Biol Crystallogr ; 68(Pt 11): 1522-34, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23090401

ABSTRACT

Recent advancements in computational methods for protein-structure prediction have made it possible to generate the high-quality de novo models required for ab initio phasing of crystallographic diffraction data using molecular replacement. Despite those encouraging achievements in ab initio phasing using de novo models, its success is limited only to those targets for which high-quality de novo models can be generated. In order to increase the scope of targets to which ab initio phasing with de novo models can be successfully applied, it is necessary to reduce the errors in the de novo models that are used as templates for molecular replacement. Here, an approach is introduced that can identify and rebuild the residues with larger errors, which subsequently reduces the overall Cα root-mean-square deviation (CA-RMSD) from the native protein structure. The error in a predicted model is estimated from the average pairwise geometric distance per residue computed among selected lowest energy coarse-grained models. This score is subsequently employed to guide a rebuilding process that focuses on more error-prone residues in the coarse-grained models. This rebuilding methodology has been tested on ten protein targets that were unsuccessful using previous methods. The average CA-RMSD of the coarse-grained models was improved from 4.93 to 4.06 Å. For those models with CA-RMSD less than 3.0 Å, the average CA-RMSD was improved from 3.38 to 2.60 Å. These rebuilt coarse-grained models were then converted into all-atom models and refined to produce improved de novo models for molecular replacement. Seven diffraction data sets were successfully phased using rebuilt de novo models, indicating the improved quality of these rebuilt de novo models and the effectiveness of the rebuilding process. Software implementing this method, called MORPHEUS, can be downloaded from http://www.riken.jp/zhangiru/software.html.


Subject(s)
Algorithms , Computer Simulation , Models, Molecular , Proteins/chemistry , Crystallography, X-Ray , Protein Conformation
17.
PLoS One ; 7(7): e38799, 2012.
Article in English | MEDLINE | ID: mdl-22829868

ABSTRACT

Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of [Formula: see text] proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold's decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software.html [corrected].


Subject(s)
Algorithms , Proteins/chemistry , Protein Conformation
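
Note on the estimation of distribution idea in the abstract above: the sketch below starts from uniform probability mass functions over per-position fragment libraries, samples decoys, and shifts probability toward fragments seen in the lowest-energy decoys. It is a generic illustration, not EdaFold's update rule. The toy energy (mismatches against a hidden "native" choice), the learning rate, and all names are assumptions.

```python
import random

def sample(pmf, rng):
    """Draw one fragment index per position from the per-position PMFs."""
    return [rng.choices(range(len(p)), weights=p)[0] for p in pmf]

def update(pmf, elite, rate):
    """Blend each position's PMF with the fragment frequencies among elite decoys."""
    new = []
    for i, p in enumerate(pmf):
        freq = [0.0] * len(p)
        for decoy in elite:
            freq[decoy[i]] += 1.0 / len(elite)
        new.append([(1 - rate) * old + rate * f for old, f in zip(p, freq)])
    return new

def eda(energy, n_pos, n_frag, n_gen, pop, n_elite, rate, rng):
    pmf = [[1.0 / n_frag] * n_frag for _ in range(n_pos)]   # uniform starting point
    for _ in range(n_gen):
        decoys = sorted((sample(pmf, rng) for _ in range(pop)), key=energy)
        pmf = update(pmf, decoys[:n_elite], rate)            # learn from the best decoys
    return pmf

# Toy energy: number of positions whose fragment differs from a hidden native choice.
NATIVE = [2, 0, 3, 1, 2]

def toy_energy(decoy):
    return sum(a != b for a, b in zip(decoy, NATIVE))

pmf = eda(toy_energy, n_pos=5, n_frag=4, n_gen=20, pop=50, n_elite=10,
          rate=0.5, rng=random.Random(0))
print([round(p[n], 2) for p, n in zip(pmf, NATIVE)])
```

Because each update is a convex blend of two distributions, every per-position PMF stays normalized, and no fragment's probability drops to exactly zero.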
18.
J Comput Chem ; 33(4): 471-4, 2012 Feb 05.
Article in English | MEDLINE | ID: mdl-22120171

ABSTRACT

In protein folding, clustering is commonly used as one way to identify the best decoy produced. Initializing the pairwise distance matrix for a large decoy set is computationally expensive. We have proposed a fast method that works even on large decoy sets. This method is implemented in software called Durandal. Durandal has been shown to be consistently faster than other software performing fast exact clustering. In some cases, Durandal can even outperform the speed of an approximate method. Durandal uses the triangle inequality to accelerate exact clustering, without compromising the distance function. Recently, we have further enhanced the performance of Durandal by incorporating a Quaternion-based characteristic polynomial method that has increased the speed of Durandal between 13% and 27% compared with the previous version. Durandal source code is available under the GNU General Public License at http://www.riken.jp/zhangiru/software/durandal_released_qcp.tgz. Alternatively, a compiled version of Durandal is also distributed with the nightly builds of the Phenix (http://www.phenix-online.org/) crystallographic software suite (Adams et al., Acta Crystallogr Sect D 2010, 66, 213).


Subject(s)
Protein Folding , Proteins/chemistry , Software , Cluster Analysis
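
Note on the triangle-inequality idea in the abstract above: for any metric d and reference r, |d(a,r) - d(b,r)| <= d(a,b) <= d(a,r) + d(b,r), so many pairwise comparisons against a cutoff can be decided without computing d(a,b). The sketch below illustrates this with a single reference and Euclidean points standing in for decoys. It is not Durandal's algorithm, and the function name and the choice of reference are assumptions.

```python
import math
import random

def count_neighbors(points, cutoff, dist=math.dist):
    """Count neighbours within the cutoff, skipping exact distances when bounds decide."""
    ref = points[0]                              # arbitrary reference element
    to_ref = [dist(p, ref) for p in points]
    counts = [0] * len(points)
    exact_calls = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            lower = abs(to_ref[i] - to_ref[j])   # d(i, j) is at least this
            upper = to_ref[i] + to_ref[j]        # d(i, j) is at most this
            if lower > cutoff:
                continue                         # certainly not neighbours
            if upper <= cutoff:
                near = True                      # certainly neighbours
            else:
                exact_calls += 1                 # bounds are inconclusive
                near = dist(points[i], points[j]) <= cutoff
            if near:
                counts[i] += 1
                counts[j] += 1
    return counts, exact_calls

rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(40)]
counts, exact_calls = count_neighbors(points, cutoff=2.0)
brute = [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= 2.0)
         for i, p in enumerate(points)]
print(exact_calls, "exact distance evaluations out of", 40 * 39 // 2, "pairs")
```

The neighbour counts are the same as those from the full pairwise computation, which is the sense in which the speed-up does not compromise the distance function; a single reference gives fairly loose bounds, so using several references would tighten them.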