1.
Cell Commun Signal ; 22(1): 90, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38303060

ABSTRACT

Enhancing protein stability holds paramount significance in biotechnology, therapeutics, and the food industry. Circular permutations offer a distinctive avenue for manipulating protein stability while keeping intra-protein interactions intact. Amidst the creation of circular permutants, determining the optimal placement of the new N- and C-termini stands as a pivotal, albeit largely unexplored, endeavor. In this study, we employed PONDR-FIT's predictions of disorder propensity to guide the design of circular permutants for the GroEL apical domain (residues 191-345). Our underlying hypothesis posited that a higher predicted disorder value would correspond to reduced stability in the circular permutants, owing to the increased likelihood of fluctuations in the novel N- and C-termini. To substantiate this hypothesis, we engineered six circular permutants, positioning glycines within the loops as locations for the new N- and C-termini. We demonstrated the validity of our hypothesis along the set of the designed circular permutants, as supported by measurements of melting temperatures by circular dichroism and differential scanning microcalorimetry. Consequently, we propose a novel computational methodology that rationalizes the design of circular permutants with projected stability. Video Abstract.
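The sequence-level operation behind a circular permutant can be sketched as follows. This is a minimal illustration, not the authors' design pipeline: the function name, the 0-based cut index, and the optional linker joining the original termini are all assumptions made for the example, and no disorder prediction is performed here.

```python
def circular_permutant(seq: str, new_n_term: int, linker: str = "") -> str:
    """Return the circularly permuted sequence.

    seq        -- original amino acid sequence (one-letter codes)
    new_n_term -- 0-based index of the residue that becomes the new N-terminus
                  (hypothetical convention chosen for this sketch)
    linker     -- optional residues joining the original C- and N-termini
                  (an assumption; the abstract does not describe a linker)
    """
    if not 0 < new_n_term < len(seq):
        raise ValueError("cut position must lie strictly inside the sequence")
    # The tail of the chain comes first, then the linker, then the original head.
    return seq[new_n_term:] + linker + seq[:new_n_term]
```

For a toy sequence `"MKTAYIA"` cut at index 3, the permutant is `"AYIAMKT"`; every residue is preserved and only the chain connectivity changes.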

2.
Biomolecules ; 13(8)2023 08 20.
Article in English | MEDLINE | ID: mdl-37627334

ABSTRACT

The molecular toxicity of the uranyl ion (UO₂²⁺) in living cells is primarily determined by its high affinity to both native and potential metal-binding sites that commonly occur in the structure of biomolecules. Recent advances in computational and experimental research have shed light on the structural properties and functional impacts of uranyl binding to proteins, organic ligands, nucleic acids, and their complexes. In the present work, we report the results of the computational investigation of the uranyl-mediated loss of DNA-binding activity of PARP-1, a eukaryotic enzyme that participates in DNA repair, cell differentiation, and the induction of inflammation. The latest experimental studies have shown that the uranyl ion directly interacts with its DNA-binding subdomains, zinc fingers Zn1 and Zn2, and alters their tertiary structure. Here, we propose an atomistic mechanism underlying this process and compute the free energy change along the suggested pathway. Our Quantum Mechanics/Molecular Mechanics (QM/MM) simulations of the Zn2-UO₂²⁺ complex indicate that the uranyl ion replaces zinc in its native binding site. However, the resulting state is destroyed due to the spontaneous internal hydrolysis of the U-Cys162 coordination bond. Despite the enthalpy of hydrolysis being +2.8 kcal/mol, the overall reaction free energy change is -0.6 kcal/mol, which is attributed to the loss of the domain's native tertiary structure originally maintained by a zinc ion. The subsequent reorganization of the binding site includes the association of the uranyl ion with the Glu190/Asp191 acidic cluster and significant perturbations in the domain's tertiary structure driven by a further decrease in the free energy by 6.8 kcal/mol. The disruption of the DNA-binding interface revealed in our study is consistent with previous experimental findings and explains the loss of PARP-like zinc fingers' affinity for nucleic acids.
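As a hedged worked example, the energies quoted in the abstract can be combined under two assumptions that the abstract itself does not state: that the gap between the free energy and the enthalpy of hydrolysis is the entropic term of the standard relation ΔG = ΔH − TΔS, and that the hydrolysis and reorganization steps are sequential, so their free energy changes add.

```latex
\begin{align*}
-T\Delta S_{\text{hydrolysis}} &= \Delta G - \Delta H = (-0.6) - (+2.8) = -3.4\ \text{kcal/mol},\\
\Delta G_{\text{total}} &= (-0.6) + (-6.8) = -7.4\ \text{kcal/mol}.
\end{align*}
```

Under these assumptions, the unfavorable enthalpy of hydrolysis is more than offset by a favorable term of about 3.4 kcal/mol, and the reorganized state lies about 7.4 kcal/mol below the zinc-replaced starting state.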


Subject(s)
Nucleic Acids , Poly (ADP-Ribose) Polymerase-1 , Computer Simulation , Protein Domains , DNA
3.
Int J Mol Sci ; 24(11)2023 May 23.
Article in English | MEDLINE | ID: mdl-37298068

ABSTRACT

Mutations that prevent the production of proteins in the DMD gene cause Duchenne muscular dystrophy. Most frequently, these are deletions leading to a reading-frame shift. The "reading-frame rule" states that deletions that preserve the ORF result in a milder Becker muscular dystrophy. By removing several exons, new genome editing tools enable reading-frame restoration in DMD with the production of BMD-like dystrophins. However, not every truncated dystrophin with a significant internal loss functions properly. To determine the effectiveness of potential genome editing, each variant should be carefully studied in vitro or in vivo. In this study, we focused on the deletion of exons 8-50 as a potential reading-frame restoration option. Using the CRISPR-Cas9 tool, we created the novel mouse model DMDdel8-50, which has an in-frame deletion in the DMD gene. We compared DMDdel8-50 mice to C57Bl6/CBA background control mice and previously generated DMDdel8-34 KO mice. We discovered that the shortened protein was expressed and correctly localized on the sarcolemma. The truncated protein, on the other hand, was unable to function like a full-length dystrophin and prevent disease progression. On the basis of protein expression, histological examination, and physical assessment of the mice, we concluded that the deletion of exons 8-50 is an exception to the reading-frame rule.


Subject(s)
Dystrophin , Muscular Dystrophy, Duchenne , Mice , Animals , Dystrophin/genetics , Mice, Inbred CBA , Muscular Dystrophy, Duchenne/metabolism , Phenotype , Exons/genetics , Gene Deletion
5.
PLoS One ; 18(3): e0282689, 2023.
Article in English | MEDLINE | ID: mdl-36928239

ABSTRACT

AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. The astounding success even led to claims that the protein folding problem is "solved". However, the protein folding problem is more than just structure prediction from sequence. Presently, it is unknown if the AlphaFold-triggered revolution could help to solve other problems related to protein folding. Here we assay the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study the question we extracted the pLDDT and metrics from AlphaFold predictions before and after a single mutation in a protein and correlated the predicted change with the experimentally known ΔΔG values. Additionally, we correlated the same AlphaFold pLDDT metrics with the impact of a single mutation on structure using a large-scale dataset of single mutations in GFP with the experimentally assayed levels of fluorescence. We found a very weak or no correlation between AlphaFold output metrics and change of protein stability or fluorescence. Our results imply that AlphaFold may not be immediately applied to other problems or applications in protein folding.


Subject(s)
Protein Folding , Proteins , Proteins/chemistry , Mutation , Amino Acid Sequence , Protein Stability
6.
Bioinformatics ; 38(18): 4312-4320, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35894930

ABSTRACT

MOTIVATION: Prediction of protein stability change upon mutation (ΔΔG) is crucial for facilitating protein engineering and understanding of protein folding principles. Robust prediction of protein folding free energy change requires the knowledge of protein three-dimensional (3D) structure. If the protein 3D structure is not available, one can predict the structure from protein sequence; however, the perspectives of ΔΔG predictions for predicted protein structures are unknown. The accuracy of using 3D structures of the best templates for the ΔΔG prediction is also unclear. RESULTS: To investigate these questions, we used a representative set of seven diverse and accurate publicly available tools (FoldX, Eris, Rosetta, DDGun, ACDC-NN, ThermoNet and DynaMut) for stability change prediction combined with AlphaFold or I-Tasser for protein 3D structure prediction. We found that best templates perform consistently better than (or similar to) homology models for all ΔΔG predictors. Our findings suggest using the best template structure for the prediction of protein stability change upon mutation if the protein 3D structure is not available. AVAILABILITY AND IMPLEMENTATION: The data are available at https://github.com/ivankovlab/template-vs-model. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Proteins , Protein Stability , Proteins/genetics , Proteins/chemistry , Mutation , Protein Folding
7.
Biophys Rev ; 14(6): 1255-1272, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36659994

ABSTRACT

The ability of protein chains to spontaneously form their three-dimensional structures is a long-standing mystery in molecular biology. The most conceptual aspect of this mystery is how the protein chain can find its native, "working" spatial structure (which, for not too big protein chains, corresponds to the global free energy minimum) in a biologically reasonable time, without exhaustive enumeration of all possible conformations, which would take billions of years. This is the so-called "Levinthal's paradox." In this review, we discuss the key ideas and discoveries leading to the current understanding of protein folding kinetics, including folding landscapes and funnels, free energy barriers at the folding/unfolding pathways, and the solution of Levinthal's paradox. A special role here is played by the "all-or-none" phase transition occurring at protein folding and unfolding and by the point of thermodynamic (and kinetic) equilibrium between the "native" and the "unfolded" phases of the protein chain (where the theory obtains the simplest form). The modern theory provides an understanding of key features of protein folding and, in good agreement with experiments, it (i) outlines the chain length-dependent range of protein folding times, (ii) predicts the observed maximal size of "foldable" proteins and domains. Besides, it predicts the maximal size of proteins and domains that fold under solely thermodynamic (rather than kinetic) control. Complementarily, a theoretical analysis of the number of possible protein folding patterns, performed at the level of formation and assembly of secondary structures, correctly outlines the upper limit of protein folding times.
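The scale of the exhaustive search behind Levinthal's paradox can be illustrated with a back-of-the-envelope estimate. The numbers below are commonly used illustrative assumptions, not values taken from this abstract: a chain of 100 residues, 3 conformations per residue, and a sampling rate of 10^13 conformations per second.

```latex
\begin{align*}
N_{\text{conf}} &= 3^{100} \approx 5 \times 10^{47},\\
t_{\text{search}} &= \frac{N_{\text{conf}}}{10^{13}\ \text{s}^{-1}} \approx 5 \times 10^{34}\ \text{s} \approx 1.6 \times 10^{27}\ \text{years}.
\end{align*}
```

Even with these modest per-residue assumptions, the estimated search time exceeds the "billions of years" mentioned above by many orders of magnitude, which is why folding cannot proceed by exhaustive enumeration.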

8.
Glycobiology ; 31(8): 959-974, 2021 09 09.
Article in English | MEDLINE | ID: mdl-33978736

ABSTRACT

Elevated plasma levels of hyaluronic acid (HA) are a disease marker in liver pathology and other inflammatory disorders. Inhibition of HA synthesis with coumarin 4-methylumbelliferone (4MU) has a beneficial effect in animal models of fibrosis, inflammation, cancer and metabolic syndrome. 4MU is an active compound of the approved choleretic drug hymecromone, with low bioavailability and a broad spectrum of action. New, more specific and efficient inhibitors of hyaluronan synthases (HAS) are required. We have tested several newly synthesized coumarin compounds and commercial chitin synthesis inhibitors to inhibit HA production in a cell culture assay. Coumarin derivative compound VII (10'-methyl-6'-phenyl-3'H-spiro[piperidine-4,2'-pyrano[3,2-g]chromene]-4',8'-dione) demonstrated inhibition of HA secretion by NIH3T3 cells with the half-maximal inhibitory concentration (IC50) = 1.69 ± 0.75 µM, superior to 4MU (IC50 = 8.68 ± 1.6 µM). Inhibitors of chitin synthesis, etoxazole, buprofezin and triflumuron, reduced HA deposition with IC50 of 4.21 ± 3.82 µM, 1.24 ± 0.87 µM and 1.48 ± 1.44 µM, respectively. Etoxazole reduced HA production and prevented collagen fibre formation in the CCl4 liver fibrosis model in mice, similar to 4MU. Bioinformatics analysis revealed homology between chitin synthases and HAS enzymes, particularly in the pore-forming domain, containing the proposed site for etoxazole binding.


Subject(s)
Hyaluronic Acid , Hymecromone , Animals , Chitin , Hyaluronan Synthases/metabolism , Hyaluronic Acid/metabolism , Hymecromone/pharmacology , Mice , NIH 3T3 Cells
9.
Biomolecules ; 10(2)2020 02 06.
Article in English | MEDLINE | ID: mdl-32041303

ABSTRACT

"How do proteins fold?" Researchers have been studying different aspects of this question for more than 50 years. The most conceptual aspect of the problem is how protein can find the global free energy minimum in a biologically reasonable time, without exhaustive enumeration of all possible conformations, the so-called "Levinthal's paradox." Less conceptual but still critical are aspects about factors defining folding times of particular proteins and about perspectives of machine learning for their prediction. We will discuss in this review the key ideas and discoveries leading to the current understanding of folding kinetics, including the solution of Levinthal's paradox, as well as the current state of the art in the prediction of protein folding times.


Subject(s)
Protein Folding , Proteins/chemistry , Proteins/metabolism , Entropy , Kinetics , Protein Conformation , Thermodynamics
10.
Bioinformatics ; 2019 Nov 19.
Article in English | MEDLINE | ID: mdl-31742320

ABSTRACT

MOTIVATION: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. RESULTS: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY: https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
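The two definitions in this abstract, the four-genotype requirement for pairwise epistasis and the combinatorially complete set of genotypes for order n, can be sketched in a few lines. This is a brute-force illustration, not the recursive HypercubeME algorithm; the data layout (a dictionary keyed by frozensets of mutation labels) and the additive fitness scale are assumptions made for the example.

```python
from itertools import combinations

def is_combinatorially_complete(mutations, fitness):
    """True if fitness is known for every subset of `mutations`,
    i.e. for all corners of the n-dimensional hypercube.

    fitness -- dict mapping frozenset of mutation labels -> measured fitness
               (hypothetical layout; the empty frozenset is the reference genotype)
    """
    return all(
        frozenset(subset) in fitness
        for size in range(len(mutations) + 1)
        for subset in combinations(mutations, size)
    )

def pairwise_epistasis(a, b, fitness):
    """Deviation of the double mutant from additivity of the two single mutants.
    Assumes fitness is on an additive scale (for example, log fitness)."""
    f_ref = fitness[frozenset()]
    f_a = fitness[frozenset({a})]
    f_b = fitness[frozenset({b})]
    f_ab = fitness[frozenset({a, b})]
    return f_ab - f_a - f_b + f_ref
```

A value of zero from `pairwise_epistasis` means the two substitutions act independently on the chosen scale; checking completeness by enumerating all subsets costs 2^n lookups per candidate set, which is why a dedicated algorithm is needed for large random mutagenesis datasets.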

11.
PLoS Genet ; 15(4): e1008079, 2019 04.
Article in English | MEDLINE | ID: mdl-30969963

ABSTRACT

Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily-relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effect in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible.


Subject(s)
Evolution, Molecular , Fungal Proteins/genetics , Fungal Proteins/metabolism , Genetic Fitness , Yeasts/genetics , Yeasts/metabolism , Amino Acid Sequence , Amino Acid Substitution , Amino Acids/genetics , Amino Acids/metabolism , Epistasis, Genetic , Fungal Proteins/chemistry , Genes, Fungal , Genotype , Hydro-Lyases/chemistry , Hydro-Lyases/genetics , Hydro-Lyases/metabolism , Models, Genetic , Models, Molecular , Phylogeny , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
12.
Bioinformatics ; 34(21): 3653-3658, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29722803

ABSTRACT

Motivation: Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results, especially when exploring the effects of combinations of different mutations. Results: Here we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity of the effect of a mutation. An advantage of the approach used is that it relies solely on crystal structures without experimentally measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta and I-Mutant, and found an inherent bias. For one program, FoldX, we managed to substantially reduce the bias using additional relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here. Availability and implementation: All calculations were implemented using in-house PERL scripts. Supplementary information: Supplementary data are available at Bioinformatics online. Note: The article 10.1093/bioinformatics/bty348, published alongside this paper, also addresses the problem of biases in protein stability change predictions.
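The reciprocity idea behind the self-consistency test can be sketched as follows. For an unbiased predictor, the predicted stability change for a mutation A→B and for its reverse B→A should cancel. If one assumes, purely for illustration, that a predictor adds the same constant shift to every prediction, that shift equals half the sum of the forward and reverse values. The function name and input layout are hypothetical and do not reproduce the authors' protocol.

```python
from statistics import mean

def reciprocity_bias(pairs):
    """Estimate a constant prediction bias from forward/reverse mutation pairs.

    pairs -- iterable of (ddg_forward, ddg_reverse) predictions in kcal/mol,
             where the reverse value is predicted on the mutant structure
             (hypothetical input layout for this sketch)
    """
    # Unbiased predictions satisfy ddg_forward + ddg_reverse == 0, so half of
    # the sum isolates a shift that is added equally to both predictions.
    return mean((forward + reverse) / 2 for forward, reverse in pairs)
```

For example, the pairs `(1.5, -0.5)` and `(2.0, -1.0)` both sum to 1.0, giving an estimated bias of 0.5 kcal/mol per prediction.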


Subject(s)
Proteins/genetics , Software , Algorithms , Bias , Mutation , Protein Stability
13.
PLoS One ; 12(8): e0182525, 2017.
Article in English | MEDLINE | ID: mdl-28800638

ABSTRACT

In the course of evolution, genes traverse the nucleotide sequence space, which translates to a trajectory of changes in the protein sequence in protein sequence space. The correspondence between regions of the nucleotide and protein sequence spaces is understood in general but not in detail. One of the unexplored questions is how many sequences a protein can reach with a certain number of nucleotide substitutions in its gene sequence. Here I propose an algorithm to calculate the volume of protein sequence space accessible to a given protein sequence as a function of the number of nucleotide substitutions made in the protein-coding sequence. The algorithm utilizes the power of the dynamic programming approach, and makes all calculations within a couple of seconds on a desktop computer. I apply the algorithm to green fluorescent protein, and obtain a number of sequences four times higher than estimated before. However, taking into account the astronomically huge size of the protein sequence space, the previous estimate can be considered acceptable as an order-of-magnitude estimation. The proposed algorithm has practical applications in the study of evolutionary trajectories in sequence space.
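One way such a dynamic programming count could look is sketched below. It is not the published algorithm: it counts distinct protein sequences reachable with at most n nucleotide substitutions, assumes the standard genetic code, excludes stop codons, and expects an uppercase coding sequence whose length is a multiple of three. All function and variable names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def cost_histogram(codon):
    """hist[k] = number of amino acids whose nearest codon differs from
    `codon` at exactly k positions (k = 0..3), stop codons excluded."""
    best = {}
    for other, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        distance = sum(x != y for x, y in zip(codon, other))
        best[aa] = min(distance, best.get(aa, 3))
    hist = [0, 0, 0, 0]
    for distance in best.values():
        hist[distance] += 1
    return hist

def reachable_proteins(cds, n):
    """Count protein sequences reachable from `cds` with at most n substitutions."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # ways[t] = number of distinct protein prefixes whose minimal total cost is t.
    ways = [1] + [0] * n
    for codon in codons:
        hist = cost_histogram(codon)
        updated = [0] * (n + 1)
        for total, count in enumerate(ways):
            if count == 0:
                continue
            for cost, choices in enumerate(hist):
                if total + cost <= n:
                    updated[total + cost] += count * choices
        ways = updated
    return sum(ways)
```

The key observation in this sketch is that a protein sequence is reachable within n substitutions exactly when the per-position minimal costs sum to at most n, so the count factorizes over codons and the running time grows linearly with gene length. For the single codon `ATG`, one substitution reaches six other amino acids, so `reachable_proteins("ATG", 1)` counts seven sequences including methionine itself.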


Subject(s)
Computational Biology/methods , Nucleotides/genetics , Algorithms , Amino Acid Sequence , Base Sequence , Codon/genetics , Green Fluorescent Proteins/chemistry , Serine/genetics
14.
Phys Life Rev ; 21: 56-71, 2017 07.
Article in English | MEDLINE | ID: mdl-28190683

ABSTRACT

The ability of protein chains to spontaneously form their spatial structures is a long-standing puzzle in molecular biology. Experimentally measured folding times of single-domain globular proteins range from microseconds to hours: the difference (10-11 orders of magnitude) is the same as that between the life span of a mosquito and the age of the universe. This review describes physical theories of rates of overcoming the free-energy barrier separating the natively folded (N) and unfolded (U) states of protein chains in both directions: "U-to-N" and "N-to-U". In the theory of protein folding rates a special role is played by the point of thermodynamic (and kinetic) equilibrium between the native and unfolded state of the chain; here, the theory obtains the simplest form. Paradoxically, a theoretical estimate of the folding time is easier to get from consideration of protein unfolding (the "N-to-U" transition) rather than folding, because it is easier to outline a good unfolding pathway of any structure than a good folding pathway that leads to the stable fold, which is yet unknown to the folding protein chain. And since the rates of direct and reverse reactions are equal at the equilibrium point (as follows from the physical "detailed balance" principle), the estimated folding time can be derived from the estimated unfolding time. Theoretical analysis of the "N-to-U" transition outlines the range of protein folding rates in a good agreement with experiment. Theoretical analysis of folding (the "U-to-N" transition), performed at the level of formation and assembly of protein secondary structures, outlines the upper limit of protein folding times (i.e., of the time of search for the most stable fold). Both theories come to essentially the same results; this is not a surprise, because they describe overcoming one and the same free-energy barrier, although the way to the top of this barrier from the side of the unfolded state is very different from the way from the side of the native state; and both theories agree with experiment. In addition, they predict the maximal size of protein domains that fold under solely thermodynamic (rather than kinetic) control and explain the observed maximal size of the "foldable" protein domains.


Subject(s)
Protein Folding , Proteins/chemistry , Models, Molecular
15.
Nature ; 533(7603): 397-401, 2016 05 19.
Article in English | MEDLINE | ID: mdl-27193686

ABSTRACT

Fitness landscapes depict how genotypes manifest at the phenotypic level and form the basis of our understanding of many areas of biology, yet their properties remain elusive. Previous studies have analysed specific genes, often using their function as a proxy for fitness, experimentally assessing the effect on function of single mutations and their combinations in a specific sequence or in different sequences. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here we visualize an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function (fluorescence) of tens of thousands of derivative genotypes of avGFP. We show that the fitness landscape of avGFP is narrow, with 3/4 of the derivatives with a single mutation showing reduced fluorescence and half of the derivatives with four mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and mostly occurred through the cumulative effect of slightly deleterious mutations causing a threshold-like decrease in protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for several fields including molecular evolution, population genetics and protein design.


Subject(s)
Genetic Fitness , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Animals , Epistasis, Genetic , Evolution, Molecular , Fluorescence , Genetic Association Studies , Genotype , Hydrozoa/chemistry , Hydrozoa/genetics , Mutant Proteins/genetics , Mutant Proteins/metabolism , Mutation/genetics , Phenotype
16.
PLoS One ; 10(11): e0143166, 2015.
Article in English | MEDLINE | ID: mdl-26606303

ABSTRACT

The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Due to the increasing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found a much lower predictive power than reported in the original publications. Overly optimistic predictive powers appear from violations of the basic principles of machine-learning. We highlight common misconceptions in the studies claiming excessive predictive power and propose to use learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of logarithms of folding rates based on protein amino acid composition.


Subject(s)
Machine Learning , Protein Folding , Proteins/chemistry , Reproducibility of Results
17.
Curr Opin Struct Biol ; 26: 104-12, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24981969

ABSTRACT

The study of molecular evolution is important because it reveals how protein functions emerge and evolve. Recently, several types of studies indicated that substitutions in molecular evolution occur in a compensatory manner, whereby the occurrence of a substitution depends on the amino acid residues at other sites. However, a molecular or structural basis behind the compensation often remains obscure. Here, we review studies on the interface of structural biology and molecular evolution that revealed novel aspects of compensatory evolution. In many cases structural studies benefit from evolutionary data while structural data often add a functional dimension to the study of molecular evolution.


Subject(s)
Evolution, Molecular , Proteins/chemistry , Proteins/metabolism , Animals , Base Sequence , Humans , Protein Engineering , Proteins/genetics
18.
Nucleic Acids Res ; 41(Web Server issue): W459-64, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23729472

ABSTRACT

Regulated intramembrane proteolysis (RIP) is a critical mechanism for intercellular communication and regulates the function of membrane proteins through sequential proteolysis. RIP typically starts with ectodomain shedding of membrane proteins by extracellular membrane-bound proteases followed by intramembrane proteolysis of the resulting membrane-tethered fragment. However, for the majority of RIP proteases the corresponding substrates and thus, their functions, remain unknown. Proteome-wide identification of RIP protease substrates is possible by mass spectrometry-based quantitative comparison of RIP substrates or their cleavage products between different biological states. However, this requires quantification of peptides from only the ectodomain or cytoplasmic domain. Current analysis software does not allow matching peptides to either domain. Here we present the QARIP (Quantitative Analysis of Regulated Intramembrane Proteolysis) web server which matches identified peptides to the protein transmembrane topology. QARIP allows determination of quantitative ratios separately for the topological domains (cytoplasmic, ectodomain) of a given protein and is thus a powerful tool for quality control, improvement of quantitative ratios and identification of novel substrates in proteomic RIP datasets. To our knowledge, the QARIP web server is the first tool directly addressing the phenomenon of RIP. The web server is available at http://webclu.bio.wzw.tum.de/qarip/. This website is free and open to all users and there is no login requirement.
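The matching step described in this abstract, assigning identified peptides to topological domains and summarizing quantitative ratios per domain, can be sketched as follows. This is not the QARIP implementation: the topology format (0-based, half-open segments with a label), the function names, and the use of the median as the per-domain summary are assumptions made for illustration.

```python
from statistics import median

def assign_domain(protein_seq, peptide, topology):
    """Return the domain label containing the peptide, "ambiguous" if the
    peptide does not fall inside exactly one segment, or None if not found.

    topology -- list of (start, end, label) with 0-based, half-open coordinates
                (hypothetical format for this sketch)
    """
    start = protein_seq.find(peptide)
    if start < 0:
        return None
    end = start + len(peptide)
    labels = {
        label
        for seg_start, seg_end, label in topology
        if seg_start < end and start < seg_end  # interval overlap test
    }
    return labels.pop() if len(labels) == 1 else "ambiguous"

def domain_ratios(protein_seq, peptide_ratios, topology):
    """Median quantitative ratio per domain, ignoring unmatched or ambiguous peptides."""
    grouped = {}
    for peptide, ratio in peptide_ratios.items():
        domain = assign_domain(protein_seq, peptide, topology)
        if domain not in (None, "ambiguous"):
            grouped.setdefault(domain, []).append(ratio)
    return {domain: median(values) for domain, values in grouped.items()}
```

Keeping the ratios separate per domain is what allows a shed ectodomain to be distinguished from the membrane-retained fragment: a change in the ectodomain ratio without a matching change in the cytoplasmic ratio points to cleavage rather than altered expression.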


Subject(s)
Membrane Proteins/metabolism , Software , Aspartic Acid Endopeptidases/metabolism , HEK293 Cells , Humans , Internet , Mass Spectrometry , Membrane Proteins/chemistry , Peptides/analysis , Protein Structure, Tertiary , Proteolysis , Proteomics
19.
Environ Microbiol ; 15(4): 983-90, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23556536

ABSTRACT

Over the last 5 years proteogenomics (using mass spectroscopy to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparison of the experimentally determined N-termini with those predicted by sequence analysis tools allows identification of the signal peptides and therefore conclusions on the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare its results with the available experimental data and predictions by such software tools as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in the known exported proteins, which had been previously predicted but not validated. The filtering of putative signal peptides for the peptide length and the presence of an eight-residue hydrophobic patch and a typical signal peptidase cleavage site proved sufficient to eliminate the false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.


Subject(s)
Escherichia coli K12/chemistry , Peptides/analysis , Protein Sorting Signals , Proteome , Base Sequence , Mass Spectrometry , Membrane Proteins/analysis , Peptide Mapping , Peptides/classification , Proteins/analysis , Proteins/classification , Sequence Analysis , Serine Endopeptidases/analysis , Software
20.
Proc Natl Acad Sci U S A ; 110(1): 147-50, 2013 Jan 02.
Article in English | MEDLINE | ID: mdl-23251035

ABSTRACT

The ability of protein chains to spontaneously form their spatial structures is a long-standing puzzle in molecular biology. Experimentally measured rates of spontaneous folding of single-domain globular proteins range from microseconds to hours: the difference (11 orders of magnitude) is akin to the difference between the life span of a mosquito and the age of the universe. Here, we show that physical theory with biological constraints outlines a "golden triangle" limiting the possible range of folding rates for single-domain globular proteins of various size and stability, and that the experimentally measured folding rates fall within this narrow triangle built without any adjustable parameters, filling it almost completely. In addition, the golden triangle predicts the maximal size of protein domains that fold under solely thermodynamic (rather than kinetic) control. It also predicts the maximal allowed size of the "foldable" protein domains, and the size of domains found in known protein structures is in a good agreement with this limit.


Subject(s)
Models, Biological , Models, Molecular , Protein Folding , Protein Structure, Tertiary/physiology , Proteins/metabolism , Biophysics , Thermodynamics