1.
Proteins ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38808365

ABSTRACT

We apply methods of Artificial Intelligence and Machine Learning to protein dynamic bioinformatics. We rewrite the sequences of a large protein data set, containing both folded and intrinsically disordered molecules, using a representation developed previously, which encodes the intrinsic dynamic properties of the naturally occurring amino acids. We Fourier analyze the resulting sequences. It is demonstrated that classification models built using several different supervised learning methods are able to successfully distinguish folded from intrinsically disordered proteins from sequence alone. It is further shown that the most important sequence property for this discrimination is the sequence mobility, which is the sequence averaged value of the residue-specific average alpha carbon B factor. This is in agreement with previous work, in which we have demonstrated the central role played by the sequence mobility in protein dynamic bioinformatics and biophysics. This finding opens a path to the application of dynamic bioinformatics, in combination with machine learning algorithms, to a range of significant biomedical problems.
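As a rough illustration of the two quantities named in this abstract, the sketch below encodes a sequence numerically, takes its sequence average (the "sequence mobility"), and computes Fourier amplitudes of the encoded sequence. The table `PLACEHOLDER_B` holds made-up numbers, not the published residue-specific average B factors, and the function names and the normalization of the amplitudes are assumptions made only for this example.

```python
import numpy as np

# Hypothetical placeholder values, NOT the published residue-specific
# average alpha-carbon B factors; substitute the real table before use.
PLACEHOLDER_B = {aa: 10.0 + i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def encode(sequence, table=PLACEHOLDER_B):
    """Rewrite an amino acid sequence as a numeric array, one value per residue."""
    return np.array([table[aa] for aa in sequence], dtype=float)


def sequence_mobility(sequence, table=PLACEHOLDER_B):
    """Sequence-averaged value of the per-residue dynamic index."""
    return float(encode(sequence, table).mean())


def fourier_amplitudes(sequence, table=PLACEHOLDER_B):
    """Amplitudes of the discrete Fourier transform of the mean-centered sequence.

    Dividing by the sequence length is an assumed normalization.
    """
    x = encode(sequence, table)
    return np.abs(np.fft.rfft(x - x.mean())) / len(x)


if __name__ == "__main__":
    seq = "ACDEFGHIKL"
    print(sequence_mobility(seq))
    print(fourier_amplitudes(seq))
```

Features of this kind (the mobility plus a set of Fourier amplitudes) could then be passed to any supervised classifier; the abstract does not say which learning methods or feature sets were used.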

2.
J Phys Chem B ; 127(27): 6073-6077, 2023 07 13.
Article in English | MEDLINE | ID: mdl-37368985

ABSTRACT

Using tools developed to study the dynamic bioinformatics of proteins, we are able to study the dynamic characteristics of very large numbers of protein sequences simultaneously. We study herein the distribution of protein sequences in a space determined by sequence mobility. It is shown that there are statistically significant differences in mobility distribution between folded sequences of different structural classes and between those and sequences of intrinsically disordered proteins. It is also shown that the several regions of mobility space differ significantly with respect to structural makeup. Helical proteins are shown to have distinctive dynamic characteristics at both extremes of the mobility spectrum.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Amino Acid Sequence , Protein Conformation , Protein Folding
3.
J Phys Chem B ; 126(31): 5730-5734, 2022 08 11.
Article in English | MEDLINE | ID: mdl-35900129

ABSTRACT

Using recently developed methods for studying the bioinformatics of protein dynamics, we investigate differences in dynamic characteristics between the sequences of proteins that fall into different structural classes. It is shown that there is a clear differentiation of dynamic properties of sequences as a function of structural class. Taken together with previous results we have developed, the present work demonstrates that dynamic properties are associated with structural behavior in two ways. The determination as to whether a given sequence folds is governed by the long-length-scale organization of the sequence. If the sequence folds, the choice of architectural class is governed by short- and intermediate-length-scale organization.


Subject(s)
Computational Biology , Proteins , Proteins/chemistry
4.
Proteins ; 90(5): 1115-1118, 2022 05.
Article in English | MEDLINE | ID: mdl-34981860

ABSTRACT

We compare the sequences of folded and intrinsically disordered proteins (IDPs), using bioinformatic methods recently developed to study protein dynamic properties. We demonstrate that the two classes of sequences are organized in diametrically opposite ways with respect to long-length-scale dynamic properties. We further demonstrate a statistically significant difference between the amino acid compositions of folded and disordered proteins, which is expressed in dynamic properties. Our results indicate that the long-length-scale properties of sequences are critical in determining whether proteins are able to fold, and, more generally, that they are central to an understanding of protein physics. They further provide a physical basis for the empirically observed differences in amino acid composition between folded proteins and IDPs.


Subject(s)
Intrinsically Disordered Proteins , Protein Folding , Amino Acids , Computational Biology , Intrinsically Disordered Proteins/chemistry , Protein Conformation
5.
Biopolymers ; 112(10): e23411, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33270217

ABSTRACT

Using bioinformatic methods for treating protein dynamics, developed in earlier work, we study the relationship between sequence mobility and dynamics in proteins. It is shown that sequence mobility drives a transition between two dynamic regimes in proteins, and that the specific details of this transition differ qualitatively between α-helical proteins and those in other structural classes. We examine the possibility that conformational switching is related to dynamic switching, by considering a specific system of sequences which exhibit the switching phenomenon. It is shown that a relationship between dynamic and conformational switching is entirely plausible.


Subject(s)
Computational Biology , Proteins , Protein Conformation , Protein Structure, Secondary
6.
Proc Natl Acad Sci U S A ; 117(33): 19938-19942, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32759212

ABSTRACT

We use a bioinformatic description of amino acid dynamic properties, based on residue-specific average B factors, to construct a dynamics-based, large-scale description of a space of protein sequences. We examine the relationship between that space and an independently constructed, structure-based space comprising the same sequences. It is demonstrated that structure and dynamics are only moderately correlated. It is further shown that helical proteins fall into two classes with very different structure-dynamics relationships. We suggest that dynamics in the two helical classes are dominated by distinctly different modes: pseudo-one-dimensional, localized helical modes in one case, and pseudo-three-dimensional (3D) global modes in the other. Sheet/barrel and mixed-α/β proteins exhibit more conventional structure-dynamics relationships. It is found that the strongest correlation between structure and dynamic properties arises when the latter are represented by the sequence average of the dynamic index, which corresponds physically to the overall mobility of the protein. None of these results are accessible to bioinformatic methods hitherto available.


Subject(s)
Proteins/chemistry , Computational Biology , Protein Structure, Secondary
7.
Proteins ; 87(10): 799-804, 2019 10.
Article in English | MEDLINE | ID: mdl-31134683

ABSTRACT

We examine the local and global properties of the average B-factor, 〈B〉, as a residue-specific indicator of protein dynamic characteristics. It has been shown that values of 〈B〉 for the 20 amino acids differ in a statistically significant manner, and that, while strongly determined by the static physical properties of amino acids, they also encode averaged information about the influence of global fold on single-residue dynamics. Therefore, complete sequences of amino acids also encode fold-related global dynamic information, in addition to the local information that arises from static physical properties. We show that the relative magnitudes of these two contributions can be determined using Fourier methods, which represent the global properties of the sequences. It has also been shown that the behavior of Fourier components of 〈B〉 differs, with very high statistical significance, between structural groups, and that this information is not available from a comparable analysis of static amino acid properties.


Subject(s)
Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acids/analysis , Humans , Protein Conformation , Protein Domains , Protein Folding , Proteins/analysis
8.
Methods Mol Biol ; 1958: 341-346, 2019.
Article in English | MEDLINE | ID: mdl-30945228

ABSTRACT

Traditional approaches to sequence alignment are based on evolutionary ideas. As a result, they are prebiased toward results which are in accord with initial expectations. We present here a method of sequence alignment which is based entirely on the physical properties of the amino acids. This approach has no inherent bias, eliminates much of the computational complexity associated with methods currently in use, and has been shown to give good results for structures which were poorly predicted by traditional methods in recent CASP competitions and to identify sequence differences which correlate with structural and dynamic differences not detectable by traditional methods.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Proteins/genetics , Sequence Alignment/methods , Algorithms , Amino Acid Sequence/genetics , Physics , Proteins/chemistry , Sequence Homology, Amino Acid
9.
Proc Natl Acad Sci U S A ; 114(7): 1578-1583, 2017 02 14.
Article in English | MEDLINE | ID: mdl-28143938

ABSTRACT

We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison.


Subject(s)
Bacterial Proteins/genetics , Methyl-Accepting Chemotaxis Proteins/genetics , Sequence Alignment/methods , Signal Transduction/genetics , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Binding Sites/genetics , Computational Biology/methods , Methyl-Accepting Chemotaxis Proteins/chemistry , Methyl-Accepting Chemotaxis Proteins/metabolism , Models, Molecular , Protein Domains , Sequence Homology, Amino Acid
10.
Proc Natl Acad Sci U S A ; 113(7): 1808-10, 2016 Feb 16.
Article in English | MEDLINE | ID: mdl-26831093

ABSTRACT

The degree of informatic independence between the physical properties of amino acids as encoded in actual protein sequences is calculated. It is shown that no physical property can be identified that carries significantly less information than others and that the information overlap between different properties and different length scales along the sequence is essentially zero. These observations suggest that bioinformatic models based on arbitrarily selected sets of physical properties are inherently deficient.


Subject(s)
Computational Biology , Proteins/chemistry , Amino Acid Sequence , Fourier Analysis
11.
Proteins ; 83(11): 1923-8, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26315852

ABSTRACT

We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.


Subject(s)
Amino Acid Sequence , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Biophysical Phenomena , Databases, Protein , Fourier Analysis , Sequence Homology, Amino Acid
12.
Proc Natl Acad Sci U S A ; 112(16): 5029-32, 2015 Apr 21.
Article in English | MEDLINE | ID: mdl-25848034

ABSTRACT

The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison is made between our method and PSI BLAST. We demonstrate that traditionally defined sequence similarity can be very low for pairs of sequences (which therefore cannot be identified using PSI BLAST), but similarity of physical property distributions results in almost identical 3D structures. The performance of PFM is shown to be better than that of PSI BLAST when sequence matching is comparable, based on a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets). It is also shown that PFM outperforms PSI BLAST in informatically challenging targets.


Subject(s)
Computational Biology/methods , Physical Phenomena , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Models, Molecular , Molecular Sequence Data
13.
Proc Natl Acad Sci U S A ; 111(14): 5225-9, 2014 Apr 08.
Article in English | MEDLINE | ID: mdl-24706836

ABSTRACT

We show that a Fourier-based sequence distance function is able to identify structural homologs of target sequences with high accuracy. It is shown that Fourier distances correlate very strongly with independently determined structural distances between molecules, a property of the method that is not attainable using conventional representations. It is further shown that the ability of the Fourier approach to identify protein folds is statistically far in excess of random expectation. It is then shown that, in actual searches for structural homologs of selected target sequences, the Fourier approach gives excellent results. On the basis of these results, we suggest that the global information detected by the Fourier representation is an essential feature of structure encoding in protein sequences and a key to structural homology detection.


Subject(s)
Proteins/chemistry , Protein Conformation , Protein Folding
14.
J Chem Theory Comput ; 9(10), 2013 Oct 08.
Article in English | MEDLINE | ID: mdl-24273465

ABSTRACT

The UNited RESidue (UNRES) coarse-grained model of polypeptide chains, developed in our laboratory, enables us to carry out millisecond-scale molecular-dynamics simulations of large proteins effectively. It performs well in ab initio predictions of protein structure, as demonstrated in the last Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP10). However, the resolution of the simulated structure is too coarse, especially in loop regions, which results from insufficient specificity of the model of local interactions. To improve the representation of local interactions, in this work we introduced new side-chain-backbone correlation potentials, derived from a statistical analysis of loop regions of 4585 proteins. To obtain sufficient statistics, we reduced the set of amino-acid-residue types to five groups, derived in our earlier work on structurally optimized reduced alphabets, based on a statistical analysis of the properties of amino-acid structures. The new correlation potentials are expressed as one-dimensional Fourier series in the virtual-bond-dihedral angles involving side-chain centroids. The weight of these new terms was determined by a trial-and-error method, in which Multiplexed Replica Exchange Molecular Dynamics (MREMD) simulations were run on selected test proteins. The best average root-mean-square deviations (RMSDs) of the calculated structures from the experimental structures below the folding-transition temperatures were obtained with the weight of the new side-chain-backbone correlation potentials equal to 0.57. The resulting conformational ensembles were analyzed in detail by using the Weighted Histogram Analysis Method (WHAM) and Ward's minimum-variance clustering. 
This analysis showed that the RMSDs from the experimental structures dropped by 0.5 Å on average, compared to simulations without the new terms, and the deviation of individual residues in the loop region of the computed structures from their counterparts in the experimental structures (after optimum superposition of the calculated and experimental structure) decreased by up to 8 Å. Consequently, the new terms improve the representation of local structure.

15.
Proteins ; 81(10): 1681-5, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23720385

ABSTRACT

Delineation of the relationship between sequence and structure in proteins has proven elusive. Most studies of this problem use alignment methods and other approaches based on the characteristics of individual residues. It is demonstrated herein that the sequence-structure relationship is determined in significant part by global characteristics of sequence organization. Information encoded in complete sequences is required to distinguish proteins in different architectural groups. It is found that the statistically significant differences between sequences encoding different architectures are encoded in a surprisingly small set of low-wave-number sequence periodicities. It would therefore appear that unexpected simplicity in an appropriately defined Fourier space may be an inherent characteristic of the sequences of folded proteins.


Subject(s)
Protein Conformation , Protein Folding , Proteins , Sequence Analysis, Protein/methods , Computational Biology/methods , Databases, Factual , Fourier Analysis , Proteins/chemistry , Proteins/metabolism
16.
Methods Mol Biol ; 932: 107-14, 2013.
Article in English | MEDLINE | ID: mdl-22987349

ABSTRACT

Analysis of the global properties of protein sequences, rather than single-site or local properties, has been shown to lead to new understanding of folding and function. Here we describe the use of software which can describe sequences numerically in an orthonormal fashion, Fourier-analyze those sequences, and verify the statistical significance of the resulting Fourier coefficients. The resulting parameters can be used to study problems involving sequences from a unique perspective.


Subject(s)
Amino Acid Sequence , Proteins/chemistry , Amino Acid Motifs , Computational Biology/methods , Software
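
The abstract of entry 16 mentions verifying the statistical significance of Fourier coefficients but does not say how. One generic way to do this, offered here only as an assumed illustration and not as the method of the software described, is a permutation test: shuffle the numerically encoded sequence many times and count how often a shuffled sequence gives an amplitude at least as large as the observed one. The function names and parameters below are hypothetical.

```python
import numpy as np


def amplitude(x, k):
    """Amplitude of the k-th Fourier coefficient of the mean-centered signal x."""
    x = np.asarray(x, dtype=float)
    return float(abs(np.fft.rfft(x - x.mean())[k])) / len(x)


def permutation_p_value(x, k, n_shuffles=1000, seed=0):
    """Estimate how often a shuffled sequence matches or exceeds the observed amplitude.

    Shuffling preserves composition and destroys residue order, so a small
    p-value suggests the periodicity comes from the ordering of residues.
    """
    rng = np.random.default_rng(seed)
    observed = amplitude(x, k)
    hits = 0
    for _ in range(n_shuffles):
        if amplitude(rng.permutation(x), k) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # A strictly alternating toy signal has all its power at the highest wave number.
    signal = [0.0, 1.0] * 16
    print(permutation_p_value(signal, k=16))
```

Because shuffling keeps the amino acid composition fixed, this kind of test separates information carried by residue order from information carried by composition alone.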
17.
Phys Rev Lett ; 106(24): 248101, 2011 Jun 17.
Article in English | MEDLINE | ID: mdl-21770602

ABSTRACT

The existence of conformational switching in proteins, induced by single amino acid mutations, presents an important challenge to our understanding of the physics of protein folding. Sequence-local methods, commonly used to detect structural homology, are incapable of accounting for this phenomenon. We examine a set of proteins, derived from the G(A) and G(B) domains of Streptococcus protein G, which are known to show a dramatic conformational change as a result of single-residue replacement. It is shown that these sequences, which are almost identical locally, can have very different global patterns of physical properties. These differences are consistent with the observed complete change in conformation. These results suggest that sequence-local methods for identifying structural homology can be misleading. They point to the importance of global sequence analysis in understanding sequence-structure relationships.


Subject(s)
Bacterial Proteins/chemistry , Spectrum Analysis/methods , Fourier Analysis , Protein Structure, Tertiary
19.
Proc Natl Acad Sci U S A ; 107(19): 8623-6, 2010 May 11.
Article in English | MEDLINE | ID: mdl-20421501

ABSTRACT

Computational studies of the relationships between protein sequence, structure, and folding have traditionally relied on purely local sequence representations. Here we show that global representations, on the basis of parameters that encode information about complete sequences, contain otherwise inaccessible information about the organization of sequences. By studying the spectral properties of these parameters, we demonstrate that amino acid physical properties fall into two distinct classes. One class comprises properties that favor sequentially localized interaction clusters. The other class comprises properties that favor globally distributed interactions. This observation provides a bridge between two classic models of protein folding (the collapse model and the nucleation model) and provides a basis for understanding how any degree of intermediacy between these two extremes can occur.


Subject(s)
Proteins/chemistry , Sequence Analysis, Protein , Amino Acid Sequence , Molecular Sequence Data
20.
Proc Natl Acad Sci U S A ; 106(34): 14345-8, 2009 Aug 25.
Article in English | MEDLINE | ID: mdl-19706520

ABSTRACT

It is demonstrated that, properly represented, the amino acid composition of protein sequences contains the information necessary to delineate the global properties of protein structure space. A numerical representation of amino acid sequence in terms of a set of property factors is used, and the values of those property factors are averaged over individual sequences and then over sets of sequences belonging to structurally defined groups. These sequence sets then can be viewed as points in a 10-dimensional space, and the organization of that space, determined only by sequence properties, is similar at both local and global scales to that of the space of protein structures determined previously.


Subject(s)
Algorithms , Proteins/chemistry , Computer Simulation , Databases, Protein , Physical Phenomena , Protein Conformation , Sequence Analysis, Protein
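
The two-stage averaging described in the abstract of entry 20 (property factors averaged over each sequence, then over each structurally defined group of sequences) can be sketched as follows. The 20 x 10 table `PLACEHOLDER_FACTORS` is filled with random numbers purely as a stand-in; it is not the published set of property factors, and the function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical stand-in: 10 random numbers per amino acid, NOT the
# published property factors. Replace with the real table before use.
_rng = np.random.default_rng(0)
PLACEHOLDER_FACTORS = dict(zip(AMINO_ACIDS, _rng.normal(size=(20, 10))))


def sequence_point(sequence, table=PLACEHOLDER_FACTORS):
    """Average each of the 10 property factors over the residues of one sequence."""
    return np.mean([table[aa] for aa in sequence], axis=0)


def group_point(sequences, table=PLACEHOLDER_FACTORS):
    """Average the per-sequence points over a structurally defined group."""
    return np.mean([sequence_point(s, table) for s in sequences], axis=0)


if __name__ == "__main__":
    group = ["ACDEFG", "HIKLMN", "PQRSTVWY"]
    # One point in 10-dimensional space for the whole group.
    print(group_point(group))
```

Each group of sequences is thereby reduced to a single point in a 10-dimensional space that depends only on amino acid composition, which is the representation the abstract compares with the space of protein structures.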