ABSTRACT
The existence of encrypted fragments with antimicrobial activity in human proteins has been thoroughly demonstrated in the literature. Recently, algorithms for the large-scale identification of these segments in whole proteomes were developed, and the pervasiveness of this phenomenon was stated. These algorithms typically mine encrypted cationic and amphiphilic segments of proteins, which, when synthesized as individual polypeptide sequences, exert antimicrobial activity by membrane disruption. In the present report, the human reference proteome was submitted to the software kamal for the uncovering of protein segments that correspond to putative intragenic antimicrobial peptides (IAPs). The assessment of the identity of these segments, frequency, functional classes of parent proteins, structural relevance, and evolutionary conservation of amino acid residues within their corresponding proteins was conducted in silico. Additionally, the antimicrobial and anticancer activity of six selected synthetic peptides was evaluated. Our results indicate that cationic and amphiphilic segments can be found in 2% of all human proteins, but are more common in transmembrane and peripheral membrane proteins. These segments are surface-exposed basic patches whose amino acid residues present similar conservation scores to other residues with similar solvent accessibility. Moreover, the antimicrobial and anticancer activity of the synthetic putative IAP sequences was irrespective of whether these are associated with membranes in the cellular setting. Our study discusses these findings in light of the current understanding of encrypted peptide sequences, offering some insights into the relevance of these segments to the organism in the context of their harboring proteins or as separate polypeptide sequences.
Subject(s)
Anti-Infective Agents , Proteome , Humans , Proteome/genetics , Antimicrobial Cationic Peptides/genetics , Antimicrobial Cationic Peptides/pharmacology , Amino Acid Sequence , Amino Acids
ABSTRACT
Random energy models (REMs) provide a simple description of the energy landscapes that guide protein folding and evolution. The requirement of a large energy gap between the native structure and unfolded conformations, considered necessary for cooperative, protein-like, folding behavior, indicates that proteins differ markedly from random heteropolymers. It has been suggested, therefore, that natural selection might have acted to choose nonrandom amino acid sequences satisfying this particular condition, implying that a large fraction of possible, unselected random sequences, would not fold to any structure. From an informational perspective, however, this scenario could indicate that protein structures, regarded as messages to be transmitted through a communication channel, would not be efficiently encoded in amino acid sequences, regarded as the communication channel for this transmission, since a large fraction of possible channel states would not be used. Here, we use a combined REM for conformations and sequences, with previously estimated parameters for natural proteins, to explore an alternative possibility in which the appropriate shape of the landscape results mainly from the deviation from randomness of possible native structures instead of sequences. We observe that this situation emerges naturally if the distribution of conformational energies happens to arise from two independent contributions corresponding to sequence-dependent and -independent terms. This construction is consistent with the hypothesis of a protein burial folding code, with native structures being determined by a modest amount of sequence-dependent atomic burial information with sequence-independent constraints imposed by unspecific hydrogen bond formation. 
More generally, an appropriate combination of sequence-dependent and -independent information accommodates the possibility of an efficient structural encoding with the main physical requirement for folding, providing possible insight not only into the folding process but also into several aspects of sequence evolution such as neutral networks, conformational coverage, and de novo gene emergence.
Subject(s)
Protein Folding , Proteins , Protein Conformation , Thermodynamics , Models, Molecular , Proteins/genetics , Proteins/chemistry
ABSTRACT
The connection between protein sequences and tertiary structures has intrigued investigators for decades. A plausible hypothesis for the coding scheme postulates that atomic burial information obtainable from the sequence could be sufficient for structural determination when combined with sequence-independent constraints. Accordingly, folding simulations using native burial information expressed by atomic central distances, discretized into a small number L of equiprobable burial layers, have indeed been successful in reaching and distinguishing the native structure of several globular proteins. Attempted predictions of layers from sequence, however, turned out to be insufficiently accurate for most proteins. Here we explore the possibility that a nonuniform assignment of layers, which is intended to account for constraints imposed by chain connectivity, might provide a more efficient burial encoding of tertiary structures. We consider the condition that adjacent Cα-atoms along the sequence cannot occupy nonadjacent layers, in which case the information required to specify sequences of burials would be smaller. It is shown that appropriate folding behavior can still be observed in this explicitly more constrained scenario with a structure-dependent assignment intended to produce the thinnest possible layers still compatible with the imposed burial constraint. This thinnest assignment turns out to be sufficiently restrictive for the observed examples and provides appropriately thinner layers or, equivalently, a larger number of layers, for examples previously observed to indeed require more restrictive constraints when compared to counterparts of similar size, as well as the appropriate increase in number of layers for larger proteins. Implications for the general understanding of the protein folding code are discussed.
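As an illustration only, the two ingredients named in this abstract (equiprobable burial layers and the condition that adjacent Cα atoms cannot occupy nonadjacent layers) might be sketched as follows. The function names `equiprobable_layers` and `satisfies_adjacency` are hypothetical, and the sketch does not reproduce the structure-dependent "thinnest layers" assignment, which the abstract does not specify.

```python
def equiprobable_layers(distances, n_layers):
    """Assign each central distance to one of n_layers equally populated
    layers, numbered from 0 (innermost) to n_layers - 1 (outermost)."""
    n = len(distances)
    # Rank residues from most buried to most exposed.
    order = sorted(range(n), key=lambda i: distances[i])
    layers = [0] * n
    for rank, i in enumerate(order):
        layers[i] = rank * n_layers // n
    return layers


def satisfies_adjacency(layers):
    """True if residues adjacent along the chain never occupy
    nonadjacent layers (layer index changes by at most 1)."""
    return all(abs(a - b) <= 1 for a, b in zip(layers, layers[1:]))


# Toy central distances along a 6-residue chain (made-up values).
toy = [1.0, 5.0, 2.0, 6.0, 3.0, 4.0]
toy_layers = equiprobable_layers(toy, 3)
```

In this toy chain the equiprobable assignment places neighboring residues in the innermost and outermost layers, so the adjacency condition fails; a nonuniform assignment of the kind the abstract describes would be one way to avoid such jumps.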
Subject(s)
Protein Folding , Proteins , Amino Acid Sequence , Burial , Models, Molecular , Protein Conformation , Proteins/chemistry
ABSTRACT
MOTIVATION: It has been recently suggested that atomic burials, as expressed by molecular central distances, contain sufficient information to determine the tertiary structure of small globular proteins. A possible approach to structural determination from sequence could therefore involve a sequence-to-burial intermediate prediction step whose accuracy, however, is theoretically limited by the mutual information between these two variables. We use a non-redundant set of globular protein structures to estimate the mutual information between local amino acid sequence and atomic burials. Discretizing central distances of Cα or Cβ atoms in equiprobable burial levels, we estimate relevant mutual information measures that are compared with actual predictions obtained from a Naive Bayesian Classifier (NBC) and a Hidden Markov Model (HMM). RESULTS: Mutual information density for 20 amino acids and two or three burial levels was estimated to be roughly 15% of the unconditional burial entropy density. Lower estimates for the mutual information between local amino acid sequence and burial of a single residue indicated an increase in mutual information with the number of burial levels up to at least five or six levels. Prediction schemes were found to efficiently extract the available burial information from local sequence. Lower estimates for the mutual information involving single burials are consistently approached by predictions from the NBC and actually surpassed by predictions from the HMM. Near-optimal prediction for the HMM is indicated by the agreement between its density of prediction information and the corresponding density of mutual information between input and output representations. AVAILABILITY: The dataset of protein structures and the prediction implementations are available at http://www.btc.unb.br/ (in 'Software').
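For readers unfamiliar with the quantity being estimated, a minimal sketch of a mutual information calculation between a residue type and a discrete burial level is given below. It uses the simple plug-in (frequency-count) estimator, which is an assumption made here for illustration; the abstract does not state which estimators the authors used, and the plug-in estimate is known to be biased for small samples. The names `mutual_information` and `entropy` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy, in bits, of a list of discrete observations."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y), in bits, from (x, y) observations,
    e.g. x = amino acid type and y = burial level of the same residue."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Made-up observations: residue type fully determines the burial level...
dependent = [("A", 0), ("A", 0), ("L", 1), ("L", 1)]
# ...or carries no information about it at all.
independent = [("A", 0), ("A", 1), ("L", 0), ("L", 1)]
```

Dividing `mutual_information(pairs)` by `entropy` of the burial levels gives the fraction of burial entropy accounted for by the sequence variable, which is the kind of ratio the abstract reports.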
Subject(s)
Models, Molecular , Models, Statistical , Proteins/chemistry , Algorithms , Amino Acid Sequence , Bayes Theorem , Entropy , Markov Chains , Protein Structure, Tertiary , Software
ABSTRACT
Protein tertiary structures are known to be encoded in amino acid sequences, but the problem of structure prediction from sequence continues to be a challenge. With this question in mind, recent simulations have shown that atomic burials, as expressed by atom distances to the molecular geometrical center, are sufficiently informative for determining native conformations of small globular proteins. Here we use a simple computational experiment to estimate the amount of this required burial information and find it to be surprisingly small, actually comparable with the stringent limit imposed by sequence statistics. Atomic burials appear to satisfy, therefore, minimal requirements for a putative dominating property in the folding code because they provide an amount of information sufficiently large for structural determination but, at the same time, sufficiently small to be encodable in sequences. In a simple analogy with human communication, atomic burials could correspond to the actual "language" encoded in the amino acid "script" from which the complexity of native conformations is recovered during the folding process.
Subject(s)
Amino Acids/chemistry , Computational Biology/methods , Models, Chemical , Models, Molecular , Protein Conformation , Protein Folding , Proteins/chemistry , Computer Simulation
ABSTRACT
We investigate the possibility that atomic burials, as measured by their distances from the structural geometrical center, contain sufficient information to determine the tertiary structure of globular proteins. We report Monte Carlo simulated annealing results of all-atom hard-sphere models in continuous space for four small proteins: the all-beta WW-domain 1E0L, the alpha/beta protein-G 1IGD, the all-alpha engrailed homeo-domain 1ENH, and the alpha + beta engineered monomeric form of the Cro protein 1ORC. We used as energy function the sum over all atoms, labeled by i, of |R_i - R_i*|, where R_i is the atomic distance from the center of coordinates, or central distance, and R_i* is the "ideal" central distance obtained from the native structure. Hydrogen bonds were taken into consideration by the assignment of two ideal distances for backbone atoms forming hydrogen bonds in the native structure depending on the formation of a geometrically defined bond, independently of bond partner. Lowest energy final conformations turned out to be very similar to the native structure for the four proteins under investigation and a strong correlation was observed between energy and distance root mean square deviation (DRMS) from the native in the case of all-beta 1E0L and alpha/beta 1IGD. For all-alpha 1ENH and alpha + beta 1ORC the overall correlation between energy and DRMS among final conformations was not as high because some trajectories resulted in high DRMS but low energy final conformations in which alpha-helices adopted a non-native mutual orientation. Comparison between central distances and actual accessible surface areas corroborated the implicit assumption of correlation between these two quantities.
The Z-score obtained with this native-centric potential in the discrimination of native 1ORC from a set of random compact structures confirmed that it contains a much smaller amount of native information when compared to a traditional contact Go potential but indicated that simple sequence-dependent burial potentials still need some improvement in order to attain a similar discriminability. Taken together, our results suggest that central distances, in conjunction with physically motivated hydrogen bond constraints, contain sufficient information to determine the native conformation of these small proteins and that a solution to the folding problem for globular proteins could arise from sufficiently accurate burial predictions from sequence followed by minimization of a burial-dependent energy function.
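The energy function stated in this abstract, the sum over atoms of the absolute difference between each central distance and its ideal native value, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the center of coordinates is taken as the unweighted mean of atomic positions, the function names are hypothetical, and the hydrogen-bond term (two ideal distances for hydrogen-bonding backbone atoms) and the hard-sphere model are omitted.

```python
import math


def central_distances(coords):
    """Distance of each atom from the center of coordinates, taken here
    as the unweighted mean of all atomic positions (an assumption)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return [math.dist(c, center) for c in coords]


def burial_energy(coords, ideal):
    """Sum over atoms of the absolute deviation between the current
    central distance and the ideal central distance from the native."""
    return sum(abs(r - r0) for r, r0 in zip(central_distances(coords), ideal))


# Two-atom toy "native" structure and a stretched conformation (made-up).
native = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
stretched = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ideal = central_distances(native)
```

By construction the native conformation has zero energy, and any conformation whose atoms sit at other central distances is penalized linearly; in a simulated annealing run this value would be the quantity being minimized.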
Subject(s)
Algorithms , Protein Structure, Tertiary , Hydrogen Bonding , Models, Molecular , Monte Carlo Method , Protein Folding , Proteins/chemistry , Thermodynamics
ABSTRACT
We use complete enumeration of self-avoiding chains of up to N=26 monomers in two-dimensional lattices to investigate the effect of alternative implementations of backbone hydrogen bonds on the cooperativity of homopolypeptide collapse. Following a recent study on protein folding models, we use the square lattice with z=3 local conformations per monomer and lattice extensions containing diagonal steps which result in z=5 or z=7 and assume that only a subset of zh
Subject(s)
Crystallization/methods , Models, Chemical , Models, Molecular , Multiprotein Complexes/chemistry , Multiprotein Complexes/ultrastructure , Peptides/chemistry , Binding Sites , Computer Simulation , Hydrogen Bonding , Protein Binding , Protein Conformation
ABSTRACT
We perform a statistical analysis of atomic distributions as a function of the distance R from the molecular geometrical center in a nonredundant set of compact globular proteins. The number of atoms increases quadratically for small R, indicating a constant average density inside the core, reaches a maximum at a size-dependent distance R(max), and falls rapidly for larger R. The empirical curves turn out to be consistent with the volume increase of spherical concentric solid shells and a Fermi-Dirac distribution in which the distance R plays the role of an effective atomic energy epsilon(R) = R. The effective chemical potential mu governing the distribution increases with the number of residues, reflecting the size of the protein globule, while the temperature parameter beta decreases. Interestingly, the product of beta and mu is not as strongly dependent on protein size and appears to be tuned to maintain approximately half of the atoms in the high density interior and the other half in the exterior region of rapidly decreasing density. A normalized size-independent distribution was obtained for the atomic probability as a function of the reduced distance, r = R/R(g), where R(g) is the radius of gyration. The global normalized Fermi distribution, F(r), can be reasonably decomposed in Fermi-like subdistributions for different atomic types tau, F(tau)(r), with Sigma(tau) F(tau)(r) = F(r), which depend on two additional parameters mu(tau) and h(tau). The chemical potential mu(tau) affects a scaling prefactor and depends on the overall frequency of the corresponding atomic type, while the maximum position of the subdistribution is determined by h(tau), which appears in a type-dependent atomic effective energy, epsilon(tau)(r) = h(tau) r, and is strongly correlated to available hydrophobicity scales.
Better adjustments are obtained when the effective energy is not assumed to be necessarily linear, or epsilon(tau)*(r) = h(tau)* r^alpha(tau), in which case a correlation with hydrophobicity scales is found for the product alpha(tau) h(tau)*. These results indicate that compact globular proteins are consistent with a thermodynamic system governed by hydrophobic-like energy functions, with reduced distances from the geometrical center, reflecting atomic burials, and provide a conceptual framework for the eventual prediction from sequence of a few parameters from which whole atomic probability distributions and potentials of mean force can be reconstructed.
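The functional form described in this abstract, a spherical-shell volume factor multiplied by a Fermi-Dirac occupancy with effective energy epsilon(R) = R, can be illustrated with the unnormalized sketch below. The exact expression R^2 / (exp(beta (R - mu)) + 1), the omission of any prefactor, and the parameter values are assumptions chosen for illustration and are not fitted values from the paper; the function names are hypothetical.

```python
import math


def shell_density(R, beta, mu):
    """Unnormalized number of atoms at distance R: shell volume factor R^2
    times a Fermi-Dirac occupancy with effective energy epsilon(R) = R."""
    return R ** 2 / (math.exp(beta * (R - mu)) + 1.0)


def peak_position(beta, mu, r_limit, step=0.01):
    """Distance at which shell_density is largest, found on a simple grid."""
    n = int(r_limit / step)
    grid = [i * step for i in range(1, n + 1)]
    return max(grid, key=lambda R: shell_density(R, beta, mu))


# Illustrative parameters only (arbitrary units).
beta, mu = 2.0, 10.0
r_peak = peak_position(beta, mu, r_limit=20.0)
```

With these illustrative parameters the curve grows as R^2 well inside the globule, where the occupancy is close to 1, peaks somewhat below mu, and decays quickly beyond it, which is the qualitative shape the abstract reports for the empirical distributions.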
Subject(s)
Protein Conformation , Amino Acids/chemistry , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Probability
ABSTRACT
Conformational restrictions imposed by hydrogen bond formation during protein folding are investigated by Monte Carlo simulations of a non-native-centric, two-dimensional, hydrophobic model in which the formation of favorable contacts is coupled to an effective reduction in lattice coordination. This scheme is intended to mimic the requirement that polar backbone groups of real proteins must form hydrogen bonds concomitantly to their burial inside the apolar protein core. In addition to the square lattice, with z=3 conformations per monomer, we use extensions in which diagonal step vectors are allowed, resulting in z=5 and z=7. Thermodynamics are governed by the hydrophobic energy function, according to which hydrophobic monomers tend to make contacts unspecifically while the reverse is true for hydrophilic monomers, with the additional restriction that only contacts between monomers adopting one of zh
Subject(s)
Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Energy Transfer , Entropy , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Molecular Sequence Data , Protein Conformation , Protein Folding , Structure-Activity Relationship , Thermodynamics
ABSTRACT
By Monte Carlo simulations, we explored the effect of single mutations on the thermodynamics and kinetics of the folding of a two-dimensional, energetically frustrated, hydrophobic protein model. Phi-Value analysis, corroborated by simulations beginning from given sets of judiciously chosen initial contacts, suggests that the transition state of the model consists of a limited region of the native structure, that is, a folding nucleus. It seems that the most important contacts in the transition state (large and positive Phi) are not the ones with the highest contact order, because in this case the entropic cost of their formation would be too high, but exactly the ones that decrease the entropic cost of difficult contacts, reducing their effective contact order. Mutations of internal monomers involved in high-order contacts were actually the ones resulting in the fastest kinetics (and Phi < 0), indicating they tend to make low order, non-native contacts of low entropic cost that stabilize the unfolded state with respect to the transition state. Folding acceleration by other non-native interactions was also observed and a simple general mechanism is proposed according to which non-native contacts can act indirectly over the folding nucleus, "chelating" out potentially harmful contacts. The polymer graph of our model, which facilitates the visualization of effective contact orders, successfully suggests the relative kinetic importance of different contacts and is reasonably consistent with analogous graphs for the well characterized family of SH3 domains.