ABSTRACT
The classification of amino acids has proven to be a useful tool for understanding the role of sequence in protein function. Reduced amino acid alphabets are one example of such classifications: built from physicochemical, structural and quantum characteristics of the amino acids, they simplify the representation of sequences and are useful in the modelling, design and understanding of proteins. An objective selection of amino acid properties is therefore important, because the classes formed in a reduced alphabet depend on the descriptors used for classification. In this research, 20,871,586 reduced amino acid alphabets were constructed from a careful selection of descriptors for the 20 amino acids, using techniques such as the information content index and hierarchical cluster analysis with ties in proximity. This large collection of reduced alphabets was used to interpret alterations in the function of three proteins, N-carbamylase, luciferase and PI3K, caused by amino acid changes in their sequences. To this end, the similar and the differing descriptors linked to these mutations were studied. Properties such as volume, hydrophobicity, charge and autocorrelation can be associated with variations in the behaviour of these proteins, whereas frequency in specific secondary structures, Gibbs free energy and some topological and quantum properties can be considered responsible for preventing the deactivation of protein function. This work offers the most complete collection of reduced alphabets, which promises to be a useful tool for interpreting alterations caused by amino acid mutations in protein sequences.