Search | BVS Regional Portal

Identification of amino acid propensities that are strong determinants of linear B-cell epitope using neural networks.

Su, Chun-Hung; Pal, Nikhil R; Lin, Ken-Li; Chung, I-Fang.

PLoS One ; 7(2): e30617, 2012.

Article in English | MEDLINE | ID: mdl-22347389

ABSTRACT

BACKGROUND: Identifying amino acid propensities that are strong determinants of linear B-cell epitopes is important for enriching our knowledge about epitopes, and it can also help to obtain better epitope prediction. Typical linear B-cell epitope prediction methods combine various propensities in different ways to improve prediction accuracy. However, fewer but better features may yield better prediction. Moreover, for a given propensity, a sequence of length k yields k values, which should be treated as a single unit for feature selection; hence the usual feature selection methods will not work. Here we use a novel Group Feature Selecting Multilayered Perceptron (GFSMLP), which treats a group of related information as a single entity, selects useful propensities related to linear B-cell epitopes, and uses them to predict epitopes. METHODOLOGY/PRINCIPAL FINDINGS: We use eight widely known propensities and four data sets. We use GFSMLP to rank propensities by the frequency with which they are selected. We find that Chou's beta-turn and Ponnuswamy's polarity are better features for prediction of linear B-cell epitopes. We examine the individual and combined discriminating power of the selected propensities and analyze the correlation between paired propensities. Our results show that the selected propensities are indeed good features, which also cooperate with other propensities to enhance the discriminating power for predicting epitopes. We find that polarity is not the best individual predictor, but it collaborates with others to yield good prediction. The usual feature selection methods cannot provide such information. CONCLUSIONS/SIGNIFICANCE: Our results confirm the effectiveness of active (group) feature selection by GFSMLP over the traditional passive approaches of evaluating various combinations of propensities. The GFSMLP-based feature selection can be extended to the more than 500 remaining propensities to enhance our biological knowledge about epitopes and to obtain better prediction. A graphical-user-interface version of GFSMLP is available at: http://bio.classcloud.org/GFSMLP/.
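The grouping idea in the abstract (one propensity contributes k values for a window of length k, and those k values are kept or dropped together) can be illustrated with a minimal Python sketch. This is not the GFSMLP implementation: the abstract does not describe the network internals, so the single multiplicative gate per group used here is an assumption, and the scale names and numeric values in `TOY_SCALES` are made-up placeholders rather than published propensity scales.

```python
# Toy propensity scales. The names and values are hypothetical placeholders,
# NOT published scales such as Chou's beta-turn or Ponnuswamy's polarity.
TOY_SCALES = {
    "scale_a": {"A": 0.1, "G": 0.9, "K": 0.5, "S": 0.7},
    "scale_b": {"A": 0.8, "G": 0.2, "K": 0.6, "S": 0.3},
}


def encode_window(window, scales):
    """Return one group of k values per propensity for a length-k window."""
    return {name: [scale[res] for res in window] for name, scale in scales.items()}


def apply_group_gates(groups, gates):
    """Multiply every value in a group by that group's single gate in [0, 1].

    A gate near 0 suppresses the whole propensity; a gate near 1 passes it.
    Groups are concatenated in sorted name order to give a flat feature vector.
    """
    features = []
    for name in sorted(groups):
        features.extend(gates[name] * value for value in groups[name])
    return features


groups = encode_window("GAKS", TOY_SCALES)
# Assumed gate values: keep scale_a entirely, drop scale_b entirely.
gated = apply_group_gates(groups, {"scale_a": 1.0, "scale_b": 0.0})
```

With one gate per group, a selection procedure can only keep or discard a propensity as a whole, which is the behavior the abstract asks for; an ordinary per-feature selector could instead keep, say, positions 1 and 3 of a propensity and drop the rest.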

Subjects

Epitopes, B-Lymphocyte/chemistry; Neural Networks, Computer; Amino Acids/chemistry; Computer Graphics; Internet; Methods; Models, Molecular; Software; User-Computer Interface

Incremental Mountain Clustering Method to find building blocks for constructing structures of proteins.

Lin, Ken-Li; Lin, Chin-Teng; Pal, Nikhil R.

IEEE Trans Nanobioscience ; 9(4): 278-88, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21266313

ABSTRACT

In this paper we propose an algorithm named Incremental Structural Mountain Clustering Method (ISMCM) with a view to finding a library of building blocks for reconstructing the 3-D structures of proteins/peptides. The building blocks are short structural motifs that are identified based on an estimate of the local "density" of 3-D fragments, computed using a measure of structural similarity. The structural similarity is computed after the best-molecular-fit alignment of pairs of fragments. The algorithm is tested on two well-known benchmark data sets. Following the protocols used by other researchers, for the first data set we reconstruct a set of 71 test peptides (up to the first 60 residues), whereas for the second data set we reconstruct all 143 test peptides. The ISMCM algorithm is found to successfully reconstruct the test peptides in terms of both global-fit root-mean-square (RMS) error and local-fit RMS error. The low values of the local-fit RMS errors suggest that the building blocks extracted by ISMCM are good quantizers, which can represent nearby fragments quite accurately. To further assess the quality of the building blocks we use two alternative graphical methods. We also use Shannon's entropy to show the structural similarity of the clusters found by our algorithm. This is important because building blocks that represent clusters of structurally similar fragments will be very effective in reconstruction. The entropic analysis reveals the very interesting fact that the secondary structure of the central residue of the fragments in a cluster is most strongly conserved (minimum entropy) over the cluster, which might indicate that the central residue of the structural motif plays a dominant role in local folding.
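The entropic analysis mentioned in the abstract can be pictured with a short Python sketch that computes Shannon's entropy of the secondary-structure label at each position across the fragments of one cluster. The three-letter labeling (H, E, C), the fragment length, and the example cluster are hypothetical and chosen only for illustration; the abstract does not say how the authors labeled or grouped fragments.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of a sequence of categorical labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def positional_entropy(fragments):
    """Entropy of the secondary-structure label at each fragment position.

    All fragments are assumed to have the same length; zip(*fragments)
    yields one column of labels per position.
    """
    return [shannon_entropy(column) for column in zip(*fragments)]


# Hypothetical cluster of 5-residue fragments labeled
# H (helix), E (strand), or C (coil) at each position.
cluster = ["CHHHC", "EHHHC", "CHHHE", "CCHHC"]
entropies = positional_entropy(cluster)
```

A position where every fragment carries the same label has entropy 0, so a low value at the central position corresponds to the strong conservation the abstract reports.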

Subjects

Algorithms; Cluster Analysis; Models, Molecular; Protein Interaction Domains and Motifs; Entropy; Peptides/chemistry

Structural building blocks: construction of protein 3-D structures using a structural variant of mountain clustering method.

Lin, Ken-Li; Lin, Chin-Teng; Pal, Nikhil R; Ojha, Sudeepta.

IEEE Eng Med Biol Mag ; 28(4): 38-44, 2009.

Article in English | MEDLINE | ID: mdl-19622423

Subjects

Cluster Analysis; Models, Chemical; Proteins/chemistry; Algorithms; Protein Conformation

Feature selection and combination criteria for improving accuracy in protein structure prediction.

Lin, Ken-Li; Lin, Chun-Yuan; Huang, Chuen-Der; Chang, Hsiu-Ming; Yang, Chiao-Yun; Lin, Chin-Teng; Tang, Chuan Yi; Hsu, D Frank.

IEEE Trans Nanobioscience ; 6(2): 186-96, 2007 Jun.

Article in English | MEDLINE | ID: mdl-17695755

ABSTRACT

The classification of protein structures is essential for determining their function in bioinformatics. At present, a reasonably high prediction accuracy has been achieved in classifying proteins into four classes in the SCOP database according to their primary amino acid sequences. However, further classification into fine-grained folding categories remains quite a challenge, especially when the number of possible folding patterns, such as those defined in the SCOP database, is large. In our previous work, we proposed a two-level classification strategy called hierarchical learning architecture (HLA), using neural networks and two indirect coding features to differentiate proteins according to their classes and folding patterns, which achieved an accuracy rate of 65.5%. In this paper, we use a combinatorial fusion technique to facilitate feature selection and combination for improving predictive accuracy in protein structure classification. When various criteria in combinatorial fusion are applied to the protein fold prediction approach using neural networks with HLA and the radial basis function network (RBFN), the resulting classification has an overall prediction accuracy rate of 87% for four classes and 69.6% for 27 folding categories. These rates are significantly higher than the accuracy rate of 56.5% previously obtained by Ding and Dubchak. Our results demonstrate that data fusion is a viable method for feature selection and combination in the prediction and classification of protein structure.
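The abstract does not spell out which fusion criteria were applied. As a rough illustration only, the Python sketch below shows two generic ways of combining the outputs of several scoring systems: averaging their scores and averaging their ranks. The function names, the two example systems, and their scores are hypothetical, and the sketch is not the authors' procedure.

```python
def ranks_from_scores(scores):
    """Map each label to its rank; rank 1 is the highest score.

    Ties are broken alphabetically by label so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda label: (-scores[label], label))
    return {label: position for position, label in enumerate(ordered, start=1)}


def score_combination(systems):
    """Average the scores that each system assigns to each label."""
    labels = systems[0].keys()
    return {label: sum(s[label] for s in systems) / len(systems) for label in labels}


def rank_combination(systems):
    """Average the ranks that each system assigns to each label."""
    rank_tables = [ranks_from_scores(s) for s in systems]
    labels = systems[0].keys()
    return {
        label: sum(r[label] for r in rank_tables) / len(rank_tables)
        for label in labels
    }


# Hypothetical normalized scores from two classifiers for three fold labels.
system_a = {"fold_1": 0.9, "fold_2": 0.5, "fold_3": 0.1}
system_b = {"fold_1": 0.2, "fold_2": 0.7, "fold_3": 0.4}

combined_scores = score_combination([system_a, system_b])
combined_ranks = rank_combination([system_a, system_b])

# Highest combined score wins; lowest combined rank wins.
best_by_score = max(combined_scores, key=combined_scores.get)
best_by_rank = min(combined_ranks, key=combined_ranks.get)
```

Score and rank combinations can disagree when the systems use very different score distributions, which is one reason a fusion method may evaluate both before choosing a criterion.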

Subjects

Algorithms; Models, Chemical; Models, Molecular; Pattern Recognition, Automated/methods; Proteins/chemistry; Proteins/ultrastructure; Sequence Analysis, Protein/methods; Artificial Intelligence; Computer Simulation; Databases, Protein; Information Storage and Retrieval/methods; Proteins/classification; Reproducibility of Results; Sensitivity and Specificity

Protein metal binding residue prediction based on neural networks.

Lin, Chin-Teng; Lin, Ken-Li; Yang, Chih-Hsien; Chung, I-Fang; Huang, Chuen-Der; Yang, Yuh-Shyong.

Int J Neural Syst ; 15(1-2): 71-84, 2005.

Article in English | MEDLINE | ID: mdl-15912584

ABSTRACT

Over one-third of protein structures contain metal ions, which are necessary elements in living systems. Traditionally, structural biologists have investigated the properties of metalloproteins (proteins that bind metal ions) by physical means, interpreting the function and reaction mechanisms of enzymes from their structures and from observations in in vitro experiments. For most proteins, however, only the primary structure (amino acid sequence information) is known; 3-dimensional structures are not always available. In this paper, a direct analysis method is proposed to predict the metal-binding amino acid residues of a protein from its sequence information alone, using neural networks with sliding-window-based feature extraction and biological feature encoding techniques. For four major bulk elements (calcium, potassium, magnesium, and sodium), the proposed method identifies the metal-binding residues with higher than 90% sensitivity and very good accuracy under 5-fold cross-validation. With such promising results, the method can be extended and used as a powerful methodology for metal-binding characterization of the rapidly increasing number of protein sequences.
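Sliding-window feature extraction, as named in the abstract, can be sketched in Python as follows: each residue is represented by itself plus a fixed number of neighbors on either side, and every residue in the window is turned into a numeric vector. The window half-width, the padding symbol, and the one-hot encoding are assumptions made for this sketch; the abstract mentions "biological feature encoding" without giving details, so one-hot encoding stands in for it here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # assumed padding symbol for positions beyond the sequence ends


def one_hot(residue):
    """20-dimensional indicator vector; the padding symbol maps to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def sliding_windows(sequence, half_width):
    """Return one window of length 2 * half_width + 1 centered on each residue."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def encode(sequence, half_width=2):
    """Return one flat feature vector per residue of the sequence."""
    return [
        [bit for residue in window for bit in one_hot(residue)]
        for window in sliding_windows(sequence, half_width)
    ]


windows = sliding_windows("MKCH", half_width=1)
features = encode("MKCH", half_width=1)
```

Each feature vector would then be paired with a label for its central residue (metal-binding or not) to train a classifier, so a sequence of length n yields n training examples.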

Subjects

Computer Simulation; Metalloproteins/chemistry; Models, Molecular; Neural Networks, Computer; Protein Conformation; Amino Acid Sequence; Animals; Binding Sites; Databases, Protein; Humans; Molecular Sequence Data; Sensitivity and Specificity
