Search | BVS Regional Portal (test)

Shape string: a new feature for prediction of DNA-binding residues.

Wang, Duo-Duo; Li, Tong-Hua; Sun, Jiang-Ming; Li, Da-Peng; Xiong, Wen-Wei; Wang, Wen-Yan; Tang, Sheng-Nan.

Biochimie ; 95(2): 354-8, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23116714

ABSTRACT

Protein-DNA interactions are involved in many biological processes essential for gene expression and regulation. To understand the molecular mechanisms of protein-DNA recognition, it is crucial to analyze and identify the DNA-binding residues of protein-DNA complexes. Here, we proposed a novel descriptor, the shape string, and two related features, shape string PSSM and shape string pair composition, to characterize DNA-binding residues. We employed the new features together with the position-specific scoring matrix (PSSM) for modeling and prediction. Results on a benchmark dataset showed that our approach significantly improved the accuracy of the predictor. The overall accuracy of our approach reached 85.86%, with 85.02% sensitivity and 86.02% specificity. The results also demonstrated that the shape string is a powerful descriptor for the prediction of DNA-binding residues, and that the two related features further enhanced the predictive value.
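The three figures reported above are standard confusion-matrix metrics. The following is a minimal sketch of how they relate, not the authors' evaluation code; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts chosen for illustration only.
acc, sens, spec = binary_metrics(tp=85, fn=15, tn=430, fp=70)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

Overall accuracy is a class-size-weighted average of sensitivity and specificity, so it always lies between the two, as the reported 85.86% does between 85.02% and 86.02%.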

Subjects

Algorithms , DNA/chemistry , Position-Specific Scoring Matrices , Proteins/chemistry , Software , Amino Acid Sequence , Binding Sites , Protein Databases , Molecular Models , Molecular Sequence Data , Protein Binding , Protein Interaction Domains and Motifs , Sensitivity and Specificity

Retrieving backbone string neighbors provides insights into structural modeling of membrane proteins.

Sun, Jiang-Ming; Li, Tong-Hua; Cong, Pei-Sheng; Tang, Sheng-Nan; Xiong, Wen-Wei.

Mol Cell Proteomics ; 11(7): M111.016808, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22415040

ABSTRACT

Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative techniques: a knowledge-driven sequence alignment and encoding of a backbone string element profile. The predicted backbone string is then aligned against a backbone string database to retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to the native structures of the query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging between 229 and 595 residues, whose high-resolution structures were difficult to determine by either experiment or prediction. The obtained TM-scores and root mean square deviations confirmed that the models based on the backbone string neighbors retrieved by BS-align were very close to the native membrane structures, even though the query and the neighbor shared very low sequence identity. The backbone string system represents a new road for the prediction of protein structure from sequence, and suggests that the similarity of the backbone string would be more informative than describing a protein as belonging to a fold.
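To illustrate what representing a backbone as a string over Ramachandran space can look like, here is a rough sketch that maps each residue's (phi, psi) pair to one letter. The three-letter alphabet and the region boundaries below are assumptions made purely for illustration; the abstract does not give the actual backbone string definition used by BS-align, and this code is not that definition.

```python
def backbone_symbol(phi, psi):
    """Map one residue's (phi, psi), in degrees, to a coarse state letter.

    The regions are hypothetical and deliberately simple.
    """
    if phi < 0 and -120 <= psi < 50:
        return "H"  # helix-like region (illustrative boundary)
    if phi < 0:
        return "E"  # strand-like region (illustrative boundary)
    return "L"      # positive phi, e.g. left-handed conformations


def backbone_string(angles):
    """Concatenate per-residue letters into one string."""
    return "".join(backbone_symbol(phi, psi) for phi, psi in angles)


# Hypothetical dihedral angles for five residues.
angles = [(-60, -45), (-65, -40), (-120, 130), (-130, 140), (60, 45)]
print(backbone_string(angles))  # prints "HHEEL"
```

Once conformations are encoded as strings, ordinary string alignment can be used to compare a predicted string against a database of known ones, which is the general idea the abstract describes.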

Subjects

Algorithms , Computational Biology/methods , Membrane Proteins/chemistry , Amino Acid Sequence , Protein Databases , Humans , Molecular Models , Molecular Sequence Data , Protein Conformation , Proteus mirabilis , Sequence Alignment , Protein Sequence Analysis , Amino Acid Sequence Homology , Protein Structural Homology

Identification of the subcellular localization of mycobacterial proteins using localization motifs.

Tang, Sheng-Nan; Sun, Jiang-Ming; Xiong, Wen-Wei; Cong, Pei-Sheng; Li, Tong-Hua.

Biochimie ; 94(3): 847-53, 2012 Mar.

Article in English | MEDLINE | ID: mdl-22182488

ABSTRACT

Mycobacterium, the most common disease-causing genus, infects billions of people and is notoriously difficult to treat. Understanding the subcellular localization of mycobacterial proteins can provide essential clues for protein function and drug discovery. In this article, we present a novel approach that focuses on local sequence information to identify localization motifs that are generated by a merging algorithm and are selected based on a binomially distributed model. These localization motifs are employed as features for identifying the subcellular localization of mycobacterial proteins. Our approach provides more accurate results than previous methods and was tested on an independent dataset recently obtained from an experimental study to provide a first and reasonably accurate prediction of subcellular localization. Our approach can also be used for large-scale prediction of new protein entries in the UniProtKB database and of protein sequences obtained experimentally. In addition, our approach identified many local motifs involved in subcellular localization that also interact with the environment. Thus, our method may have widespread applications both in the study of the functions of mycobacterial proteins and in the search for a potential vaccine target for designing drugs.
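One common way to select motifs with a binomial model is to ask how surprising a motif's count in one compartment is, given its background frequency. The sketch below computes that upper-tail probability. It is an assumption about how such a selection could be scored, not the authors' published procedure; `binomial_tail` and all numbers are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: a motif occurs in 12 of 40 proteins from one
# compartment, while its background frequency across all proteins is 0.10.
p_value = binomial_tail(12, 40, 0.10)
print(f"p-value = {p_value:.2e}")
```

A small tail probability suggests the motif is enriched in that compartment beyond what the background rate would explain, which makes it a candidate localization feature.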

Subjects

Bacterial Proteins/metabolism , Computational Biology/methods , Mycobacterium/metabolism , Algorithms

Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data.

Tang, Kai-Lin; Li, Tong-Hua; Xiong, Wen-Wei; Chen, Kai.

BMC Bioinformatics ; 11: 109, 2010 Feb 27.

Article in English | MEDLINE | ID: mdl-20187963

ABSTRACT

BACKGROUND: Recent advances in proteomics technologies such as SELDI-TOF mass spectrometry have shown promise in the detection of early stage cancers. However, dimensionality reduction and classification are considerable challenges in statistical machine learning. We therefore propose a novel approach for dimensionality reduction and test it using published high-resolution SELDI-TOF data for ovarian cancer. RESULTS: We propose a method based on statistical moments to reduce feature dimensions. After refining and t-testing, SELDI-TOF data are divided into several intervals. Four statistical moments (mean, variance, skewness and kurtosis) are calculated for each interval and are used as representative variables. The high dimensionality of the data can thus be rapidly reduced. To improve efficiency and classification performance, the data are further used in kernel PLS models. The method achieved average sensitivity of 0.9950, specificity of 0.9916, accuracy of 0.9935 and a correlation coefficient of 0.9869 for 100 five-fold cross validations. Furthermore, only one control was misclassified in leave-one-out cross validation. CONCLUSION: The proposed method is suitable for analyzing high-throughput proteomics data.
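The moment-based reduction described in RESULTS can be sketched as follows. This is a minimal illustration, assuming equal-width intervals, population (divide-by-n) moments, and non-excess kurtosis; the abstract does not state how intervals are chosen or which moment conventions the authors used, and the function names and the toy spectrum are hypothetical.

```python
def moments(values):
    """Return (mean, variance, skewness, kurtosis) of a list of intensities.

    Population moments are used; kurtosis is non-excess (normal = 3).
    A constant interval returns zeros for the last three values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return mean, 0.0, 0.0, 0.0
    return mean, m2, m3 / m2**1.5, m4 / m2**2


def interval_features(spectrum, n_intervals):
    """Split a spectrum into intervals and emit four moments per interval."""
    size = len(spectrum) // n_intervals
    features = []
    for i in range(n_intervals):
        start = i * size
        # The last interval absorbs any leftover points.
        end = len(spectrum) if i == n_intervals - 1 else start + size
        features.extend(moments(spectrum[start:end]))
    return features


# Toy spectrum of 8 intensities reduced to 2 intervals x 4 moments.
spectrum = [0.0, 1.0, 2.0, 3.0, 10.0, 10.0, 10.0, 10.0]
features = interval_features(spectrum, 2)
print(len(features), features)
```

Each interval collapses to four numbers regardless of how many m/z points it contains, which is how the feature count drops from thousands of points to a few values per interval before classification.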

Subjects

Ovarian Neoplasms/classification , Proteomics/methods , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods , Tumor Biomarkers/analysis , Female , Gene Expression Profiling , Humans
