J Comput Biol ; 7(3-4): 585-600, 2000.
Article in English | MEDLINE | ID: mdl-11108480

ABSTRACT

We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. Our results also show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at http://www.research.ibm.com/spat/.
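
The comparison described in the abstract (sensitivity, specificity, and site overlap between two patterns) can be illustrated with a minimal Python sketch. This is not the Splash algorithm or the authors' evaluation code. The function names, the toy sequences, the use of the textbook definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), and the choice to measure overlap relative to the reference site length are all assumptions made for illustration; the abstract does not state the exact definitions used.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Handles only the basic syntax: '-' separators, 'x' wildcards,
    [..] allowed residues, {..} excluded residues, (n) or (n,m) repeats,
    and '<' / '>' terminus anchors outside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence termini
        parts.append(element)
    return "".join(parts)


def sensitivity_specificity(pattern, members, non_members):
    """Score a pattern against known family members and non-members.

    Assumed (textbook) definitions: sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP).
    """
    regex = re.compile(prosite_to_regex(pattern))
    tp = sum(1 for seq in members if regex.search(seq))
    fp = sum(1 for seq in non_members if regex.search(seq))
    fn = len(members) - tp
    tn = len(non_members) - fp
    sensitivity = tp / (tp + fn) if members else 0.0
    specificity = tn / (tn + fp) if non_members else 0.0
    return sensitivity, specificity


def overlap_fraction(site, reference_site):
    """Fraction of the reference site covered by another site.

    Sites are half-open (start, end) index pairs, as returned by
    re.Match.span(). Normalising by the reference length is an assumption.
    """
    shared = min(site[1], reference_site[1]) - max(site[0], reference_site[0])
    reference_length = reference_site[1] - reference_site[0]
    return max(shared, 0) / reference_length if reference_length > 0 else 0.0


if __name__ == "__main__":
    # Made-up toy sequences, not real PROSITE family data.
    members = ["MKLTAAHCGG", "QQVSAGHCLL", "AAAAAAAAAA"]
    non_members = ["MITASHCKK", "GGGGGGGG", "PPPPKKKK", "DDDDEEEE"]
    reference = "[LIVM]-[ST]-A-[STAG]-H-C"   # pattern in PROSITE-like syntax
    candidate = "A-H-C-G"                    # hypothetical discovered pattern

    print(sensitivity_specificity(reference, members, non_members))

    seq = members[0]
    ref_site = re.search(prosite_to_regex(reference), seq).span()
    cand_site = re.search(prosite_to_regex(candidate), seq).span()
    print(overlap_fraction(cand_site, ref_site))
```

Under these assumptions, a discovered pattern would count as overlapping the reference site when `overlap_fraction` exceeds 0.5, and two patterns for the same family can be ranked by comparing their sensitivity and specificity pairs.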


Subjects
Algorithms; Proteins/chemistry; Sequence Analysis, Protein/statistics & numerical data; Amino Acid Sequence; Animals; Computational Biology; Databases, Factual; Markov Chains; Models, Molecular; Pattern Recognition, Automated; Protein Conformation; Sensitivity and Specificity; Serine Endopeptidases/chemistry; Trypsin/chemistry