Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 3 de 3
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Bioinformatics ; 27(24): 3356-63, 2011 Dec 15.
Artigo em Inglês | MEDLINE | ID: mdl-21998158

RESUMO

MOTIVATION: Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied. RESULTS: A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales. CONTACT: altschul@ncbi.nlm.nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Proteínas/química , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Biologia Computacional , Eucariotos/química , Probabilidade
2.
J Comput Biol ; 18(8): 941-54, 2011 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-21702690

RESUMO

Dirichlet mixtures provide an elegant formalism for constructing and evaluating protein multiple sequence alignments. Their use requires the inference of Dirichlet mixture priors from curated sets of accurately aligned sequences. This article addresses two questions relevant to such inference: of how many components should a Dirichlet mixture consist, and how may a maximum-likelihood mixture be derived from a given data set. To apply the Minimum Description Length principle to the first question, we extend an analytic formula for the complexity of a Dirichlet model to Dirichlet mixtures by informal argument. We apply a Gibbs-sampling based approach to the second question. Using artificial data generated by a Dirichlet mixture, we demonstrate that our methods are able to approximate well the true theory, when it exists. We apply our methods as well to real data, and infer Dirichlet mixtures that describe the data better than does a mixture derived using previous approaches.


Assuntos
Biologia Computacional , Proteínas/análise , Alinhamento de Sequência , Algoritmos , Motivos de Aminoácidos , Biologia Computacional/métodos , Biologia Computacional/estatística & dados numéricos , Modelos Estatísticos , Probabilidade , Alinhamento de Sequência/métodos , Alinhamento de Sequência/estatística & dados numéricos
3.
J Comput Biol ; 17(12): 1607-20, 2010 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-21128852

RESUMO

Dirichlet mixture priors provide a Bayesian formalism for scoring alignments of protein profiles to individual sequences, which can be generalized to constructing scores for multiple-alignment columns. A Dirichlet mixture is a probability distribution over multinomial space, each of whose components can be thought of as modeling a type of protein position. Applied to the simplest case of pairwise sequence alignment, a Dirichlet mixture is equivalent to an implied symmetric substitution matrix. For alphabets of even size L, Dirichlet mixtures with L/2 components and symmetric substitution matrices have an identical number of free parameters. Although this suggests the possibility of a one-to-one mapping between the two formalisms, we show that there are some symmetric matrices no Dirichlet mixture can imply, and others implied by many distinct Dirichlet mixtures. Dirichlet mixtures are derived empirically from curated sets of multiple alignments. They imply "background" amino acid frequencies characteristic of these sets, and should thus be non-optimal for comparing proteins with non-standard composition. Given a mixture Θ, we seek an adjusted Θ' that implies the desired composition, but that minimizes an appropriate relative-entropy-based distance function. To render the problem tractable, we fix the mixture parameter as well as the sum of the Dirichlet parameters for each component, allowing only its center of mass to vary. This linearizes the constraints on the remaining parameters. An approach to finding Θ' may be based on small consecutive parameter adjustments. The relative entropy of two Dirichlet distributions separated by a small change in their parameter values implies a quadratic cost function for such changes. For a small change in implied background frequencies, this function can be minimized using the Lagrange-Newton method. We have implemented this method, and can compositionally adjust to good precision a 20-component Dirichlet mixture prior for proteins in under half a second on a standard workstation.


Assuntos
Algoritmos , Proteínas/química , Alinhamento de Sequência/métodos , Teorema de Bayes
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...