1.
Am J Obstet Gynecol ; 185(3): 716-24, 2001 Sep.
Article in English | MEDLINE | ID: mdl-11568803

ABSTRACT

OBJECTIVE: We propose that elucidation of the pathophysiology of preterm labor can be achieved with genome-scale analyses of differential gene expression. STUDY DESIGN: CD-1 mice on day 14.5 of a 19- to 20-day gestation were assigned to one of 4 treatment groups modeling different clinical conditions (n = 5 per group): group A, infection with labor (intrauterine injection of 10(10) heat-killed Escherichia coli, which causes delivery within an average of 20 hours); group B, infection without labor (intrauterine injection of 10(7) heat-killed E coli, which leads to normal delivery at term); group C, labor without infection (ovariectomy, which causes delivery within an average of 27 hours); and group D, no infection and no labor (intrauterine injection of vehicle). Total pooled myometrial RNA was prepared 3.5 hours after surgery for groups A, B, and D and 5 hours after surgery for group C. The relative expression of 4963 genes was assayed in these pools by using DNA microarrays. Transcripts specifically involved in infection-induced labor were identified by subtracting from the list of differentially regulated genes in group A those with common expression in groups B and C. RESULTS: In group A, 68 differentially expressed transcripts (≥2-fold upregulation or downregulation) were identified. Among these are 39 characterized genes. Fourteen (36%) are involved in inflammatory responses, 7 (18%) are involved in growth-differentiation-oncogenesis, and 3 (8%) are involved in apoptosis. Subtraction identified 13 gene products most likely to be important for bacterially induced labor, as opposed to labor without infection or bacterial exposure without labor. CONCLUSION: This study demonstrates the potential of the subtractive DNA microarray technique to identify transcripts important specifically for bacterially induced preterm labor.


Subjects
Gene Expression; Obstetric Labor, Premature/genetics; Animals; Apoptosis/physiology; Bacterial Infections/complications; Female; Inflammation/complications; Mice; Mice, Inbred Strains; Neoplasms/complications; Obstetric Labor, Premature/etiology; Oligonucleotide Array Sequence Analysis; Pregnancy; Pregnancy Complications/physiopathology; Pregnancy Complications, Infectious/physiopathology; Transcription, Genetic
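The subtraction step in record 1 can be sketched as set arithmetic over fold-change lists. This is a minimal sketch under stated assumptions: fold changes are taken relative to the vehicle-injected group D; "differentially expressed" means at least 2-fold up- or downregulation, as in the abstract; and a group A transcript is discarded if it is also differentially expressed in group B or group C (one reading of "common expression"). Gene names and values are hypothetical.

```python
def differential(fold_changes, threshold=2.0):
    """Genes up- or downregulated at least `threshold`-fold versus control
    (fold change >= threshold or <= 1/threshold)."""
    return {g for g, fc in fold_changes.items()
            if fc >= threshold or fc <= 1.0 / threshold}

def infection_labor_specific(group_a, group_b, group_c, threshold=2.0):
    """Genes changed in group A (infection + labor) that are not changed in
    group B (infection, no labor) or group C (labor, no infection)."""
    return (differential(group_a, threshold)
            - differential(group_b, threshold)
            - differential(group_c, threshold))

if __name__ == "__main__":
    # Hypothetical fold changes relative to group D.
    a = {"geneX": 4.0, "geneY": 0.3, "geneZ": 2.5, "geneW": 1.2}
    b = {"geneX": 3.1, "geneY": 1.0, "geneZ": 1.1, "geneW": 1.0}
    c = {"geneX": 1.0, "geneY": 0.4, "geneZ": 1.3, "geneW": 1.1}
    print(sorted(infection_labor_specific(a, b, c)))  # ['geneZ']
```

Here geneX is removed because it also responds to bacteria without labor, and geneY because it also responds to labor without infection.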
2.
Bioinformatics ; 17 Suppl 1: S65-73, 2001.
Article in English | MEDLINE | ID: mdl-11472994

ABSTRACT

Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.


Subjects
Probability; Sequence Alignment/statistics & numerical data; Amino Acid Sequence; Bayes Theorem; Computational Biology; Evolution, Molecular; Models, Genetic; Proteins/genetics
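The following is a simplified reading of the ancestor-mixture estimator in record 2. Each hypothetical ancestral letter b has a row of conditional probabilities P(a | b), of the kind that can be derived from a substitution matrix. The observed counts give posterior weights over ancestors, and the estimate is the posterior-weighted mixture of those rows. The 3-letter alphabet, the matrix, and the uniform prior are illustrative assumptions, not the paper's parameters.

```python
import math

def estimate(counts, prior, cond):
    """counts[a]: observed count of letter a.
    prior[b]:  prior probability of hypothetical ancestor b.
    cond[b][a]: P(observe a | ancestor b); rows sum to 1 and must be > 0.
    Returns the posterior-predictive distribution over letters."""
    # Log posterior (unnormalized) of each ancestor given the counts.
    log_post = [math.log(pb) + sum(n * math.log(cond[b][a])
                                   for a, n in enumerate(counts))
                for b, pb in enumerate(prior)]
    m = max(log_post)                      # subtract max for stability
    w = [math.exp(lp - m) for lp in log_post]
    z = sum(w)
    w = [x / z for x in w]
    # Mix the ancestors' rows by their posterior weights.
    return [sum(w[b] * cond[b][a] for b in range(len(prior)))
            for a in range(len(counts))]

COND = [[0.8, 0.1, 0.1],   # hypothetical P(a | ancestor 0)
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]]
print(estimate([3, 0, 0], [1 / 3] * 3, COND))
```

With counts [3, 0, 0], the estimate concentrates on the first letter (about 0.80) but keeps nonzero probability for the two unseen letters. The raw count ratio (1, 0, 0) does not.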
3.
Pac Symp Biocomput ; : 151-63, 2001.
Article in English | MEDLINE | ID: mdl-11262936

ABSTRACT

In this paper we consider the problem of extracting information from the upstream untranslated regions of genes to make predictions about their transcriptional regulation. We present a method for classifying genes based on motif-based hidden Markov models (HMMs) of their promoter regions. Sequence motifs discovered in yeast promoters are used to construct HMMs that include parameters describing the number and relative locations of motifs within each sequence. Each model provides a Fisher kernel for a support vector machine, which can be used to predict the classifications of unannotated promoters. We demonstrate this method on two classes of genes from the budding yeast, S. cerevisiae. Our results suggest that the additional sequence features captured by the HMM assist in correctly classifying promoters.


Subjects
Models, Genetic; Promoter Regions, Genetic; Algorithms; Base Sequence; Binding Sites/genetics; DNA, Fungal/genetics; DNA, Fungal/metabolism; Genes, Fungal; Markov Chains; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Transcription Factors/metabolism
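Record 3 turns each HMM into a Fisher kernel, K(x, y) = U_x^T I^{-1} U_y, where U_x is the gradient of log P(x | θ) with respect to the model parameters. The sketch below substitutes a much simpler position-independent nucleotide model for the motif-based HMM and approximates the Fisher information I by the identity. It only illustrates how a generative model becomes a kernel. The base frequencies are hypothetical.

```python
# Hypothetical base frequencies of a position-independent model.
THETA = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def fisher_score(seq, theta):
    """Unconstrained gradient of log P(seq | theta) = sum_i log theta[seq[i]]
    with respect to each theta[a], i.e. count(a) / theta[a]."""
    return [seq.count(a) / theta[a] for a in sorted(theta)]

def fisher_kernel(x, y, theta):
    """Fisher kernel with the Fisher information approximated by the identity."""
    ux, uy = fisher_score(x, theta), fisher_score(y, theta)
    return sum(p * q for p, q in zip(ux, uy))

print(fisher_kernel("ACGT", "AATT", THETA))
```

A kernel built this way can be passed to any kernel-based SVM. Record 3 replaces the toy model with motif-based HMMs of promoter regions.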
4.
Article in English | MEDLINE | ID: mdl-10977074

ABSTRACT

In this paper we present a method for classifying proteins into families using sparse Markov transducers (SMTs). Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wildcards in the conditioning sequences. Because substitutions of amino acids are common in protein families, incorporating wildcards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. We also present efficient data structures to improve the memory usage of the models. We evaluate SMTs by building protein family classifiers using the Pfam database and compare our results to previously published results.


Subjects
Algorithms; Proteins/classification; Proteins/genetics; Sequence Analysis, Protein/methods; Animals; Databases, Factual; Humans; Markov Chains
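The wildcard idea in record 4 can be illustrated by estimating a next-symbol distribution conditioned on a context pattern in which "*" matches any residue. This shows only the conditioning step, with hypothetical sequences and add-one smoothing. A full sparse Markov transducer combines many such wildcard contexts in a tree mixture, which is not shown here.

```python
from collections import Counter

def matches(context, pattern):
    """True if `context` fits `pattern`, where '*' matches any symbol."""
    return len(context) == len(pattern) and all(
        p == "*" or p == c for p, c in zip(pattern, context))

def next_symbol_dist(sequences, pattern, alphabet, pseudocount=1.0):
    """P(next symbol | preceding residues match `pattern`), add-one smoothed."""
    k = len(pattern)
    counts = Counter()
    for s in sequences:
        for i in range(k, len(s)):
            if matches(s[i - k:i], pattern):
                counts[s[i]] += 1
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}

seqs = ["ACGT", "AGGT"]                       # hypothetical training data
print(next_symbol_dist(seqs, "A*G", "ACGT"))  # wildcard pools ACG and AGG
print(next_symbol_dist(seqs, "ACG", "ACGT"))  # exact context sees one case
```

The wildcard context "A*G" tolerates the C/G substitution at its middle position. It therefore pools two observations and gives P(T) = 0.5, against 0.4 for the exact context "ACG".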
5.
Proc Natl Acad Sci U S A ; 97(1): 262-7, 2000 Jan 04.
Article in English | MEDLINE | ID: mdl-10618406

ABSTRACT

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.


Subjects
DNA/analysis; Gene Expression/genetics; Genes, Fungal/genetics; Saccharomyces cerevisiae/genetics; Algorithms; Computers; Databases, Factual; Fungal Proteins/classification; Fungal Proteins/genetics; Nucleic Acid Hybridization; Open Reading Frames/genetics
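Record 5 compares several SVM similarity functions. The sketch below shows only the simplest one: a linear (dot-product) soft-margin SVM, trained by full-batch subgradient descent on the primal objective. The expression profiles, labels, learning rate, and regularization constant are hypothetical. A real analysis would normally use a tested SVM library rather than this loop.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Full-batch subgradient descent on the soft-margin primal objective
    lam/2 * ||w||^2 + (1/n) * sum(max(0, 1 - y_i * (w . x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0                     # the bias is not regularized
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical log-ratio profiles over 3 conditions; +1 = in the functional class.
X = [[1.0, 0.9, 1.1], [0.8, 1.0, 0.9], [-1.0, -0.8, -1.1], [-0.9, -1.0, -0.7]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [0.9, 1.0, 1.0]))
```

Swapping `dot` for another similarity function requires the dual (kernel) form of the SVM, which this primal sketch does not implement.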
6.
J Exp Zool ; 285(2): 128-39, 1999 Aug 15.
Article in English | MEDLINE | ID: mdl-10440724

ABSTRACT

Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements. J. Exp. Zool. (Mol. Dev. Evol.) 285:128-139, 1999.


Subjects
Evolution, Molecular; Models, Genetic; Phylogeny; Amino Acid Sequence; Animals; Conserved Sequence; DNA, Mitochondrial/genetics; Humans; Likelihood Functions; Proteins/genetics; Sequence Alignment; Software; Vertebrates/classification; Vertebrates/genetics
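The site-based filter in record 6 can be sketched as keeping only the alignment columns in which one residue dominates. The abstract does not state the conservation criterion. The rule used here is an assumption: the most frequent residue must occupy at least a `threshold` fraction of all sequences, with gaps counted as non-matching. The toy alignment is hypothetical.

```python
from collections import Counter

def conserved_sites(alignment, threshold=0.75):
    """Keep columns whose most frequent non-gap residue appears in at least
    `threshold` of all sequences; return filtered alignment and kept indices."""
    n = len(alignment)
    keep = []
    for j in range(len(alignment[0])):
        residues = [seq[j] for seq in alignment if seq[j] != "-"]
        if residues and Counter(residues).most_common(1)[0][1] / n >= threshold:
            keep.append(j)
    return ["".join(seq[j] for j in keep) for seq in alignment], keep

aln = ["MKVLA", "MKILA", "MRVLG", "MKVIS"]  # hypothetical alignment
print(conserved_sites(aln))       # drops the variable last column
print(conserved_sites(aln, 1.0))  # only the invariant first column survives
```

The filtered alignment would then be passed to the tree-building step, which was parsimony analysis in the study.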
7.
Bioinformatics ; 15(6): 463-70, 1999 Jun.
Article in English | MEDLINE | ID: mdl-10383471

ABSTRACT

MOTIVATION: Statistical models of protein families, such as position-specific scoring matrices, profiles and hidden Markov models, have been used effectively to find remote homologs when given a set of known protein family members. Unfortunately, training these models typically requires a relatively large set of training sequences. Recent work (Grundy, J. Comput. Biol., 5, 479-492, 1998) has shown that, when only a few family members are known, several theoretically justified statistical modeling techniques fail to provide homology detection performance on a par with Family Pairwise Search (FPS), an algorithm that combines scores from a pairwise sequence similarity algorithm such as BLAST. RESULTS: The present paper provides a model-based algorithm that improves FPS by incorporating hybrid motif-based models of the form generated by Cobbler (Henikoff and Henikoff, Protein Sci., 6, 698-705, 1997). For the 73 protein families investigated here, this cobbled FPS algorithm provides better homology detection performance than either Cobbler or FPS alone. This improvement is maintained when BLAST is replaced with the full Smith-Waterman algorithm. AVAILABILITY: http://fps.sdsc.edu


Subjects
Algorithms; Proteins/chemistry; Sequence Alignment/methods; Computational Biology; Models, Statistical; Proteins/genetics; Sequence Alignment/statistics & numerical data; Sequence Homology, Amino Acid; Software
8.
Cell Mol Life Sci ; 55(3): 450-5, 1999 Mar.
Article in English | MEDLINE | ID: mdl-10228558

ABSTRACT

The proton-translocating NADH:ubiquinone oxidoreductase, or complex I, is located in the inner membranes of mitochondria, where it catalyzes the transfer of electrons from NADH to ubiquinone. Here we report that one of the subunits in complex I is homologous to short-chain dehydrogenases and reductases, a family of enzymes with diverse activities that include metabolizing steroids, prostaglandins and nucleotide sugars. We discovered that a subunit of complex I in human, cow, Neurospora crassa and Aquifex aeolicus is homologous to nucleotide-sugar epimerases and hydroxysteroid dehydrogenases while seeking distant homologs of these enzymes with a hidden Markov model-based search of Genpept. This homology allows us to use information from the solved three-dimensional structures of nucleotide-sugar epimerases and hydroxysteroid dehydrogenases and our motif analysis of these enzymes to predict functional domains on their homologs in complex I.


Subjects
NAD(P)H Dehydrogenase (Quinone)/genetics; Oxidoreductases/genetics; Amino Acid Sequence; Animals; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Cattle; Evolution, Molecular; Fungal Proteins/chemistry; Fungal Proteins/genetics; Gram-Negative Aerobic Rods and Cocci/enzymology; Gram-Negative Aerobic Rods and Cocci/genetics; Humans; Hydroxysteroid Dehydrogenases/chemistry; Hydroxysteroid Dehydrogenases/genetics; Markov Chains; Molecular Sequence Data; Multigene Family; NAD(P)H Dehydrogenase (Quinone)/chemistry; Neurospora crassa/enzymology; Neurospora crassa/genetics; Oxidoreductases/chemistry; Sequence Alignment; Sequence Homology, Amino Acid; Software
9.
J Comput Biol ; 5(3): 479-91, 1998.
Article in English | MEDLINE | ID: mdl-9773344

ABSTRACT

The function of an unknown biological sequence can often be accurately inferred by identifying sequences homologous to the original sequence. Given a query set of known homologs, there exist at least three general classes of techniques for finding additional homologs: pairwise sequence comparisons, motif analysis, and hidden Markov modeling. Pairwise sequence comparisons are typically employed when only a single query sequence is known. Hidden Markov models (HMMs), on the other hand, are usually trained with sets of more than 100 sequences. Motif-based methods fall in between these two extremes. The current work introduces a straightforward generalization of pairwise sequence comparison algorithms to the case when multiple query sequences are available. This algorithm, called Family Pairwise Search (FPS), combines pairwise sequence comparison scores from each query sequence. A BLAST implementation of FPS is compared to representative examples of hidden Markov modeling (HMMER) and motif modeling (MEME). The three techniques are compared across a wide range of protein families, using query sets of varying sizes. BLAST FPS significantly outperforms motif-based and HMM methods. Furthermore, FPS is much more efficient than the training algorithms for statistical models.


Subjects
Algorithms; Proteins/chemistry; Sequence Homology, Amino Acid; Evaluation Studies as Topic
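Family Pairwise Search, as described in record 9, scores each database sequence against every query and combines the pairwise scores. In the sketch below, a shared k-mer count stands in for BLAST, and the combination rule (maximum or sum) is passed in as a parameter. Both the scoring function and the choice of combiners are illustrative assumptions, not the published implementation. The sequences are hypothetical.

```python
def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_score(a, b, k=3):
    """Toy stand-in for a BLAST score: number of distinct shared k-mers."""
    return len(kmer_set(a, k) & kmer_set(b, k))

def family_pairwise_score(target, queries, combine=max):
    """Combine the target's pairwise scores against every query sequence."""
    return combine(kmer_score(q, target) for q in queries)

def fps_rank(database, queries, combine=max):
    """Rank database sequences by their combined family score."""
    return sorted(database,
                  key=lambda t: family_pairwise_score(t, queries, combine),
                  reverse=True)

queries = ["MKVLAAGT", "MKILAAGS"]     # hypothetical known family members
database = ["WWWWPPPP", "MKVLAAGS"]
print(fps_rank(database, queries))            # family member ranked first
print(family_pairwise_score("MKVLAAGS", queries, sum))
```

Unlike an HMM, nothing is trained here. The family is represented directly by its member sequences, which is why this approach remains usable when only a few members are known.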
10.
Biochem Biophys Res Commun ; 248(2): 250-4, 1998 Jul 20.
Article in English | MEDLINE | ID: mdl-9675122

ABSTRACT

Spinach CSP41 is part of a protein complex that binds to the 3' untranslated region (UTR) of petD precursor-mRNA, a chloroplast gene encoding subunit IV of the cytochrome b6/f complex. CSP41 cleaves the 3'-UTR of petD mRNA within the stem-loop structure, suggesting a key role in the control of chloroplast mRNA stability. We discovered that CSP41 is homologous to nucleotide-sugar epimerases and hydroxysteroid dehydrogenases while seeking distant homologs of these enzymes with a hidden Markov model-based search of Genpept. This analysis identified Synechocystis ORF, Accession 1652543 as a homolog. Subsequent analyses show that spinach CSP41 and Arabidopsis thaliana 2765081 are homologous to the Synechocystis ORF. Information from the solved 3D structures of epimerases and dehydrogenases and our motif analysis of these enzymes is used to predict domains on CSP41 that are important in binding and metabolism of mRNA. Cyanobacteria are among the earliest life forms, indicating that the divergence from a common ancestor of nucleotide-sugar epimerases and an mRNA binding protein with ribonuclease activity was ancient.


Subjects
Cytochrome b6f Complex; Endoribonucleases/chemistry; RNA-Binding Proteins/chemistry; Spinacia oleracea/enzymology; Amino Acid Sequence; Arabidopsis/enzymology; Bacterial Proteins/chemistry; Cyanobacteria/enzymology; Cytochrome b Group/genetics; Databases as Topic; Hydroxysteroid Dehydrogenases/chemistry; Molecular Sequence Data; Plant Proteins/chemistry; Protein Structure, Secondary; RNA Precursors/metabolism; RNA, Messenger/metabolism; Racemases and Epimerases/chemistry; Sequence Homology, Amino Acid; Software
11.
Comput Appl Biosci ; 13(4): 397-406, 1997 Aug.
Article in English | MEDLINE | ID: mdl-9283754

ABSTRACT

MOTIVATION: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard linear HMM contains too many parameters to be trained adequately. RESULTS: This work attempts to solve that problem by generating smaller HMMs which precisely model only the conserved regions of the family. These HMMs are constructed from motif models generated by the EM algorithm using the MEME software. Because motif-based HMMs have relatively few parameters, they can be trained using smaller data sets. Studies of short chain alcohol dehydrogenases and 4Fe-4S ferredoxins support the claim that motif-based HMMs exhibit increased sensitivity and selectivity in database searches, especially when training sets contain few sequences.


Subjects
Markov Chains; Proteins/genetics; Software; Alcohol Dehydrogenase/genetics; Algorithms; Amino Acid Sequence; Databases, Factual; Ferredoxins/genetics; Molecular Sequence Data; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data; Sequence Homology, Amino Acid; Stochastic Processes
12.
Biochem Biophys Res Commun ; 231(3): 760-6, 1997 Feb 24.
Article in English | MEDLINE | ID: mdl-9070888

ABSTRACT

The increasing size of protein sequence databases is straining methods of sequence analysis, even as the increased information offers opportunities for sophisticated analyses of protein structure, function, and evolution. Here we describe a method that uses artificial intelligence-based algorithms to build models of families of protein sequences. These models can be used to search protein sequence databases for remote homologs. The MEME (Multiple Expectation-maximization for Motif Elicitation) software package identifies motif patterns in a protein family, and these motifs are combined into a hidden Markov model (HMM) for use as a database searching tool. Meta-MEME is sensitive and accurate, as well as automated and unbiased, making it suitable for the analysis of large datasets. We demonstrate Meta-MEME on a family of dehydrogenases that includes mammalian 11 beta-hydroxysteroid and 17 beta-hydroxysteroid dehydrogenase and their homologs in the short chain alcohol dehydrogenase family. We chose this dataset because it is large and phylogenetically diverse, providing a good test of the sensitivity and selectivity of Meta-MEME on a protein family of biological interest. Indeed, Meta-MEME identifies at least 350 members of this family in Genpept96 and clearly separates these sequences from non-homologous proteins. We also show how the MEME motif output can be used for phylogenetic analysis.


Subjects
Oxidoreductases/chemistry; Sequence Analysis/methods; Algorithms; Amino Acid Sequence; Animals; Artificial Intelligence; Bacterial Proteins/chemistry; Consensus Sequence; Entropy; Humans; Markov Chains; Molecular Sequence Data; Phylogeny; Sequence Homology, Amino Acid
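Records 11 and 12 build HMMs from MEME motifs for database search. The sketch below illustrates the scoring idea under several assumptions. Motifs are given as position-specific log-odds tables. They must appear in a fixed order without overlapping, and gaps of any length between them cost nothing. Dynamic programming then finds the best-scoring placement, much as a Viterbi pass would through a motif-based model with free inter-motif states. The motifs, the default penalty of -1.0 for unlisted residues, and the sequences are hypothetical.

```python
NEG_INF = float("-inf")
UNLISTED = -1.0  # hypothetical log-odds for residues a motif column omits

def motif_score(motif, seq, start):
    """Sum of position-specific log-odds for `motif` placed at seq[start]."""
    return sum(col.get(seq[start + j], UNLISTED) for j, col in enumerate(motif))

def best_ordered_placement(seq, motifs):
    """Best total score placing each motif once, in order, without overlap;
    residues outside motifs are free (zero-cost gap states)."""
    n = len(seq)
    prev = [0.0] * (n + 1)  # no motifs placed yet: score 0 for any prefix
    for motif in motifs:
        w = len(motif)
        cur = [NEG_INF] * (n + 1)  # cur[i]: best score within seq[:i]
        for i in range(1, n + 1):
            cur[i] = cur[i - 1]  # residue i-1 falls in a gap
            if i >= w and prev[i - w] > NEG_INF:
                cur[i] = max(cur[i], prev[i - w] + motif_score(motif, seq, i - w))
        prev = cur
    return prev[n]

GK = [{"G": 2.0}, {"K": 2.0}]  # hypothetical two-column motifs
DE = [{"D": 2.0}, {"E": 2.0}]
print(best_ordered_placement("AGKAADEA", [GK, DE]))  # 8.0
print(best_ordered_placement("ADEAAGKA", [GK, DE]))  # -4.0
```

The second sequence contains both motifs in the wrong order, so it scores poorly. This is how an ordered motif model can separate family members from unrelated sequences that share isolated motifs.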
13.
Comput Appl Biosci ; 12(4): 303-10, 1996 Aug.
Article in English | MEDLINE | ID: mdl-8902357

ABSTRACT

Many advanced software tools fail to reach a wide audience because they require specialized hardware, installation expertise, or an abundance of CPU cycles. The worldwide web offers a new opportunity for distributing such systems. One such program, MEME, discovers repeated patterns, called motifs, in sets of DNA or protein sequences. This tool is now available to biologists over the worldwide web, using an asynchronous, single-program multiple-data version of the program called ParaMEME that runs on an Intel Paragon XP/S parallel computer at the San Diego Supercomputer Center. ParaMEME scales gracefully to 64 nodes on the Paragon with efficiencies > 72% for large data sets. The worldwide web interface to ParaMEME accepts a set of sequences interactively from a user, submits the sequences to the Paragon for analysis, and e-mails the results back to the user. ParaMEME is available for free public use at http://www.sdsc.edu/CompSci/Biomed/MEME.


Subjects
Computer Communication Networks; DNA/genetics; Proteins/genetics; Software; Algorithms; Databases, Factual; Evaluation Studies as Topic; Repetitive Sequences, Nucleic Acid; Software Design; User-Computer Interface
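Parallel efficiency, as quoted for ParaMEME in record 13, is conventionally defined as speedup divided by processor count: E = T_1 / (p T_p). Here T_1 is the single-node run time and T_p is the run time on p nodes. The timings below are hypothetical and serve only to show the arithmetic.

```python
def speedup(t_serial, t_parallel):
    """S = T_1 / T_p."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    """E = S / p; 1.0 would mean perfect linear scaling."""
    return speedup(t_serial, t_parallel) / processors

# Hypothetical timings: 6400 s on 1 node, 125 s on 64 nodes.
print(speedup(6400.0, 125.0), efficiency(6400.0, 125.0, 64))
```

These hypothetical timings give a speedup of 51.2 on 64 nodes, an efficiency of 0.8. An efficiency above 0.72 on 64 nodes therefore means a speedup of more than about 46.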