Search | VHL Regional Portal

1.

Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.

Hertz, G Z; Stormo, G D.

Bioinformatics ; 15(7-8): 563-77, 1999.

Article in English | MEDLINE | ID: mdl-10487864

ABSTRACT

MOTIVATION: Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring, or scoring, the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme. RESULTS: We describe four components to our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations, and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein. AVAILABILITY: Programs were developed under the UNIX operating system and are available by anonymous ftp from ftp://beagle.colorado.edu/pub/consensus.
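
The scoring idea in this abstract can be illustrated with a small sketch. It is not the authors' program: the function names `information_content` and `expected_frequency` are hypothetical, the background is assumed to be uniform over A, C, G and T, natural logarithms are used, and no small-sample correction is applied. The P value itself is taken as an input, since the abstract does not give the formulas for estimating it.

```python
import math
from collections import Counter


def information_content(sites, background=None):
    """Log-likelihood style score of an ungapped alignment.

    For each column, sums f_b * ln(f_b / p_b) over the bases b observed in
    that column, where f_b is the column frequency of b and p_b is its
    background frequency (assumed uniform if not supplied).
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same width")
    total = 0.0
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        for base, n in counts.items():
            f = n / len(sites)
            total += f * math.log(f / background[base])
    return total


def expected_frequency(p_value, num_alignments):
    """Expected number of alignments scoring at least as well by chance:
    the P value of the score multiplied by the number of possible alignments."""
    return p_value * num_alignments


if __name__ == "__main__":
    aligned = ["TGTGA", "TGTGA", "TGAGA", "TTTGA"]  # made-up sites
    print(information_content(aligned))
```

A fully conserved column contributes ln 4 under the uniform background, while a column that matches the background contributes 0, so higher totals indicate stronger departure from chance.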

Subjects

DNA/genetics, Proteins/genetics, Sequence Alignment/methods, Algorithms, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Base Sequence, Binding Sites/genetics, Carrier Proteins, Cyclic AMP Receptor Protein/genetics, Cyclic AMP Receptor Protein/metabolism, Bacterial DNA/genetics, Escherichia coli/genetics, Escherichia coli/metabolism, Linear Models, Sequence Alignment/statistics & numerical data, Software

2.

Bioinformatics in Siberia. First International Conference on Bioinformatics of Genome Regulation and Structure, Novosibirsk, Siberia, Russia, 24-27 August 1998.

Thieffry, D; Hertz, G Z.

Trends Genet ; 15(1): 8-9, 1999 Jan.

Article in English | MEDLINE | ID: mdl-10087925

Subjects

Computational Biology, Gene Expression Regulation, Genome, Academies and Institutes, Animals, Humans, Internet, Siberia

3.

PromFD 1.0: a computer program that predicts eukaryotic pol II promoters using strings and IMD matrices.

Chen, Q K; Hertz, G Z; Stormo, G D.

Comput Appl Biosci ; 13(1): 29-35, 1997 Feb.

Article in English | MEDLINE | ID: mdl-9088706

ABSTRACT

MOTIVATION: A large number of new DNA sequences with virtually unknown functions are generated as the Human Genome Project progresses. Therefore, it is essential to develop computer algorithms that can predict the functionality of DNA segments according to their primary sequences, including algorithms that can predict promoters. Although several promoter-predicting algorithms are available, they have high false-positive detections and the rate of promoter detection needs to be improved further. RESULTS: In this research, PromFD, a computer program to recognize vertebrate RNA polymerase II promoters, has been developed. Both vertebrate promoters and non-promoter sequences are used in the analysis. The promoters are obtained from the Eukaryotic Promoter Database. Promoters are divided into a training set and a test set. Non-promoter sequences are obtained from the GenBank sequence databank, and are also divided into a training set and a test set. The first step is to search out, among all possible permutations, patterns of strings 5-10 bp long, that are significantly over-represented in the promoter set. The program also searches IMD (Information Matrix Database) matrices that have a significantly higher presence in the promoter set. The results of the searches are stored in the PromFD database, and the program PromFD scores input DNA sequences according to their content of the database entries. PromFD predicts promoters: their locations and the location of potential TATA boxes, if found. The program can detect 71% of promoters in the training set with a false-positive rate of under 1 in every 13,000 bp, and 47% of promoters in the test set with a false-positive rate of under 1 in every 9800 bp. PromFD uses a new approach and its false-positive identification rate is better compared with other available promoter recognition algorithms. The source code for PromFD is in the 'C++' language.
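
The first step described above, finding short strings that are over-represented in promoters relative to non-promoters, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which significance test PromFD uses, so a simple presence ratio with a pseudocount stands in for it; the sketch handles one string length `k` at a time and only considers strings that actually occur in the promoter set; and the names `kmers`, `overrepresented`, `min_ratio` and `pseudocount` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overrepresented(promoters, non_promoters, k, min_ratio=2.0, pseudocount=1.0):
    """Map each k-mer to its promoter/non-promoter presence ratio,
    keeping only k-mers whose ratio is at least min_ratio.

    Presence is counted once per sequence. The ratio threshold is an
    assumed stand-in for a proper significance test.
    """
    def presence(seqs):
        counts = {}
        for seq in seqs:
            for word in kmers(seq, k):
                counts[word] = counts.get(word, 0) + 1
        return counts

    pos = presence(promoters)
    neg = presence(non_promoters)
    hits = {}
    for word, n_pos in pos.items():
        f_pos = (n_pos + pseudocount) / (len(promoters) + pseudocount)
        f_neg = (neg.get(word, 0) + pseudocount) / (len(non_promoters) + pseudocount)
        ratio = f_pos / f_neg
        if ratio >= min_ratio:
            hits[word] = ratio
    return hits


if __name__ == "__main__":
    toy_promoters = ["TATAAA", "GTATAA"]      # made-up sequences
    toy_background = ["GGGCCC", "CCCGGG"]
    print(overrepresented(toy_promoters, toy_background, k=4))
```

Covering the 5-10 bp range mentioned in the abstract would mean calling the function once per length and merging the results.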

Subjects

Genetic Promoter Regions, RNA Polymerase II/genetics, Software, Algorithms, Animals, Base Sequence, Factual Databases, Evaluation Studies as Topic, Human Genome Project, Humans, DNA Sequence Analysis/methods, DNA Sequence Analysis/statistics & numerical data, Software Design, Vertebrates

4.

Escherichia coli promoter sequences: analysis and prediction.

Hertz, G Z; Stormo, G D.

Methods Enzymol ; 273: 30-42, 1996.

Article in English | MEDLINE | ID: mdl-8791597

Subjects

Bacterial DNA/genetics, Escherichia coli/genetics, Bacterial Genes, Genetic Promoter Regions, Base Sequence, Consensus Sequence, Sequence Alignment, Nucleic Acid Sequence Homology

5.

MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices.

Chen, Q K; Hertz, G Z; Stormo, G D.

Comput Appl Biosci ; 11(5): 563-6, 1995 Oct.

Article in English | MEDLINE | ID: mdl-8590181

ABSTRACT

The information matrix database (IMD), a database of weight matrices of transcription factor binding sites, is developed. MATRIX SEARCH, a program which can find potential transcription factor binding sites in DNA sequences using the IMD database, is also developed and accompanies the IMD database. MATRIX SEARCH adopts a user interface very similar to that of the SIGNAL SCAN program. MATRIX SEARCH allows the user to search an input sequence with the IMD automatically, to visualize the matrix representations of sites for particular factors, and to retrieve journal citations. The source code for MATRIX SEARCH is in the 'C' language, and the program is available for unix platforms.
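
Scanning a sequence with a weight matrix, as MATRIX SEARCH is described as doing, can be sketched in a few lines. This is not the MATRIX SEARCH code or the IMD file format: the function name `scan`, the list-of-dictionaries matrix layout, the toy matrix values and the threshold are all assumptions made for illustration, and the input is assumed to contain only A, C, G and T.

```python
def scan(sequence, matrix, threshold):
    """Slide a weight matrix along a sequence and report windows that
    score at or above the threshold.

    matrix is a list with one dict per site position, mapping each base
    to its weight. Returns (start, window, score) tuples with 0-based starts.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy matrix favouring the string TATA (hypothetical weights, not an IMD entry).
TOY_MATRIX = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]

if __name__ == "__main__":
    for hit in scan("GGTATAGG", TOY_MATRIX, threshold=4.0):
        print(hit)
```

Searching with a whole database of matrices would amount to repeating this scan once per matrix, each with its own threshold.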

Subjects

Base Sequence, DNA/genetics, Factual Databases, Software, Binding Sites/genetics, DNA/metabolism, Regulator Genes, Molecular Sequence Data, Sequence Alignment, Transcription Factors/metabolism

6.

Detection of deletions in the mitochondrial genome of Caenorhabditis elegans.

Melov, S; Hertz, G Z; Stormo, G D; Johnson, T E.

Nucleic Acids Res ; 22(6): 1075-8, 1994 Mar 25.

Article in English | MEDLINE | ID: mdl-8152911

ABSTRACT

We have examined an aging population of Caenorhabditis elegans via a PCR assay to determine if deletions in the mitochondrial genome occur in the nematode. We detected eight such deletions, identified the breakpoints of four of these, and discovered direct repeats of 4-8 base pairs at the site of all four deletions. Six of the eight repeats involved in the deletions are located in or immediately adjacent to tRNAs. Without a biochemical bias, the probability of direct repeats being present at all four breakpoints was 4 × 10^-6.

Subjects

Caenorhabditis elegans/genetics, Mitochondrial DNA/chemistry, Gene Deletion, Aging/genetics, Animals, DNA Primers, Nucleic Acid Conformation, Polymerase Chain Reaction, Transfer RNA/chemistry, Transfer RNA/genetics, Nucleic Acid Repetitive Sequences

7.

DNA sequences at immunoglobulin switch region recombination sites.

Dunnick, W; Hertz, G Z; Scappino, L; Gritzmacher, C.

Nucleic Acids Res ; 21(3): 365-72, 1993 Feb 11.

Article in English | MEDLINE | ID: mdl-8441648

ABSTRACT

The immunoglobulin heavy chain switch from synthesis of IgM to IgG, IgA or IgE is mediated by a DNA recombination event. Recombination occurs within switch regions, 2-10 kb segments of DNA that lie upstream of heavy chain constant region genes. A compilation of DNA sequences at more than 150 recombination sites within heavy chain switch regions is presented. Switch recombination does not appear to occur by homologous recombination. An extensive search for a recognition motif failed to find such a sequence, implying that switch recombination is not a site-specific event. A model for switch recombination that involves illegitimate priming of one switch region on another, followed by error-prone DNA synthesis, is proposed.

Subjects

DNA, Immunoglobulin Switch Region/genetics, Genetic Recombination, Animals, Base Sequence, Humans, Mice, Molecular Sequence Data

8.

Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods.

Gutell, R R; Power, A; Hertz, G Z; Putz, E J; Stormo, G D.

Nucleic Acids Res ; 20(21): 5785-95, 1992 Nov 11.

Article in English | MEDLINE | ID: mdl-1454539

ABSTRACT

Comparative sequence analysis addresses the problem of RNA folding and RNA structural diversity, and is responsible for determining the folding of many RNA molecules, including 5S, 16S, and 23S rRNAs, tRNA, RNAse P RNA, and Group I and II introns. Initially this method was utilized to fold these sequences into their secondary structures. More recently, this method has revealed numerous tertiary correlations, elucidating novel RNA structural motifs, several of which have been experimentally tested and verified, substantiating the general application of this approach. As successful as the comparative methods have been in elucidating higher-order structure, it is clear that additional structure constraints remain to be found. Deciphering such constraints requires more sensitive and rigorous protocols, in addition to RNA sequence datasets that contain additional phylogenetic diversity and an overall increase in the number of sequences. Various RNA databases, including the tRNA and rRNA sequence datasets, continue to grow in number as well as diversity. Described herein is the development of more rigorous comparative analysis protocols. Our initial development and applications on different RNA datasets have been very encouraging. Such analyses on tRNA, 16S and 23S rRNA are substantiating previously proposed associations and are now beginning to reveal additional constraints on these molecules. A subset of these involve several positions that correlate simultaneously with one another, implying units larger than a basepair can be under a phylogenetic constraint.

Subjects

Nucleic Acid Conformation, Ribosomal RNA/chemistry, Transfer RNA/chemistry, RNA Sequence Analysis/methods, Base Sequence, Factual Databases, Molecular Sequence Data, Sequence Alignment

9.

Identification of consensus patterns in unaligned DNA sequences known to be functionally related.

Hertz, G Z; Hartzell, G W; Stormo, G D.

Comput Appl Biosci ; 6(2): 81-92, 1990 Apr.

Article in English | MEDLINE | ID: mdl-2193692

ABSTRACT

We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix (i.e. the one with the lowest probability of occurring by chance) out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
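
The matrix representation described in this abstract (one row per base, one column per site position, each element derived from how often the base occurs at that position) can be built as in the sketch below. It covers only the bookkeeping for sites that are already aligned, not the search over candidate matrices or the significance calculation; the function name `count_matrix` and the example sites are hypothetical.

```python
def count_matrix(sites):
    """Build a base-by-position count matrix from aligned DNA sites.

    Returns a dict with one row per base (A, C, G, T); each row is a list
    with one count per site position.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width")
    matrix = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for position, base in enumerate(site):
            matrix[base][position] += 1
    return matrix


if __name__ == "__main__":
    example_sites = ["ACG", "ACT", "AGG"]  # made-up aligned sites
    for base, row in count_matrix(example_sites).items():
        print(base, row)
```

Dividing each count by the number of sites turns the counts into the per-position frequencies the abstract refers to.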

Subjects

Base Sequence, DNA, Automated Pattern Recognition, Serine Endopeptidases, Software, Algorithms, Bacterial Proteins/genetics, Binding Sites, Bacterial DNA/genetics, Escherichia coli/genetics, Bacterial Genes, Molecular Sequence Data

10.

The enhancer elements and GGGCGG boxes of SV40 provide similar functions in bidirectionally promoting transcription.

Hertz, G Z; Mertz, J E.

Virology ; 163(2): 579-90, 1988 Apr.

Article in English | MEDLINE | ID: mdl-2833024

ABSTRACT

The early and the late genes of simian virus 40 (SV40) are transcribed in opposite directions from a shared promoter region. The 72- and the 21-bp repeat regions of the SV40 genome contain the transcriptional enhancer and six copies of the Sp 1-binding GGGCGG box, respectively. SV40 mutants lacking various parts of these regions were examined in COS cells to determine the importance of these sequences for transcription in each direction. We made the following observations. (i) The 72-bp repeat region was required for efficient transcription of both the early and the late genes. (ii) The 21-bp repeat region was required for efficient early-gene transcription, but not for efficient late-gene transcription; however, it was able to supply some late-promoter activity when the 72-bp repeat region was missing. (iii) The ability of either of these regions to promote transcription was gradually reduced as the number of promoter elements within each was decreased. (iv) Mutations in these regions always decreased early-gene transcription more than late-gene transcription. These results indicate that both regions are made up of multiple bidirectional promoter elements, but that the 72-bp repeat region is more effective at inducing transcription than the 21-bp repeat region. Since each region can also (i) satisfy a need for promoter elements in the replication of viral DNA and (ii) induce a region of open chromatin, we conclude that the promoter elements within the enhancer and the GGGCGG boxes probably provide similar functions.

Subjects

Genetic Enhancer Elements, Viral Genes, Nucleic Acid Regulatory Sequences, Simian virus 40/genetics, Genetic Transcription, Viral DNA/genetics, Gene Expression Regulation, Genetic Promoter Regions, Messenger RNA/biosynthesis, Viral RNA/biosynthesis, Nucleic Acid Repetitive Sequences, Simian virus 40/physiology, Virus Replication

11.

The A+T-rich sequence of the simian virus 40 origin is essential for replication and is involved in bending of the viral DNA.

Hertz, G Z; Young, M R; Mertz, J E.

J Virol ; 61(7): 2322-5, 1987 Jul.

Article in English | MEDLINE | ID: mdl-3035231

ABSTRACT

The origin-promoter region of simian virus 40 contains a 17-base-pair sequence composed exclusively of adenine (A) and thymine (T). We constructed a linker replacement mutant in which this stretch of A's and T's was reduced to 11 base pairs. While not affecting the level of early gene transcription, this mutation reduced the accumulation of viral DNA in COS cells at least 10^4-fold. In addition, a restriction fragment containing the wild-type A + T-rich region migrated in nondenaturing polyacrylamide gels with an anomalous mobility characteristic of bent DNA; however, the corresponding fragment from the mutant migrated less anomalously. Therefore, bending of the DNA in this region may play a role in some step in viral DNA replication.

Subjects

Viral DNA/genetics, Simian virus 40/genetics, Base Composition, Base Sequence, DNA Replication, Viral Genes, Nucleic Acid Conformation, Genetic Promoter Regions, Simian virus 40/physiology, Virus Replication

12.

Bidirectional promoter elements of simian virus 40 are required for efficient replication of the viral DNA.

Hertz, G Z; Mertz, J E.

Mol Cell Biol ; 6(10): 3513-22, 1986 Oct.

Article in English | MEDLINE | ID: mdl-3025597

ABSTRACT

Mutants of simian virus 40 (SV40) lacking parts of the 72- and 21-base-pair repeat regions were made deficient in large T antigen by recombination with dlA 4000, a mutant containing a frameshift deletion near the amino terminus of the T antigen genes. These double mutants were transfected into COS cells, and the amounts of replicated viral DNA were measured at various times thereafter. It was found that deletion of either the 72- or 21-base-pair repeat region did not significantly reduce the accumulation of viral DNA. However, cells transfected with mutants lacking both of these promoter elements accumulated 100-fold less viral DNA than cells transfected with wild-type SV40. This indicates that the 72- and 21-base-pair repeat regions are each sufficient for supplying a function required for efficient replication of SV40 DNA. In addition, the ability of either of these regions to support efficient replication was gradually reduced as the number of promoter elements within each was decreased. Since the 72- and 21-base-pair repeat regions bidirectionally induce transcription, our results indicate that bidirectional promoter elements play a role in the replication of viral DNA. However, fewer of these elements are required for efficient replication than for efficient transcription.

Subjects

DNA Replication, Viral Genes, Genetic Promoter Regions, Simian virus 40/genetics, Animals, Polyomavirus Transforming Antigens, Viral Tumor Antigens/genetics, Cell Line, DNA Restriction Enzymes, Recombinant DNA/metabolism, Viral DNA/genetics, Genes, Mutation, Viral Oncogene Proteins/genetics, Transfection
