Results 1 - 20 of 39
1.
Nucleic Acids Res ; 29(23): 4892-900, 2001 Dec 01.
Article in English | MEDLINE | ID: mdl-11726699

ABSTRACT

The RepA protein from bacteriophage P1 binds DNA to initiate replication. RepA covers one face of the DNA and the binding site has a completely conserved T that directly faces RepA from the minor groove at position +7. Although all four bases can be distinguished through contacts in the major groove of B-form DNA, contacts in the minor groove cannot easily distinguish between A and T bases. Therefore the 100% conservation at this position cannot be accounted for by direct contacts approaching into the minor groove of B-form DNA. RepA binding sites with modified base pairs at position +7 were used to investigate contacts with RepA. The data show that RepA contacts the N3 proton of T at position +7 and that the T=A hydrogen bonds are already broken in the DNA before RepA binds. To accommodate the N3 proton contact, the T(+7)/A(+7′) base pair must be distorted. One possibility is that T(+7) is flipped out of the helix. The energetics of the contact allows RepA to distinguish between all four bases, accounting for the observed high sequence conservation. After protein binding, base pair distortion or base flipping could initiate DNA melting as the second step in DNA replication.


Subject(s)
DNA Helicases , DNA Replication , DNA/chemistry , DNA/metabolism , Proteins/metabolism , Replication Origin , Thymine/metabolism , Trans-Activators , Base Pairing , Base Sequence , Binding Sites , Conserved Sequence , DNA-Binding Proteins/metabolism , Electrophoretic Mobility Shift Assay , Hydrogen Bonding , Models, Genetic , Nucleic Acid Conformation , Protein Binding , Protons
2.
Nucleic Acids Res ; 29(23): 4881-91, 2001 Dec 01.
Article in English | MEDLINE | ID: mdl-11726698

ABSTRACT

The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation (approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 alpha, EBNA1, ORC) and promoter binding proteins (σ70, σD factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.
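The gap between roughly 2 bits and the 1-bit ceiling can be checked with a little arithmetic. The Python sketch below is illustrative, not the method used in the paper: it scores one aligned logo column as 2 - H bits, where H is the Shannon uncertainty of the observed bases, and applies no small-sample correction. The function name `column_conservation` and the toy columns are hypothetical.

```python
import math
from collections import Counter

def column_conservation(bases):
    """Sequence conservation (bits) of one aligned DNA position: 2 - H.

    No small-sample correction is applied, so small alignments
    overestimate the true value.
    """
    counts = Counter(bases)
    n = sum(counts.values())
    uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - uncertainty

# Fully conserved T: possible only if all four bases can be told apart (2 bits).
print(column_conservation("TTTTTTTT"))  # 2.0

# A contact that only separates A/T-type from G/C-type pairs cannot prefer
# T over A, so at best A and T are equally frequent: 1 bit.
print(column_conservation("ATATATAT"))  # 1.0
```

If a minor-groove contact can only separate A·T-type pairs from G·C-type pairs, the 1-bit case is the most it can maintain, so a column near 2 bits is consistent with the abstract's inference of a distorted helix.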


Subject(s)
DNA Replication , DNA-Binding Proteins/metabolism , DNA/chemistry , Replication Origin , Transcription Initiation Site , Transcription, Genetic , Viral Proteins , Bacterial Proteins/metabolism , Base Sequence , Binding Sites , Conserved Sequence , DNA/metabolism , DNA Helicases/metabolism , DNA-Directed RNA Polymerases/metabolism , Epstein-Barr Virus Nuclear Antigens/metabolism , Nucleic Acid Conformation , Origin Recognition Complex , Promoter Regions, Genetic , Protein Binding , RNA Nucleotidyltransferases/metabolism , Sigma Factor/metabolism
3.
J Mol Biol ; 313(1): 215-28, 2001 Oct 12.
Article in English | MEDLINE | ID: mdl-11601857

ABSTRACT

During translational initiation in prokaryotes, the 3' end of the 16S rRNA binds to a region just upstream of the initiation codon. The relationship between this Shine-Dalgarno (SD) region and the binding of ribosomes to translation start-points has been well studied, but a unified mathematical connection between the SD, the initiation codon and the spacing between them has been lacking. Using information theory, we constructed a model that treats these three components uniformly by assigning to the SD and the initiation region (IR) conservations in bits of information, and by assigning to the spacing an uncertainty, also in bits. To build the model, we first aligned the SD region by maximizing the information content there. The ease of this process confirmed the existence of the SD pattern within a set of 4122 reviewed and revised Escherichia coli gene starts. This large data set allowed us to show graphically, by sequence logos, that the spacing between the SD and the initiation region affects both the SD site conservation and its pattern. We used the aligned SD, the spacing, and the initiation region to model ribosome binding and to identify gene starts that do not conform to the ribosome binding site model. A total of 569 experimentally proven starts are more conserved (have higher information content) than the full set of revised starts, which probably reflects an experimental bias against the detection of gene products that have inefficient ribosome binding sites. Models were refined cyclically by removing non-conforming weak sites. After this procedure, models derived from either the original or the revised gene start annotation were similar. Therefore, this information theory-based technique provides a method for easily constructing biologically sensible ribosome binding site models. Such models should be useful for refining gene-start predictions of any sequenced bacterial genome.
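The spacing term can be sketched the same way. Assuming each gene start contributes one integer spacing (in bases) between the aligned SD region and the initiation region, the uncertainty of the spacing is the Shannon entropy of their distribution. The function `spacing_uncertainty` and the sample spacings are hypothetical; the exact spacing convention and how the three terms are combined in the model are not given in the abstract.

```python
import math
from collections import Counter

def spacing_uncertainty(spacings):
    """Shannon uncertainty (bits) of an observed distribution of SD-to-IR spacings."""
    counts = Counter(spacings)
    n = len(spacings)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical spacings (in bases) for four aligned gene starts.
print(spacing_uncertainty([7, 7, 8, 8]))  # 1.0
```

Four equally common spacings would give 2 bits; a single fixed spacing gives 0 bits.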


Subject(s)
Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Escherichia coli/genetics , Genes, Bacterial/genetics , Peptide Chain Initiation, Translational/genetics , Ribosomes/chemistry , Ribosomes/metabolism , Base Sequence , Binding Sites , Codon, Initiator/genetics , Databases as Topic , Escherichia coli Proteins/chemistry , Information Theory , Models, Biological , Nucleic Acid Conformation , Pliability , Protein Binding , RNA Stability , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Messenger/chemistry , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Ribosomes/genetics
4.
J Bacteriol ; 183(15): 4571-9, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11443092

ABSTRACT

A computational search was carried out to identify additional targets for the Escherichia coli OxyR transcription factor. This approach predicted OxyR binding sites upstream of dsbG, encoding a periplasmic disulfide bond chaperone-isomerase; upstream of fhuF, encoding a protein required for iron uptake; and within yfdI. DNase I footprinting assays confirmed that oxidized OxyR bound to the predicted site centered 54 bp upstream of the dsbG gene and 238 bp upstream of a known OxyR binding site in the promoter region of the divergently transcribed ahpC gene. Although the new binding site was near dsbG, Northern blotting and primer extension assays showed that OxyR binding to the dsbG-proximal site led to the induction of a second ahpCF transcript, while OxyR binding to the ahpCF-proximal site led to the induction of both dsbG and ahpC transcripts. Oxidized OxyR binding to the predicted site centered 40 bp upstream of the fhuF gene was confirmed by DNase I footprinting, but these assays further revealed a second higher-affinity site in the fhuF promoter. Interestingly, the two OxyR sites in the fhuF promoter overlapped with two regions bound by the Fur repressor. Expression analysis revealed that fhuF was repressed by hydrogen peroxide in an OxyR-dependent manner. Finally, DNase I footprinting experiments showed OxyR binding to the site predicted to be within the coding sequence of yfdI. These results demonstrate the versatile modes of regulation by OxyR and illustrate the need to learn more about the ensembles of binding sites and transcripts in the E. coli genome.


Subject(s)
Bacterial Proteins/metabolism , DNA, Bacterial/metabolism , DNA-Binding Proteins , Escherichia coli Proteins , Escherichia coli/genetics , Periplasmic Proteins , Repressor Proteins/metabolism , Transcription Factors/metabolism , Bacterial Outer Membrane Proteins , Bacterial Proteins/genetics , Base Sequence , Binding Sites , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Gene Expression Regulation, Enzymologic , Iron-Binding Proteins , Molecular Sequence Data , Oxidoreductases/genetics , Periplasmic Binding Proteins , Peroxidases/genetics , Peroxiredoxins , Promoter Regions, Genetic , Repressor Proteins/genetics , Transcriptional Activation
5.
Nucleic Acids Res ; 29(7): 1443-52, 2001 Apr 01.
Article in English | MEDLINE | ID: mdl-11266544

ABSTRACT

Defects in the XPG DNA repair endonuclease gene can result in the cancer-prone disorders xeroderma pigmentosum (XP) or the XP-Cockayne syndrome complex. While the XPG cDNA sequence was known, determination of the genomic sequence was required to understand its different functions. In cells from normal donors, we found that the genomic sequence of the human XPG gene spans 30 kb, contains 15 exons that range from 61 to 1074 bp and 14 introns that range from 250 to 5763 bp. Analysis of the splice donor and acceptor sites using an information theory-based approach revealed three splice sites with low information content, which are components of the minor (U12) spliceosome. We identified six alternatively spliced XPG mRNA isoforms in cells from normal donors and from XPG patients: partial deletion of exon 8, partial retention of intron 8, two with alternative exons (in introns 1 and 6) and two that retained complete introns (introns 3 and 9). The amount of alternatively spliced XPG mRNA isoforms varied in different tissues. Most alternative splice donor and acceptor sites had a relatively high information content, but one has the U12 spliceosome sequence. A single nucleotide polymorphism has allele frequencies of 0.74 for 3507G and 0.26 for 3507C in 91 donors. The human XPG gene contains multiple splice sites with low information content in association with multiple alternatively spliced isoforms of XPG mRNA.


Subject(s)
DNA-Binding Proteins/genetics , Alternative Splicing , Base Sequence , Cell Line , DNA/chemistry , DNA/genetics , Endonucleases , Exons , Genes/genetics , Humans , Introns , Male , Molecular Sequence Data , Nuclear Proteins , Polymorphism, Single Nucleotide , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA , Tissue Distribution , Transcription Factors
7.
Acta Crystallogr D Biol Crystallogr ; 56(Pt 9): 1156-65, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10957634

ABSTRACT

A protein sequence can be classified into one of four structural classes, namely alpha, beta, alpha + beta and alpha/beta, based on its amino-acid composition. The present study aims at understanding why a particular sequence with a given amino-acid composition should fold into a specific structural class. In order to answer this question, each amino acid in the protein sequence was classified to a particular neighbor density based on the number of spatial residues surrounding it within a distance of 6.5 Å. Each of the four structural classes showed a unique preference of amino acids in each of the neighbor densities. Residues which show a high compositional bias in a structural class are also found to occur in high neighbor densities. This high compositional bias towards specific residues in the four different structural classes of proteins appears to be caused by structural and functional requirements. The distribution of amino acids in different neighbor densities is graphically presented in a novel logo form which incorporates several features such as composition, the frequency of occurrence and color code for amino acids. The spatial neighbors of the residues in different neighbor densities and their secondary structural location are also represented in the form of logos. This representation helped in the identification of specific details of the whole data which may otherwise have gone unnoticed. It is suggested that the data presented in this study may be useful in knowledge-based structure modelling and de novo protein design.
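A minimal sketch of the neighbor-count step, assuming one representative atom per residue (for example the Cα; the abstract does not say which atom is used), Euclidean distances, and an inclusive 6.5 Å cutoff. How the counts are grouped into neighbor-density classes is not stated, so the sketch stops at the raw counts; `neighbor_counts` is a hypothetical name.

```python
import math

def neighbor_counts(coords, cutoff=6.5):
    """For each residue, count the other residues within `cutoff` angstroms.

    `coords` holds one (x, y, z) tuple per residue, for a single
    representative atom (an assumption). The cutoff is treated as inclusive.
    """
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= cutoff
        )
        counts.append(n)
    return counts

# Toy coordinates: three residues in a row 3.8 angstroms apart, one far away.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(neighbor_counts(toy))  # [1, 2, 1, 0]
```

The quadratic double loop is fine for a single protein chain; a spatial index would only matter for very large structure sets.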


Subject(s)
Amino Acids/chemistry , Protein Structure, Secondary , Proteins/chemistry , Amino Acid Sequence , Binding Sites , Capsid/chemistry , Glycoproteins/chemistry , Models, Chemical , Models, Molecular
8.
Nucleic Acids Res ; 28(14): 2794-9, 2000 Jul 15.
Article in English | MEDLINE | ID: mdl-10908337

ABSTRACT

How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial 'protein' in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
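The comparison in the final sentences can be written out. The information needed to locate γ sites among G possible positions is log2(G/γ) bits, while the information in the sites themselves is the summed column conservation of their sequence logo. The sketch below, with hypothetical function names, computes both without any small-sample correction; the genome size, site count and toy sites are illustrative, not the simulation's actual parameters.

```python
import math
from collections import Counter
from itertools import product

def r_frequency(genome_size, n_sites):
    """Bits needed to single out n_sites positions among genome_size possibilities."""
    return math.log2(genome_size / n_sites)

def r_sequence(sites):
    """Summed logo conservation (bits) of aligned, equal-length DNA sites.

    Each position contributes 2 - H(l); no small-sample correction is applied.
    """
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        h = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        total += 2.0 - h
    return total

# Illustrative values only: 16 sites among 256 possible positions.
print(r_frequency(256, 16))  # 4.0

# Toy sites: two fully conserved positions (2 bits each) and two random ones (0 bits).
sites = ["AC" + x + y for x, y in product("ACGT", repeat=2)]
print(r_sequence(sites))  # 4.0
```

Here the two quantities agree by construction; in the simulation, their agreement is the outcome of selection.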


Subject(s)
Binding Sites/genetics , Evolution, Molecular , Information Theory , Base Sequence , Models, Biological , Molecular Sequence Data , Selection, Genetic , Software , Thermodynamics
9.
Mol Microbiol ; 34(3): 414-30, 1999 Nov.
Article in English | MEDLINE | ID: mdl-10564484

ABSTRACT

SoxS is the direct transcriptional activator of the member genes of the Escherichia coli superoxide regulon. At class I SoxS-dependent promoters, e.g. zwf and fpr, whose SoxS binding sites ('soxbox') lie upstream of the -35 region of the promoter, activation requires the C-terminal domain of the RNA polymerase alpha-subunit, while at class II SoxS-dependent promoters, e.g. fumC and micF, whose binding sites overlap the -35 region, activation is independent of the alpha-CTD. To determine whether SoxS activation of its class I promoters shows the same helical phase-dependent spacing requirement as class I promoters activated by catabolite gene activator protein, we increased the 7 bp distance between the 20 bp zwf soxbox and the zwf -35 promoter hexamer by 5 bp and 11 bp, and we decreased the 15 bp distance between the 20 bp fpr soxbox and the fpr -35 promoter hexamer by the same amounts. In both cases, displacement of the binding site by a half or full turn of the DNA helix prevented transcriptional activation. With constructs containing the binding site of one gene fused to the promoter of the other, we demonstrated that the positional requirements are a function of the specific binding site, not the promoter. Supposing that opposite orientation of the SoxS binding site at the two promoters might account for the positional requirements, we placed the zwf and fpr soxboxes in the reverse orientation at the various positions upstream of the promoters and determined the effect of orientation on transcription activation. We found that reversing the orientation of the zwf binding site converts its positional requirement to that of the fpr binding site in its normal orientation, and vice versa. Analysis by molecular information theory of DNA sequences known to bind SoxS in vitro is consistent with the opposite orientation of the zwf and fpr soxboxes.
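The "half or full turn" wording follows from the helical repeat. Assuming about 10.5 bp per turn for B-form DNA (a standard value that the abstract does not state), a shift of n bp rotates the soxbox by n × 360/10.5 degrees around the helix axis. `rotation_degrees` is a hypothetical helper.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA

def rotation_degrees(shift_bp, bp_per_turn=BP_PER_TURN):
    """Angle (degrees) a binding site rotates around the helix axis when moved shift_bp."""
    return shift_bp * 360.0 / bp_per_turn

for shift in (5, 11):
    angle = rotation_degrees(shift)
    print(f"{shift} bp -> {angle:.0f} deg total, {angle % 360:.0f} deg off the original face")
# 5 bp -> 171 deg total, 171 deg off the original face
# 11 bp -> 377 deg total, 17 deg off the original face
```

So the 5 bp shift moves the soxbox to roughly the opposite face of the helix, while the 11 bp shift returns it to nearly the same face, one turn farther from (or closer to) the -35 hexamer.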


Subject(s)
Bacterial Proteins/metabolism , Escherichia coli Proteins , Escherichia coli/genetics , Promoter Regions, Genetic , Superoxides/metabolism , Trans-Activators , Transcription Factors/metabolism , Transcriptional Activation , Bacterial Proteins/genetics , Base Sequence , Binding Sites , Escherichia coli/growth & development , Escherichia coli/metabolism , Molecular Sequence Data , Plasmids/genetics , Sequence Analysis, DNA , Transcription Factors/genetics , Transcription, Genetic
10.
J Theor Biol ; 201(1): 87-92, 1999 Nov 07.
Article in English | MEDLINE | ID: mdl-10534438
11.
J Bacteriol ; 181(15): 4639-43, 1999 Aug.
Article in English | MEDLINE | ID: mdl-10419964

ABSTRACT

The cytotoxic effects of reactive oxygen species are largely mediated by iron. Hydrogen peroxide reacts with iron to form the extremely reactive and damaging hydroxyl radical via the Fenton reaction. Superoxide anion accelerates this reaction because the dismutation of superoxide leads to increased levels of hydrogen peroxide and because superoxide elevates the intracellular concentration of iron by attacking iron-sulfur proteins. We found that regulators of the Escherichia coli responses to oxidative stress, OxyR and SoxRS, activate the expression of Fur, the global repressor of ferric ion uptake. A transcript encoding Fur was induced by hydrogen peroxide in a wild-type strain but not in a ΔoxyR strain, and DNase I footprinting assays showed that OxyR binds to the fur promoter. In cells treated with the superoxide-generating compound paraquat, we observed the induction of a longer transcript encompassing both fur and its immediate upstream gene fldA, which encodes a flavodoxin. This polycistronic mRNA is induced by paraquat in a wild-type strain but not in a ΔsoxRS strain, and SoxS was shown to bind to the fldA promoter. These results demonstrate that iron metabolism is coordinately regulated with the oxidative stress defenses.


Subject(s)
Bacterial Proteins/genetics , Bacterial Proteins/metabolism , DNA-Binding Proteins , Escherichia coli Proteins , Escherichia coli/physiology , Flavoproteins , Gene Expression Regulation, Bacterial , Repressor Proteins/genetics , Repressor Proteins/metabolism , Trans-Activators , Transcription Factors/metabolism , Base Sequence , DNA Footprinting , Escherichia coli/genetics , Escherichia coli/metabolism , Hydrogen Peroxide/metabolism , Iron/metabolism , Metalloproteins/metabolism , Molecular Sequence Data , Oxidative Stress , Paraquat/pharmacology , Promoter Regions, Genetic , RNA, Messenger/genetics , Superoxides/metabolism , Transcription Factors/genetics , Transcription, Genetic/drug effects
12.
Nucleic Acids Res ; 27(3): 882-7, 1999 Feb 01.
Article in English | MEDLINE | ID: mdl-9889287

ABSTRACT

In vitro experiments that characterize DNA-protein interactions by artificial selection, such as SELEX, are often performed with the assumption that the experimental conditions are equivalent to natural ones. To test whether SELEX gives natural results, we compared sequence logos composed from naturally occurring leucine-responsive regulatory protein (Lrp) binding sites with those composed from SELEX-generated binding sites. The sequence logos were significantly different, indicating that the binding conditions are disparate. A likely explanation is that the SELEX experiment selected for a dimeric or trimeric Lrp complex bound to DNA. In contrast, natural sites appear to be bound by a monomer. This discrepancy suggests that in vitro selections do not necessarily give binding site sets comparable with the natural binding sites.


Subject(s)
DNA-Binding Proteins/metabolism , DNA/metabolism , Information Systems , Molecular Biology/methods , Selection, Genetic , Base Sequence , Binding Sites , DNA/chemistry , DNA Footprinting , DNA Probes , Dimerization , Leucine , Leucine-Responsive Regulatory Protein , Ligands , Models, Theoretical , Molecular Sequence Data , Sequence Alignment , Transcription Factors
13.
J Invest Dermatol ; 111(5): 791-6, 1998 Nov.
Article in English | MEDLINE | ID: mdl-9804340

ABSTRACT

A 4-year-old boy of Korean ancestry had xeroderma pigmentosum (XP) with sun sensitivity, multiple cutaneous neoplasms, and inability to speak. Neurologic examination revealed hyperactivity and autistic features without typical XP neurologic abnormalities. Cultured skin fibroblasts (XP22BE) showed decreased post-UV survival, reduced post-UV plasmid host cell reactivation and defective DNA repair (16% of normal unscheduled DNA synthesis in intact cells and undetectable excision repair in a cell free extract). In vitro and in vivo complementation assigned XP22BE to XP group C (XPC) and a markedly reduced level of XPC mRNA was found. Two XPC cDNA bands were identified. One band had a deletion of 161 bases comprising the entire exon 9, which resulted in premature termination of the mutant XPC mRNA. The larger band also had the same deletion of exon 9 but, in addition, had an insertion of 155 bases in its place (exon 9a), resulting in an in-frame XPC mRNA. Genomic DNA analysis revealed a T→G mutation at the splice donor site of XPC exon 9, which markedly reduced its information content. The 155-base-pair XPC exon 9a insertion was located in intron 9 and was flanked by strong splice donor and acceptor sequences. Analysis of the patient's blood showed persistently low levels of glycine (68 µM; normal range, 125-318 µM). Normal glycine levels were maintained with oral glycine supplements and his hyperactivity diminished. These data provide evidence of an association of an XPC splice site mutation with autistic neurologic features and hypoglycinemia.
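The in-frame outcome is codon arithmetic: a net change in mRNA length keeps the downstream reading frame only when it is a multiple of 3. Losing exon 9 removes 161 bases (161 = 3 × 53 + 2), which shifts the frame and is consistent with premature termination; adding the 155-base exon 9a in its place leaves a net loss of 6 bases, a whole number of codons. A minimal Python check, with a hypothetical helper name:

```python
def preserves_frame(net_change_bases):
    """True if a net insertion (+) or deletion (-) keeps downstream codons in frame."""
    return net_change_bases % 3 == 0

exon9_deletion = -161   # exon 9 skipped in both cDNA bands
exon9a_insertion = 155  # exon 9a from intron 9, present in the larger band

print(preserves_frame(exon9_deletion))                     # False
print(preserves_frame(exon9_deletion + exon9a_insertion))  # True
```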


Subject(s)
Autistic Disorder/complications , DNA-Binding Proteins/genetics , Glycine/blood , Xeroderma Pigmentosum/genetics , Alternative Splicing , Blotting, Northern , Child, Preschool , Chromosomes, Human, Pair 3 , DNA/genetics , DNA Repair , Fibroblasts/radiation effects , Genetic Markers/genetics , Humans , Male , Microsatellite Repeats/genetics , Mutation , Survival Rate , Transcription, Genetic , Ultraviolet Rays , Xeroderma Pigmentosum/complications
14.
Hum Mutat ; 12(3): 153-71, 1998.
Article in English | MEDLINE | ID: mdl-9711873

ABSTRACT

Splice site nucleotide substitutions can be analyzed by comparing the individual information contents (Ri, bits) of the normal and variant splice junction sequences [Rogan and Schneider, 1995]. In the present study, we related splicing abnormalities to changes in Ri values of 111 previously reported splice site substitutions in 41 different genes. Mutant donor and acceptor sites have significantly less information than their normal counterparts. With one possible exception, primary mutant sites with < 2.4 bits were not spliced. Sites with Ri values ≥ 2.4 bits but less than the corresponding natural site usually decreased, but did not abolish splicing. Substitutions that produced small changes in Ri probably do not impair splicing and are often polymorphisms. The Ri values of activated cryptic sites were generally comparable to or greater than those of the corresponding natural splice sites. Information analysis revealed preexisting cryptic splice junctions that are used instead of the mutated natural site. Other cryptic sites were created or strengthened by sequence changes that simultaneously altered the natural site. Comparison between normal and mutant splice site Ri values distinguishes substitutions that impair splicing from those which do not, distinguishes null alleles from those that are partially functional, and detects activated cryptic splice sites.
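The decision rules in this abstract can be summarized as a small function. This is a rough sketch, not a validated classifier: the 2.4-bit floor comes from the abstract, but the cutoff for a "small" change (`small_change`) is a hypothetical placeholder because no value is given, and the function name and example Ri values are illustrative.

```python
def classify_splice_variant(ri_normal, ri_variant, floor=2.4, small_change=1.0):
    """Rough reading of a splice-site substitution from Ri values in bits.

    floor: primary mutant sites below this Ri (2.4 bits in the abstract) were
           generally not spliced.
    small_change: HYPOTHETICAL cutoff for a "small" change in Ri; the abstract
                  gives no value.
    """
    if ri_variant < floor:
        return "null"          # splicing at this site expected to be abolished
    if abs(ri_variant - ri_normal) <= small_change:
        return "tolerated"     # probably no impairment, often a polymorphism
    if ri_variant < ri_normal:
        return "reduced"       # splicing decreased but not abolished
    return "not weakened"      # variant site is at least as strong as normal

# Hypothetical Ri values (bits) for a normal site and three variants.
print(classify_splice_variant(8.0, 1.5))  # null
print(classify_splice_variant(8.0, 7.6))  # tolerated
print(classify_splice_variant(8.0, 4.0))  # reduced
```

Cryptic-site activation is not modeled here; it would require scanning nearby sequence for alternative sites, not just comparing the two values at the natural site.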


Subject(s)
Mutation , RNA Splicing , Base Sequence , Humans , RNA, Messenger
15.
J Bacteriol ; 180(15): 3940-5, 1998 Aug.
Article in English | MEDLINE | ID: mdl-9683492

ABSTRACT

The uncB gene codes for the a subunit of the Fo proton channel sector of the Escherichia coli F1 Fo ATPase. Control of expression of uncB appears to be exerted at some step after translational initiation. Sequence analysis by the perceptron matrices (G. D. Stormo, T. D. Schneider, L. Gold, and A. Ehrenfeucht, Nucleic Acids Res. 10:2997-3011, 1982) identified a potential ribosome binding site within the uncB reading frame preceding a five-codon reading frame which is shifted one base relative to the uncB reading frame. Elimination of this binding site by mutagenesis resulted in a four- to fivefold increase in expression of an uncB'-'lacZ fusion gene containing most of uncB. Primer extension inhibition (toeprint) analysis to measure ribosome binding demonstrated that ribosomes could form an initiation complex at this alternative start site. Two fusions of lacZ to the alternative reading frame demonstrated that this site is recognized by ribosomes in vivo. The results suggest that expression of uncB is reduced by translational frameshifting and/or a translational false start at this site within the uncB reading frame.


Subject(s)
Bacterial Proteins/biosynthesis , Bacterial Proton-Translocating ATPases , Escherichia coli Proteins , Escherichia coli/enzymology , Escherichia coli/genetics , Gene Expression Regulation, Bacterial , Introns , Operon , RNA, Messenger/chemistry , Ribosomes/metabolism , Bacterial Proteins/genetics , Base Sequence , Binding Sites , Gene Expression Regulation, Enzymologic , Molecular Sequence Data , Mutagenesis, Site-Directed , Neural Networks, Computer , Nucleic Acid Conformation , Plasmids , Protein Biosynthesis , RNA Processing, Post-Transcriptional , RNA, Messenger/metabolism , Recombinant Fusion Proteins/biosynthesis , beta-Galactosidase/biosynthesis
16.
Gene ; 215(1): 111-22, 1998 Jul 17.
Article in English | MEDLINE | ID: mdl-9666097

ABSTRACT

Mutations in the human ABCR gene have been associated with the autosomal recessive Stargardt disease (STGD), retinitis pigmentosa (RP19), and cone-rod dystrophy (CRD) and have also been found in a fraction of age-related macular degeneration (AMD) patients. The ABCR gene is a member of the ATP-binding cassette (ABC) transporter superfamily and encodes a rod photoreceptor-specific membrane protein. The cytogenetic location of the ABCR gene was refined to 1p22.3-1p22.2. The intron/exon structure was determined for the ABCR gene from overlapping genomic clones. ABCR spans over 100 kb and comprises 50 exons. Intron/exon splice site sequences are presented for all exons and analyzed for information content (Ri). Nine splice site sequence variants found in STGD and AMD patients are evaluated as potential mutations. The localization of splice sites reveals a high degree of conservation with other members of the ABC1 subfamily, e.g. the mouse Abc1 gene. Analysis of the 870-bp 5' upstream of the transcription start sequence reveals multiple putative photoreceptor-specific regulatory elements including a novel retina-specific transcription factor binding site. These results will be useful in further mutational screening of the ABCR gene in various retinopathies and for determining the substrate and/or function of this photoreceptor-specific ABC transporter.


Subject(s)
ATP-Binding Cassette Transporters/genetics , Genes/genetics , Alternative Splicing/genetics , Base Sequence , Binding Sites/genetics , Conserved Sequence/genetics , DNA/chemistry , DNA/genetics , Evolution, Molecular , Exons/genetics , Humans , Introns/genetics , Molecular Sequence Data , Mutation/genetics , Promoter Regions, Genetic/genetics , RNA, Messenger/chemistry , RNA, Messenger/genetics , Sequence Analysis, DNA
17.
Nucleic Acids Res ; 25(21): 4408-15, 1997 Nov 01.
Article in English | MEDLINE | ID: mdl-9336476

ABSTRACT

A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence 'walkers' can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be simultaneously placed next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and by comparing this to locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled.
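A minimal walker sketch, assuming the per-base weight is 2 + log2 f(b, l) bits computed from aligned sites, with a pseudocount standing in for the small-sample correction of the original method (an assumption, so scores are approximate). Each window reports its total Ri and the height of each base: positive heights correspond to upright letters above the line, negative ones to upside-down letters below it. The names `ri_matrix` and `walk`, and the example sites, are hypothetical; sequences are expected in uppercase ACGT.

```python
import math
from collections import Counter

def ri_matrix(sites, pseudocount=0.5):
    """Weights 2 + log2 f(b, l) in bits for each base b at each aligned position l.

    The pseudocount (an assumption, not the original small-sample correction)
    keeps bases never seen at a position from getting a weight of minus infinity.
    """
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column) + 4 * pseudocount
        matrix.append(
            {b: 2.0 + math.log2((counts[b] + pseudocount) / total) for b in "ACGT"}
        )
    return matrix

def walk(sequence, matrix):
    """Slide the matrix along the sequence, yielding (start, Ri, per-base heights).

    Positive heights mark favorable bases (drawn upright above the line),
    negative heights unfavorable ones (drawn upside down below it).
    """
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        heights = [matrix[pos][base] for pos, base in enumerate(window)]
        yield start, sum(heights), heights

sites = ["TGAC", "TGAC", "TGAT", "TGAC"]  # hypothetical aligned binding sites
matrix = ri_matrix(sites)
for start, ri, heights in walk("AATGACGG", matrix):
    print(start, round(ri, 2), [round(h, 2) for h in heights])
```

With these toy sites, only the window starting at position 2 (TGAC) scores above zero.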


Subject(s)
Base Sequence/genetics , DNA-Binding Proteins/metabolism , DNA/genetics , RNA-Binding Proteins/metabolism , Software , DNA/metabolism , Databases, Factual , Mathematics , RNA/metabolism
19.
Nucleic Acids Res ; 25(24): 4994-5002, 1997 Dec 15.
Article in English | MEDLINE | ID: mdl-9396807

ABSTRACT

Originally discovered in the bacteriophage Mu DNA inversion system gin, Fis (Factor for Inversion Stimulation) regulates many genetic systems. To determine the base frequency conservation required for Fis to locate its binding sites, we collected a set of 60 experimentally defined wild-type Fis DNA binding sequences. The sequence logo for Fis binding sites showed the significance and likely kinds of base contacts, and these are consistent with available experimental data. Scanning with an information theory based weight matrix within fis, nrd, tgt/sec and gin revealed Fis sites not previously identified, but for which there are published footprinting and biochemical data. DNA mobility shift experiments showed that a site predicted to be 11 bases from the proximal Salmonella typhimurium hin site and a site predicted to be 7 bases from the proximal P1 cin site are bound by Fis in vitro. Two predicted sites separated by 11 bp found within the nrd promoter region, and one in the tgt/sec promoter, were also confirmed by gel shift analysis. A sequence in aldB previously reported to be a Fis site, for which information theory predicts no site, did not shift. These results demonstrate that information analysis is useful for predicting Fis DNA binding.


Subject(s)
Bacterial Proteins/metabolism , Carrier Proteins/metabolism , DNA-Binding Proteins/metabolism , DNA/metabolism , Escherichia coli Proteins , RNA-Binding Proteins/metabolism , RNA/metabolism , Base Sequence , Binding Sites , Carrier Proteins/genetics , DNA/chemistry , DNA-Binding Proteins/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Factor For Inversion Stimulation Protein , Integration Host Factors , Molecular Sequence Data , Nucleic Acid Conformation , Promoter Regions, Genetic , Protein Binding , RNA/chemistry , RNA-Binding Proteins/genetics
20.
J Theor Biol ; 189(4): 427-41, 1997 Dec 21.
Article in English | MEDLINE | ID: mdl-9446751

ABSTRACT

Related genetic sequences having a common function can be described by Shannon's information measure and depicted graphically by a sequence logo. Though useful for many purposes, sequence logos only show the average sequence conservation, and inferring the conservation for individual sequences is difficult. This limitation is overcome by the individual information (Ri) technique described here. The method begins by generating a weight matrix from the frequencies of each nucleotide or amino acid at each position of the aligned sequences. This matrix is then applied to the sequences themselves to determine the sequence conservation of each individual sequence. The matrix is unique because the average of these assignments is the total sequence conservation, and there is only one way to construct such a matrix. For binding sites on polynucleotides, the weight matrix has a natural cutoff that distinguishes functional sequences from other sequences. Ri values are on an absolute scale measured in bits of information so the conservation of different biological functions can be compared with one another. The matrix can be used to rank-order the sequences, to search for new sequences, to compare sequences to other quantitative data such as binding energy or distance between binding sites, to distinguish mutations from polymorphisms, to design sequences of a given strength, and to detect errors in databases. The Ri method has been used to identify previously undescribed but experimentally verified DNA binding sites. The individual information distribution was determined for E. coli ribosome binding sites, bacterial Fis binding sites, and human donor and acceptor splice junctions, among others. The distributions demonstrate clearly that the consensus sequence is highly unusual, and hence is a poor method to describe naturally occurring binding sites.
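The statement that the average of the individual assignments equals the total sequence conservation can be checked numerically. Assuming Riw(b, l) = 2 + log2 f(b, l) with no small-sample correction, averaging Ri over the sites weights each base by its own frequency, giving Σ_l [2 + Σ_b f(b, l) log2 f(b, l)] = Σ_l [2 - H(l)], the logo total. In the sketch below the function names and sites are hypothetical, and the matrix holds only bases seen in the alignment, so it is meant for evaluating the training sites, not for scanning new sequence.

```python
import math
from collections import Counter

def riw_matrix(sites):
    """Riw(b, l) = 2 + log2 f(b, l) for bases seen at position l (no small-sample correction)."""
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        n = len(column)
        matrix.append({b: 2.0 + math.log2(c / n) for b, c in counts.items()})
    return matrix

def individual_information(seq, matrix):
    """Ri of one aligned sequence: the sum of its per-position weights."""
    return sum(matrix[l][b] for l, b in enumerate(seq))

def total_conservation(sites):
    """Sum over positions of 2 - H(l), the conservation a sequence logo displays."""
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        h = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        total += 2.0 - h
    return total

sites = ["ACGT", "ACGA", "ATGT", "ACCT"]  # hypothetical aligned sites
matrix = riw_matrix(sites)
ri_values = [individual_information(s, matrix) for s in sites]
print([round(r, 3) for r in ri_values])
# The mean Ri and the logo total should print as the same number.
print(round(sum(ri_values) / len(ri_values), 6), round(total_conservation(sites), 6))
```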


Subject(s)
Information Theory , Models, Genetic , Polynucleotides/genetics , Animals , Binding Sites , Conserved Sequence , Databases, Factual , Humans , Thermodynamics