1.
Proc Natl Acad Sci U S A ; 93(12): 5854-9, 1996 Jun 11.
Article in English | MEDLINE | ID: mdl-8650182

ABSTRACT

Genomic similarities and contrasts are investigated in a collection of 23 bacteriophages, including phages with temperate, lytic, and parasitic life histories, with varied sequence organizations and with different hosts and with different morphologies. Comparisons use relative abundances of di-, tri-, and tetranucleotides from entire genomes. We highlight several specific findings. (i) As previously shown for cellular genomes, each viral genome has a distinctive signature of short oligonucleotide abundances that pervade the entire genome and distinguish it from other genomes. (ii) The enteric temperate double-stranded (ds) phages, like enterobacteria, exhibit significantly high relative abundances of GpC = GC and significantly low values of TA, but no such extremes exist in ds lytic phages. (iii) The tetranucleotide CTAG is of statistically low relative abundance in most phages. (iv) The DAM methylase site GATC is of statistically low relative abundance in most phages, but not in P1. This difference may relate to controls on replication (e.g., actions of the host SeqA gene product) and to MutH cleavage potential of the Escherichia coli DAM mismatch repair system. (v) The enteric temperate dsDNA phages form a coherent group: they are relatively close to each other and to their bacterial hosts in average differences of dinucleotide relative abundance values. By contrast, the lytic dsDNA phages do not form a coherent group. This difference may come about because the temperate phages acquire more sequence characteristics of the host because they use the host replication and repair machinery, whereas the analyzed lytic phages are replicated by their own machinery. (vi) The nonenteric temperate phages with mycoplasmal and mycobacterial hosts are relatively close to their respective hosts and relatively distant from any of the enteric hosts and from the other phages. (vii) The single-stranded RNA phages have dinucleotide relative abundance values closest to those for random sequences, presumably attributable to the mutation rates of RNA phages being much greater than those of DNA phages.
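As a rough illustration of what a "dinucleotide relative abundance" can look like in practice, the sketch below computes the commonly used odds ratio rho_xy = f_xy / (f_x * f_y), where f_xy is the observed dinucleotide frequency and f_x, f_y are the mononucleotide frequencies. The abstract does not write out the formula, so this definition, the function name `dinucleotide_relative_abundance`, and the choice not to symmetrize over both DNA strands are all assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for one strand of `seq`.

    Hypothetical helper: the odds-ratio definition is assumed, not taken
    from the abstract, and no strand symmetrization is applied.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)                                  # single-base counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))     # overlapping pairs
    rho = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n) * (mono[y] / n)     # f_x * f_y
            observed = di[x + y] / (n - 1)               # f_xy
            # Undefined when one of the bases never occurs in the sequence.
            rho[x + y] = observed / expected if expected > 0 else float("nan")
    return rho


rho = dinucleotide_relative_abundance("AACC")
```

Values well above 1 would indicate over-representation and values well below 1 under-representation; the thresholds for statistical significance used in the paper are not reproduced here.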


Subject(s)
Bacteriophages/genetics , Genome, Viral , Base Sequence , Genetic Variation , Molecular Sequence Data , Oligodeoxyribonucleotides
2.
Proc Natl Acad Sci U S A ; 91(26): 12837-41, 1994 Dec 20.
Article in English | MEDLINE | ID: mdl-7809131

ABSTRACT

Genomic homogeneity is investigated for a broad base of DNA sequences in terms of dinucleotide relative abundance distances (abbreviated delta-distances) and of oligonucleotide compositional extremes. It is shown that delta-distances between different genomic sequences in the same species are low, only about 2 or 3 times the distance found in random DNA, and are generally smaller than the between-species delta-distances. Extremes in short oligonucleotides include underrepresentation of TpA and overrepresentation of GpC in most temperate bacteriophage sequences; underrepresentation of CTAG in most eubacterial genomes; underrepresentation of GATC in most bacteriophage; CpG suppression in vertebrates, in all animal mitochondrial genomes, and in many thermophilic bacterial sequences; and overrepresentation of GpG/CpC in all animal mitochondrial sets and chloroplast genomes. Interpretations center on DNA structures (dinucleotide stacking energies, DNA curvature and superhelicity, nucleosome organization), context-dependent mutational events, methylation effects, and processes of replication and repair.
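A minimal sketch of a delta-distance between two sequences follows. It assumes the distance is the mean absolute difference of the 16 dinucleotide relative abundance values, with each value taken as the odds ratio f_xy / (f_x * f_y); neither formula is spelled out in the abstract, and the names `rho` and `delta_distance` are hypothetical.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def rho(seq):
    """Assumed odds-ratio relative abundance for each of the 16 dinucleotides."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    out = {}
    for d in DINUCS:
        expected = (mono[d[0]] / n) * (mono[d[1]] / n)
        out[d] = (di[d] / (n - 1)) / expected if expected > 0 else None
    return out


def delta_distance(seq_f, seq_g):
    """Mean absolute difference of relative abundances (assumed definition)."""
    rf, rg = rho(seq_f), rho(seq_g)
    # Skip dinucleotides whose ratio is undefined in either sequence.
    diffs = [abs(rf[d] - rg[d]) for d in DINUCS
             if rf[d] is not None and rg[d] is not None]
    if not diffs:
        raise ValueError("no dinucleotide is defined in both sequences")
    return sum(diffs) / len(diffs)


d = delta_distance("ACGTACGTAA", "AACCGGTT")
```

Real comparisons would use sequences of many kilobases; the short strings above only show the call pattern.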


Subject(s)
DNA/chemistry , Sequence Analysis, DNA/methods , Animals , Base Composition , DNA, Bacterial/chemistry , DNA, Fungal/chemistry , Eukaryotic Cells , Nucleic Acid Conformation , Oligodeoxyribonucleotides/chemistry
3.
Nucleic Acids Res ; 21(16): 3875-84, 1993 Aug 11.
Article in English | MEDLINE | ID: mdl-8367304

ABSTRACT

The recent sequencing of two relatively long (approximately 100 kb) contigs of E.coli presents unique opportunities for investigating heterogeneity and genomic organization of the E.coli chromosome. We have evaluated a number of common and contrasting sequence features in the two new contigs with comparisons to all available E.coli sequences (> 1.6 Mb). Our analyses include assessments of: (i) counts and distributions of restriction sites, special oligonucleotides (e.g., Chi sites, Dam and Dcm methylase targets), and other marker arrays; (ii) significant distant and close direct and inverted repeat sequences; (iii) sequence similarities between the long contigs and other E.coli sequences; (iv) characterization and identification of rare and frequent oligonucleotides; (v) compositional biases in short oligonucleotides; and (vi) position-dependent fluctuations in sequence composition. The two contigs reveal a number of distinctive features, including: a cluster of five repeat/dyad elements with very regular spacings resembling a transcription attenuator in one of the contigs; REP elements, ERICs, and other long repeats; distinction of the Chi sequence as the most frequent oligonucleotide; regions of clustering, overdispersion, and regularity of certain restriction sites and short palindromes; and comparative domains of inhomogeneities in the two long contigs. These and other features are discussed in relation to the organization of the E.coli chromosome.


Subject(s)
DNA, Bacterial/genetics , Escherichia coli/genetics , Base Sequence , Chromosomes, Bacterial , Molecular Sequence Data , Oligonucleotides , Repetitive Sequences, Nucleic Acid , Rho Factor/metabolism , Sequence Homology, Nucleic Acid , Terminator Regions, Genetic
4.
J Mol Biol ; 229(4): 833-48, 1993 Feb 20.
Article in English | MEDLINE | ID: mdl-8445651

ABSTRACT

New computer and statistical methods were used to determine significant direct and inverted repeats in the Escherichia coli contig sequence collection of aggregate 1.6 × 10^6 base-pairs. Eight groups of mostly new structural repeat identities were uncovered. Apart from the high statistical significance of these repeat sequences, there are suggestive relationships of the group matches in terms of neighboring genes, of genomic distributions, of their texts, and of their potentials for secondary structure. Four of these groups are relatively numerous, 11 to 26 members, one is in coding sequences and three are in non-coding. The coding group consists of the ATP-activated transmembrane component of a typical high-affinity protein-binding transport system. One of the non-coding groups consists of a special rho-independent transcription termination signal closely following an operon. The gene neighbors of this group often appear to be involved in some way in processing RNA or DNA. A second non-coding group has, for one or both neighboring genes, a component of a system responding to stress or starvation for some nutrient.


Subject(s)
Escherichia coli/genetics , Genome, Bacterial , Repetitive Sequences, Nucleic Acid , Algorithms , Amino Acid Sequence , Base Sequence , Chromosomes, Bacterial , DNA, Bacterial , Exons , Introns , Molecular Sequence Data
5.
Nucleic Acids Res ; 21(3): 703-11, 1993 Feb 11.
Article in English | MEDLINE | ID: mdl-8441679

ABSTRACT

With the sequencing of the first complete eukaryotic chromosome, III of yeast (YCIII) of length 315 kb, several types of questions concerning chromosomal organization and the heterogeneity of eukaryotic DNA sequences can be approached. We have undertaken extensive analysis of YCIII with the goals of: (1) discerning patterns and anomalies in the occurrences of short oligonucleotides; (2) characterizing the nature and locations of significant direct and inverted repeats; (3) delimiting regions unusually rich in particular base types (e.g., G+C, purines); and (4) analyzing the distributions of markers of interest, e.g., delta (δ) elements, ARS (autonomous replicating sequences), special oligonucleotides, close repeats and close dyad pairings, and gene sequences. YCIII reveals several distinctive sequence features, including: (i) a relative abundance of significant local and global repeats highlighting five genes containing substantial close or tandem DNA repeats; (ii) an anomalous distribution of delta elements involving two clusters and a long gap; (iii) a significantly even distribution of ARS; (iv) a relative increase in the frequency of T runs and AT iterations downstream of genes and A runs upstream of genes; and (v) two regions of complex repetitive sequences and anomalous DNA composition, 29000-31000 and 291000-295000, the latter centered at the HMRa locus. Interpretations of these findings for chromosomal organization and implications for regulation of gene expression are discussed.


Subject(s)
Chromosomes, Fungal , DNA, Fungal/genetics , Genetic Variation , Saccharomyces cerevisiae/genetics , Chromosome Mapping , Genes, Fungal , Humans , Oligodeoxyribonucleotides , Poly A , Poly T , Poly dA-dT , Repetitive Sequences, Nucleic Acid
6.
Protein Eng ; 5(8): 729-38, 1992 Dec.
Article in English | MEDLINE | ID: mdl-1287653

ABSTRACT

A comparative study of the compositional properties of various protein sets from both cellular and viral organisms is presented. Invariants and contrasts of amino acid usages have been discerned for different protein function classes and for different species using robust statistical methods based on quantile distributions and stochastic ordering relationships. In addition, a quantitative criterion to assess amino acid compositional extremes relative to a reference protein set is proposed and applied. Invariants of amino acid usage relate mainly to the central range of quantile distributions, whereas contrasts occur mainly in the tails of the distributions, especially contrasts between eukaryote and prokaryote species. Influences from genomic constraint are evident, for example, in the arginine:lysine ratios and the usage frequencies of residues encoded by G + C-rich versus A + T-rich codon types. The structurally similar amino acids, glutamate versus aspartate and phenylalanine versus tyrosine, show stochastic dominance relationships for most species protein sets favoring glutamate and phenylalanine respectively. The quantile distribution of hydrophobic amino acid usages in prokaryote data dominates the corresponding quantile distribution in human data. In contrast, glutamate, cysteine, proline and serine usages in human proteins dominate the corresponding quantile distributions in Escherichia coli. E. coli dominates human in the use of basic residues, but no dominance ordering applies to acidic residues. The discussion centers on commonalities and anomalies of the amino acid compositional spectrum in relation to species, function, cellular localization, biochemical and steric attributes, complexity of the amino acid biosynthetic pathway, amino acid relative abundances and founder effects.


Subject(s)
Amino Acids/analysis , Amino Acids/classification , Proteins/chemistry , Amino Acids/genetics , Animals , Bacterial Proteins/chemistry , Bacterial Proteins/classification , Bacterial Proteins/genetics , Base Composition , Codon , Databases, Factual , Enzymes/chemistry , Enzymes/classification , Enzymes/genetics , Fungal Proteins/chemistry , Fungal Proteins/classification , Fungal Proteins/genetics , Humans , Insecta/chemistry , Nuclear Proteins/chemistry , Nuclear Proteins/classification , Nuclear Proteins/genetics , Protein Structure, Secondary , Proteins/classification , Proteins/genetics , Viral Proteins/chemistry , Viral Proteins/classification , Viral Proteins/genetics
7.
Proc Natl Acad Sci U S A ; 89(6): 2002-6, 1992 Mar 15.
Article in English | MEDLINE | ID: mdl-1549558

ABSTRACT

We describe several protein sequence statistics designed to evaluate distinctive attributes of residue content and arrangement in primary structure. Considered are global compositional biases, local clustering of different residue types (e.g., charged residues, hydrophobic residues, Ser/Thr), long runs of charged or uncharged residues, periodic patterns, counts and distribution of homooligopeptides, and unusual spacings between particular residue types. The computer program SAPS (statistical analysis of protein sequences) calculates all the statistics for any individual protein sequence input and is available for the UNIX environment through electronic mail on request to V.B. (volker/genomic@stanford.edu).


Subject(s)
Algorithms , Amino Acid Sequence , Proteins/chemistry , Proto-Oncogene Proteins c-myc/genetics , Animals , Drosophila , Humans , Molecular Sequence Data , Proteins/genetics , Sequence Homology, Nucleic Acid
8.
J Mol Evol ; 33(6): 483-94, 1991 Dec.
Article in English | MEDLINE | ID: mdl-1663999

ABSTRACT

The genomes of human viruses herpes simplex 1 (HSV1) and varicella zoster (VZV), although similar in biology, largely concordant in gene order, and identical in many amino acid segments, differ widely in their genomic G + C (abbreviated S) content, which is high in HSV1 (68%) and low in VZV (46%). This paper analyzes several striking codon usage contrasts. The S difference in coding regions is dramatically large in codon site 3, S3, about 42%. The large difference in S3 is maintained at the same level in a subset of closely similar genes and even in corresponding identical amino acid blocks. A similar difference in S levels in silent site 1 (S1) is found in leucine and arginine. The difference in S3 levels occurs in every gene and in every multicodon amino acid form. The S difference also exists in amino acid usage, with HSV1 using significantly more codon types SSN, while VZV uses more codon types WWN (where W stands for A or T). The nonoverlapping and narrow histograms of S3 gene frequencies in both viruses suggest that the difference has arisen and been maintained by a process of selective rather than nonselective effects. This is in sharp contrast to the relatively large variance seen for highly similar genes in the human versus yeast analysis. Interpretations and hypotheses to explain the HSV1 vs VZV codon usage disparity relate to virus-host interactions, to the role of viral genes in DNA metabolism, to availability of molecular resources (molecular Gause exclusion principle), and to differences in genomic structure.
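The S3 statistic described above (the fraction of codons with G or C in codon site 3) can be sketched as follows. The function name `s3_fraction` is hypothetical, and the sketch assumes an in-frame coding sequence supplied as a plain string; whether start and stop codons were included in the paper's counts is not stated, so they are simply counted here.

```python
def s3_fraction(cds):
    """Fraction of codons whose third position is G or C (S = G or C).

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_bases = [cds[i + 2] for i in range(0, usable, 3)]
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Codons ATG GCC AAA TTT have third bases G, C, A, T.
example = s3_fraction("ATGGCCAAATTT")
```

Computing this value per gene and plotting the per-gene values as a histogram would give the kind of S3 distribution the abstract compares between HSV1 and VZV.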


Subject(s)
Biological Evolution , Codon , Herpesvirus 3, Human/genetics , Simplexvirus/genetics , Viral Proteins/genetics , Amino Acids/genetics , Gene Frequency , Genes, Viral , Humans , Saccharomyces cerevisiae/genetics
9.
J Mol Biol ; 221(4): 1367-78, 1991 Oct 20.
Article in English | MEDLINE | ID: mdl-1942056

ABSTRACT

An efficient algorithm is described for finding matches, repeats and other word relations, allowing for errors, in large data sets of long molecular sequences. The algorithm entails hashing on fixed-size words in conjunction with the use of a linked list connecting all occurrences of the same word. The average memory and run time requirement both increase almost linearly with the total sequence length. Some results of the program's performance on a database of Escherichia coli DNA sequences are presented.
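The core data structure mentioned above, hashing on fixed-size words with a linked list connecting all occurrences of the same word, might be sketched like this. It is an illustration only: the names `build_word_index` and `occurrences` are hypothetical, a Python dict stands in for the hash table, and the error-tolerant matching described in the abstract is not implemented (exact words only).

```python
def build_word_index(seq, k):
    """Index every length-k word of `seq`.

    head[word] holds the most recent start position of the word, and
    prev[i] links position i to the previous occurrence of the same word
    (-1 marks the end of the chain), so memory grows linearly with len(seq).
    """
    n_words = max(len(seq) - k + 1, 0)
    head = {}
    prev = [-1] * n_words
    for i in range(n_words):
        word = seq[i:i + k]
        prev[i] = head.get(word, -1)   # link to the earlier occurrence
        head[word] = i                 # this position becomes the chain head
    return head, prev


def occurrences(word, head, prev):
    """Walk the linked list for `word` and return its positions in order."""
    positions = []
    pos = head.get(word, -1)
    while pos != -1:
        positions.append(pos)
        pos = prev[pos]
    return positions[::-1]


head, prev = build_word_index("ACGACGTACG", 3)
hits = occurrences("ACG", head, prev)
```

Any word with more than one position in its chain is a candidate exact repeat; extending such seeds while tolerating mismatches would be the next step, which this sketch leaves out.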


Subject(s)
Algorithms , Databases, Factual , Sequence Alignment/methods , Base Sequence , Consensus Sequence/genetics , Escherichia coli/genetics , Molecular Sequence Data , Mutation/genetics , Repetitive Sequences, Nucleic Acid/genetics
10.
J Mol Evol ; 32(6): 521-8, 1991 Jun.
Article in English | MEDLINE | ID: mdl-1908023

ABSTRACT

A measure of sequence similarity, dt, not requiring prior sequence alignment gave correct results for a variety of computer-generated model sequences without and with gaps for all degrees of substitution, s. Measure d was the squared Euclidean distance between vectors of counts of t-tuplets of characters in the two sequences. In models without gaps and without Needleman-Wunsch alignment, average d was very closely equal to twice average conventional mismatch counts, m. In these models one of each of the conditions on the Jukes-Cantor model was violated in turn: (1) both descendant lineages receive the same number of substitutions, (2) all sites are equally likely to be substituted, (3) all different replacement characters are equally likely to be chosen, and (4) all original characters are equally likely to be substituted. In Jukes-Cantor models with gaps Needleman-Wunsch alignment was necessarily performed, a procedure that generally produced incorrect values of m. For these models average d was found to be very closely equal to twice the average m estimated from the known value of s using the inverted Jukes-Cantor formula.
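The alignment-free measure d described above, the squared Euclidean distance between vectors of t-tuple counts, can be sketched in a few lines. The function names are hypothetical, and any normalization the authors may have applied is not reproduced. For t = 1 and a single substitution, one character count drops by one and another rises by one, so d = 2, which matches the reported relation d ≈ 2m in that simplest case.

```python
from collections import Counter


def tuple_counts(seq, t):
    """Counts of all overlapping t-tuples (words of length t) in `seq`."""
    return Counter(seq[i:i + t] for i in range(len(seq) - t + 1))


def mdd(seq_a, seq_b, t):
    """Squared Euclidean distance between the two t-tuple count vectors."""
    ca, cb = tuple_counts(seq_a, t), tuple_counts(seq_b, t)
    # Counter returns 0 for tuples missing from one sequence.
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))


# One mismatch (final T -> A) between otherwise identical sequences.
d1 = mdd("ACGTACGT", "ACGTACGA", 1)
```

Because only counts are compared, the two sequences need not be aligned or even of equal length.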


Subject(s)
Computer Simulation , Models, Genetic , Sequence Alignment , Algorithms , Base Composition , Base Sequence , Biological Evolution , Sequence Homology, Nucleic Acid
11.
Proc Natl Acad Sci U S A ; 88(4): 1536-40, 1991 Feb 15.
Article in English | MEDLINE | ID: mdl-1996354

ABSTRACT

Systemic lupus erythematosus and other chronic systemic autoimmune diseases are associated with circulating autoantibodies reactive with a limited set of mostly nuclear proteins. Using rigorous statistical methods we have identified segments of highly significant charge concentration in the majority of the characteristic nuclear and cytoplasmic autoantigens. Extremely long runs of charged residues, including some sequences of greater than 20 consecutive charged residues (purely acidic or mixed basic and acidic), occur in about a third of these proteins, whereas equivalent runs are found in less than 3% of other mammalian proteins. The other sequences have less extreme charge clusters, the type and location of which are often conserved between several otherwise nonsimilar antigens. We propose that supercharged surfaces render the targeted host proteins strongly immunogenic and that antinuclear antibody profiles might result from chronic exposure to intracellular contents, possibly in conjunction with crossreactive viral products. The limited number of potential systemic autoantigens may partly be due to the rarity of requisite charge properties.


Subject(s)
Autoantigens/genetics , Lupus Erythematosus, Systemic/immunology , Amino Acid Sequence , Animals , Autoantibodies/genetics , Autoantibodies/immunology , Humans , Lupus Erythematosus, Systemic/genetics , Molecular Sequence Data , Sequence Homology, Nucleic Acid
12.
J Virol ; 64(9): 4264-73, 1990 Sep.
Article in English | MEDLINE | ID: mdl-2166815

ABSTRACT

Epstein-Barr virus (EBV) has two different modes of existence: latent and productive. There are eight known genes expressed during latency (and hardly at all during the productive phase) and about 70 other ("productive") genes. It is shown that the EBV genes known to be expressed during latency display codon usage strikingly different from that of genes that are expressed during lytic growth. In particular, the percentage of S3 (G or C in codon site 3) is persistently lower (about 20%) in all latent genes than in nonlatent genes. Moreover, S3 is lower in each multicodon amino acid form. Also, the percentage of S in silent codon sites 1 of leucine and arginine is lower in latent than in nonlatent genes. The largest absolute differences in amino acid usage between latent and nonlatent genes emphasize codon types SSN and WWN (W means nucleotide A or T and N is any nucleotide). Two principal explanations to account for the EBV latent versus productive gene codon disparity are proposed. Latent genes have codon usage substantially different from that of host cell genes to minimize the deleterious consequences to the host of viral gene expression during latency. (Productive genes are not so constrained.) It is also proposed that the latency genes of EBV were acquired recently by the viral genome. Evidence and arguments for these proposals are presented.


Subject(s)
Codon/genetics , Genes, Viral , Herpesvirus 4, Human/genetics , RNA, Messenger/genetics , Amino Acid Sequence , Animals , Antigens, Viral/genetics , B-Lymphocytes , DNA Transposable Elements , Epstein-Barr Virus Nuclear Antigens , Genes , Humans , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Sequence Homology, Nucleic Acid
14.
J Mol Evol ; 29(6): 526-37, 1989 Dec.
Article in English | MEDLINE | ID: mdl-2515299

ABSTRACT

Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fit the observed pairwise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment did about as well as did the traditional mismatch counts requiring prior sequence alignment. Application of Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two different measures of pairwise dissimilarity not requiring alignment have been used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or ...) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to sequences more different than was CLW (noncoding), but the latter often gave better results where both measures were available (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could be used to provide a reasonable consensus of different trees for the same set of species (or related genes).


Subject(s)
Biological Evolution , Genetic Variation , Globins/genetics , Animals , Base Sequence , DNA/genetics , Humans , Information Systems
15.
J Mol Evol ; 29(6): 538-47, 1989 Dec.
Article in English | MEDLINE | ID: mdl-2515300

ABSTRACT

Three measures of sequence dissimilarity have been compared on a computer-generated model system in which substitutions in random sequences were made at randomly selected sites and the replacement character was chosen at random from the set of characters different from the original occupant of the site. The three measures were the conventional mismatch count between aligned sequences (AMC = m) and two measures not requiring prior sequence alignment. The latter two measures were the squared Euclidean distance between vectors of counts of t-tuples (t = 1-6) of characters in the two sequences (multiplet distribution distances or MDD = d) and counts of characters not covered by word structures of statistically significant length common to the two sequences (common long words or CLW = SIB, SIS, or SAB). Average MDD distances were found to be two times average mismatch counts in the simulated sequences for all values of t from 1 to 6 and all degrees of substitution from one per sequence to so many as to produce, effectively, random sequences. This simple relation held independently of sequence length and of sequence composition. The relation was confirmed by exact results on small model systems and by formal asymptotic results in the limit of so few substitutions that no double hits occur and in the limit of two random sequences. The coefficient of variation for MDD distances was greater than that for mismatch counts for singlets but both measures approached the same low value for sextets. Needleman-Wunsch alignment produced incorrect mismatch counts at higher degrees of substitution. The model satisfied the conditions for the derivation of the Jukes-Cantor asymptotic adjustment, but its application produced increasingly bad results with increasing degrees of substitution in accord with earlier results on model and natural sequences. 
This fact was a consequence of the increase with increasing degrees of substitution of the sensitivity of the adjustment to error in the observations. Average CLW distances for a variety of common word structures were more or less parallel to MDD distances for appropriately long t-tuples. These results on model systems supported the validity of the two dissimilarity measures not requiring sequence alignment that was found in earlier work on natural sequences (Blaisdell 1989).
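The growing sensitivity of the Jukes-Cantor adjustment to observational error can be made concrete with the standard textbook form of the correction for four equally likely nucleotides, d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of mismatched sites. The abstract does not write this formula out, so treating it as the adjustment in question is an assumption, and the function names below are hypothetical. Its derivative with respect to p is 1 / (1 - 4p/3), which grows without bound as p approaches 3/4.

```python
import math


def jukes_cantor(p):
    """Standard Jukes-Cantor corrected distance for mismatch fraction p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def jukes_cantor_sensitivity(p):
    """Derivative of the corrected distance with respect to p.

    A small error in the observed mismatch fraction is amplified by this
    factor in the corrected estimate.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return 1 / (1 - 4 * p / 3)


low = jukes_cantor_sensitivity(0.1)    # error barely amplified
high = jukes_cantor_sensitivity(0.7)   # error strongly amplified
```

Near saturation, small sampling noise in p therefore translates into large swings in the corrected estimate, in line with the behavior the abstract reports.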


Subject(s)
Computer Simulation , Models, Molecular , Base Sequence , Biological Evolution , DNA/genetics , Genetic Variation
16.
J Mol Biol ; 205(1): 165-77, 1989 Jan 05.
Article in English | MEDLINE | ID: mdl-2538622

ABSTRACT

Charge interactions are of great importance for protein function and structure, and for a variety of cellular and biochemical processes. We present a systematic approach to the detection of distinctive clusters, runs and periodic patterns of charged residues in a protein sequence. Criteria and formulae are set forth to assess statistical significance of these charge configurations. For the 80-odd proteins potentially encoded by the Epstein-Barr virus, only the major nuclear antigens of the latent state and the transactivator of the lytic cycle contain separated charge clusters of opposite sign as well as periodic charge patterns. From our studies of the polypeptides of the human herpesviruses and of a broad collection of human and other viral protein sequences, distinctive charge configurations appear to be associated with viral capsid and core proteins (positive clusters or runs, mostly at the carboxyl terminus), with many viral glycoproteins and membrane-associated proteins (negative charge clusters), and with transactivators and transforming proteins (multiple charge structures). The statistics developed in this paper apply more generally to other than charge properties of a protein and should aid in the evaluation of a large variety of sequence features.
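As a simple illustration of scanning a protein for runs of charged residues, the sketch below encodes each residue as "+", "-", or "0" and reports the longest uninterrupted run of charged residues of either sign. The residue classes (K and R positive, D and E negative, histidine treated as uncharged) and the function names are assumptions, and the significance criteria and formulae developed in the paper are not reproduced.

```python
POSITIVE = set("KR")   # assumed positively charged residues
NEGATIVE = set("DE")   # assumed negatively charged residues


def charge_string(protein):
    """Encode a protein sequence as a string over '+', '-', '0'."""
    return "".join(
        "+" if aa in POSITIVE else "-" if aa in NEGATIVE else "0"
        for aa in protein.upper()
    )


def longest_charged_run(protein):
    """Return (start, length) of the longest run of charged residues.

    Returns (-1, 0) when the sequence has no charged residue.
    """
    best_start, best_len, run_start = -1, 0, None
    # A trailing '0' sentinel closes any run that reaches the end.
    for i, c in enumerate(charge_string(protein) + "0"):
        if c != "0":
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


start, length = longest_charged_run("MAKRDEEKGLS")
```

Judging whether a run of a given length is unusual would require comparing it with what is expected from the protein's overall charge composition, which is the part the paper's statistics address.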


Subject(s)
Herpesviridae , Peptides , Viral Proteins , Cytomegalovirus , DNA, Viral , Herpesvirus 3, Human , Herpesvirus 4, Human , Humans , Simplexvirus
17.
Proc Natl Acad Sci U S A ; 85(18): 6637-41, 1988 Sep.
Article in English | MEDLINE | ID: mdl-2842782

ABSTRACT

The protein products of several open reading frames (ORFs) of the Epstein-Barr virus (EBV) are remarkable in their distribution of charged residues. The nuclear antigen proteins EBNA1-EBNA4 of the EBV latent state contain separate significant clusters of charge of each sign. They (excepting EBNA4) also feature distinctive periodic charge patterns [e.g., (+, O)8, (O, -, -)7] and significant tandem repeats. None of the other ORFs (about 80) of the genome possess the conjunction of these properties. Only the protein encoded from BMLF1, the first immediate early transactivator protein, contains significant multiple charge clusters and periodic charge patterns. All proteins that contain significant repeats also contain at least one significant charge cluster of a single sign. These include EBNA5 and LYDMA produced during latency and BZLF1, whose expression terminates latency and initiates productive growth. It is reasonable to conclude that these aggregate significant charge configurations and repeats are important functionally for the latent existence and for the initiation of the lytic cycle and may be characteristic of these conditions. We discuss how large multimeric protein structures bound together by clusters of unlike charge may provide a mechanism for regulation of the expression of these proteins.


Subject(s)
Antigens, Viral/analysis , Herpesvirus 4, Human/analysis , Viral Proteins/analysis , Amino Acid Sequence , Base Sequence , DNA, Viral/analysis , Epstein-Barr Virus Nuclear Antigens , Molecular Sequence Data , Protein Conformation , Repetitive Sequences, Nucleic Acid
18.
J Mol Evol ; 25(3): 215-29, 1987.
Article in English | MEDLINE | ID: mdl-2822936

ABSTRACT

This paper presents an analysis of the repeat units of the ori-P region of the Epstein-Barr virus (EBV) genome. These repeat units are well-conserved palindromes. The pattern of these repeats, their lengths, phases, and the distribution of the relatively few substitutions are explained by a scenario that gives a reasonable course for the evolutionary development of the pattern. The scenario suggests a model for the production of an initiating 3/2 palindrome from a moderately lengthy sequence. The palindromic units are then multiplied in judicious combinations by mechanisms of unequal crossing-over events associated with some point substitutions and a few instances of slippage replication. The potential secondary structures of the two separated tandem palindromic repeat regions in ori-P are contrasted. Possible modes of binding of Epstein-Barr nuclear antigen (EBNA) 1 protein to these hairpins are discussed. A number of possibilities for the origin and development of the ori-P region in relation to viral and cellular function are considered.


Subject(s)
Biological Evolution , Genes, Viral , Herpesvirus 4, Human/genetics , Models, Genetic , Base Sequence , Nucleic Acid Conformation , Repetitive Sequences, Nucleic Acid
19.
Proc Natl Acad Sci U S A ; 83(14): 5155-9, 1986 Jul.
Article in English | MEDLINE | ID: mdl-3460087

ABSTRACT

Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms. These measures of the similarities of the distributions of adjacent pairs or triplets are in agreement with accepted evolutionary-tree topologies. Hierarchical clustering of the distributions of doublets of 30 miscellaneous coding sequences gives clusters in reasonable agreement with accepted biological classifications. In addition to similarity by homology, there is also observed similarity of disparate genes in the same organism--for example, all three disparate yeast genes (two enzymes and actin) form a well-distinguished cluster.
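The first-order Markov description of a sequence rests on the distribution of adjacent base pairs. A minimal sketch of estimating first-order transition probabilities from doublet counts is given below; the function name is hypothetical, and the homogeneity test and hierarchical clustering procedure used in the paper are not specified in the abstract and are not reproduced.

```python
from collections import Counter


def transition_probabilities(seq):
    """Estimate P(next base = y | current base = x) from adjacent pairs.

    Rows for bases that never occur as a predecessor are all zeros.
    """
    seq = seq.upper()
    pairs = Counter(zip(seq, seq[1:]))      # counts of adjacent doublets
    matrix = {}
    for x in "ACGT":
        row_total = sum(pairs[(x, y)] for y in "ACGT")
        matrix[x] = {
            y: (pairs[(x, y)] / row_total if row_total else 0.0)
            for y in "ACGT"
        }
    return matrix


probs = transition_probabilities("AACA")
```

Two sequences could then be compared through their estimated matrices or doublet distributions without any alignment, which is the general idea the abstract describes.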


Subject(s)
Base Sequence , Animals , Biological Evolution , DNA/genetics , Humans , Markov Chains , Mathematics
20.
Proc Natl Acad Sci U S A ; 82(15): 5185-9, 1985 Aug.
Article in English | MEDLINE | ID: mdl-3860853

ABSTRACT

A study of the effect of different amounts of L-ascorbic acid (vitamin C), between 0.076% and 8.3%, contained in the food has been carried out with ten groups of RIII mice (seven ascorbic acid and three control groups), with 50 mice in each group. With an increase in the amount of ascorbic acid there is a highly significant decrease in the first-order rate constant for appearance of the first spontaneous mammary tumor after the lag time to detection by palpation. There is also an increase in the lag time. The mean body weight and mean food intake were not significantly different for the seven ascorbic acid groups. Striking differences were observed between the 0.076% ascorbic acid and the control groups (which synthesize the vitamin): smaller food intake, decreased lag time, and increased rate constant of appearance of the first mammary tumor. This comparison cannot be made experimentally for guinea pigs and primates because the control groups would develop scurvy.


Subject(s)
Ascorbic Acid/therapeutic use , Mammary Neoplasms, Experimental/prevention & control , Animals , Body Weight , Diet , Energy Intake , Female , Mammary Neoplasms, Experimental/pathology , Mice , Mite Infestations/complications , Statistics as Topic