1.
Bioinformatics ; 17(2): 149-54, 2001 Feb.
Article in English | MEDLINE | ID: mdl-11238070

ABSTRACT

MOTIVATION: Traditional sequence distances require an alignment and therefore are not directly applicable to the problem of whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences using the information-theoretic concept of Kolmogorov complexity and a program to estimate this distance. RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals. AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
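
The idea of an alignment-free, compression-based distance can be illustrated with a minimal sketch. It is not the distance defined in the article and does not use GenCompress: it assumes the general-purpose zlib compressor as a crude stand-in for Kolmogorov complexity and computes the normalized compression distance, a related measure. The helper names `compressed_size`, `ncd`, and `random_dna` are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate the information content of data by its zlib-compressed length."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    Assumption: zlib stands in for an ideal compressor, so the value is only
    a rough estimate and is not the distance estimated in the article.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(n: int, seed: int) -> bytes:
    """Hypothetical helper: a reproducible random DNA string of length n."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"ACGT") for _ in range(n))
```

With this sketch, a sequence and a lightly mutated copy of it should score a smaller distance than two unrelated random sequences, because the compressor can reuse the first sequence when encoding the second.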


Subject(s)
DNA, Mitochondrial/analysis , Sequence Alignment , Software , Animals , DNA, Mitochondrial/classification , Humans , Mathematical Computing , Phylogeny , Rodentia/genetics , Sequence Alignment/methods , Sequence Analysis, DNA
2.
Mol Biol Evol ; 16(4): 512-24, 1999 Apr.
Article in English | MEDLINE | ID: mdl-10331277

ABSTRACT

Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
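
The dicodon (hexanucleotide) statistic mentioned above can be sketched as follows. This is an illustration of in-frame hexanucleotide counting and a simple log-odds comparison, not CRITICA's actual scoring scheme; the function names and the pseudocount are assumptions introduced here.

```python
import math
from collections import Counter


def dicodon_counts(seq: str, frame: int = 0) -> Counter:
    """Count hexanucleotides that start on codon boundaries of the given frame."""
    seq = seq.upper()
    counts = Counter()
    # Step by 3 so each hexanucleotide spans two adjacent codons (a dicodon).
    for i in range(frame, len(seq) - 5, 3):
        counts[seq[i:i + 6]] += 1
    return counts


def log_odds(hexamer: str, coding: Counter, background: Counter,
             pseudocount: float = 1.0) -> float:
    """Log ratio of a hexamer's smoothed frequency in coding vs. background counts.

    Assumption: additive smoothing over the 4**6 possible hexanucleotides;
    this is not taken from the CRITICA paper.
    """
    n_hexamers = 4 ** 6
    p_coding = (coding[hexamer] + pseudocount) / (
        sum(coding.values()) + pseudocount * n_hexamers)
    p_background = (background[hexamer] + pseudocount) / (
        sum(background.values()) + pseudocount * n_hexamers)
    return math.log(p_coding / p_background)
```

A positive score indicates that the hexanucleotide is relatively more frequent in the coding-frame counts than in the background counts; an iterative procedure of the kind the abstract describes would re-estimate such counts from its own predictions.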


Subject(s)
Sequence Analysis, DNA/methods , Software , Algorithms , Base Sequence , Codon/genetics , DNA, Bacterial/genetics , Databases, Factual , Evaluation Studies as Topic , Mannheimia haemolytica/genetics , Salmonella typhimurium/genetics , Sequence Analysis, DNA/statistics & numerical data
3.
Proc Natl Acad Sci U S A ; 96(7): 3578-83, 1999 Mar 30.
Article in English | MEDLINE | ID: mdl-10097079

ABSTRACT

The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 °C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
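
One way to test a single amino acid pair for a preferred direction of replacement is a two-sided sign (binomial) test on the counts in each direction. The abstract does not say which test the authors used, so the sketch below is an assumption, and the function name is hypothetical.

```python
from math import comb


def two_sided_sign_test(forward: int, reverse: int) -> float:
    """P-value for the null hypothesis that both replacement directions are equally likely.

    forward: count of replacements X -> Y (e.g. mesophile residue X, thermophile residue Y)
    reverse: count of replacements Y -> X
    Assumption: an exact binomial test with p = 0.5, not necessarily the
    test used in the paper.
    """
    n = forward + reverse
    if n == 0:
        return 1.0
    k = max(forward, reverse)
    # Probability of a split at least as lopsided as observed, in one tail.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, 10 replacements in one direction and none in the other give 2 × (1/1024) ≈ 0.002, which would fall below a 0.01 threshold, while a 5 to 5 split gives a p-value of 1.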


Subject(s)
Bacterial Proteins/chemistry , Methanococcus/genetics , Methanococcus/metabolism , Acclimatization , Amino Acid Sequence , Amino Acid Substitution , Bacterial Proteins/genetics , Molecular Sequence Data , Protein Conformation , Species Specificity , Temperature
4.
Nature ; 390(6658): 364-70, 1997 Nov 27.
Article in English | MEDLINE | ID: mdl-9389475

ABSTRACT

Archaeoglobus fulgidus is the first sulphur-metabolizing organism to have its genome sequence determined. Its genome of 2,178,400 base pairs contains 2,436 open reading frames (ORFs). The information processing systems and the biosynthetic pathways for essential components (nucleotides, amino acids and cofactors) have extensive correlation with their counterparts in the archaeon Methanococcus jannaschii. The genomes of these two Archaea indicate dramatic differences in the way these organisms sense their environment, perform regulatory and transport functions, and gain energy. In contrast to M. jannaschii, A. fulgidus has fewer restriction-modification systems, and none of its genes appears to contain inteins. A quarter (651 ORFs) of the A. fulgidus genome encodes functionally uncharacterized yet conserved proteins, two-thirds of which are shared with M. jannaschii (428 ORFs). Another quarter of the genome encodes new proteins indicating substantial archaeal gene diversity.
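
The proportions quoted in the abstract can be checked with a few lines of arithmetic. The figures are taken from the abstract; the variable names are illustrative only.

```python
# Figures as stated in the abstract.
GENOME_BP = 2_178_400
TOTAL_ORFS = 2_436
CONSERVED_UNCHARACTERIZED_ORFS = 651
SHARED_WITH_M_JANNASCHII = 428

# "A quarter" of the ORFs are conserved but functionally uncharacterized.
conserved_fraction = CONSERVED_UNCHARACTERIZED_ORFS / TOTAL_ORFS

# "Two-thirds" of those are shared with M. jannaschii.
shared_fraction = SHARED_WITH_M_JANNASCHII / CONSERVED_UNCHARACTERIZED_ORFS

# Genome length divided by ORF count (ORF spacing, not mean ORF length).
bp_per_orf = GENOME_BP / TOTAL_ORFS

print(f"{conserved_fraction:.1%} of ORFs are conserved but uncharacterized")
print(f"{shared_fraction:.1%} of those are shared with M. jannaschii")
print(f"about {bp_per_orf:.0f} bp of genome per ORF")
```

The results are roughly 26.7%, 65.7%, and 894 bp per ORF, consistent with the rounded "quarter" and "two-thirds" in the text.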


Subject(s)
Archaeoglobus fulgidus/genetics , Genes, Archaeal , Genome , Archaeoglobus fulgidus/metabolism , Archaeoglobus fulgidus/physiology , Base Sequence , Cell Division , DNA, Bacterial/genetics , Energy Metabolism , Gene Expression Regulation, Bacterial , Molecular Sequence Data , Protein Biosynthesis , Transcription, Genetic