1.
Proc Natl Acad Sci U S A ; 98(14): 7835-40, 2001 Jul 03.
Article in English | MEDLINE | ID: mdl-11427726

ABSTRACT

The genome of the crenarchaeon Sulfolobus solfataricus P2 contains 2,992,245 bp on a single chromosome and encodes 2,977 proteins and many RNAs. One-third of the encoded proteins have no detectable homologs in other sequenced genomes. Moreover, 40% appear to be archaeal-specific, and only 12% and 2.3% are shared exclusively with bacteria and eukarya, respectively. The genome shows a high level of plasticity with 200 diverse insertion sequence elements, many putative nonautonomous mobile elements, and evidence of integrase-mediated insertion events. There are also long clusters of regularly spaced tandem repeats. Different transfer systems are used for the uptake of inorganic and organic solutes, and a wealth of intracellular and extracellular proteases, sugar, and sulfur metabolizing enzymes are encoded, as well as enzymes of the central metabolic pathways and motility proteins. The major metabolic electron carrier is not NADH as in bacteria and eukarya but probably ferredoxin. The essential components required for DNA replication, DNA repair and recombination, the cell cycle, transcriptional initiation and translation, but not DNA folding, show a strong eukaryal character with many archaeal-specific features. The results illustrate major differences between crenarchaea and euryarchaea, especially for their DNA replication mechanism and cell cycle processes and their translational apparatus.


Subject(s)
Genome, Archaeal , Sulfolobus/genetics , Cell Cycle Proteins/genetics , DNA Replication , Molecular Sequence Data , Sequence Analysis, DNA
2.
Extremophiles ; 4(3): 175-9, 2000 Jun.
Article in English | MEDLINE | ID: mdl-10879562

ABSTRACT

The translational starts of 144 Sulfolobus solfataricus genes have been determined by database comparison. Half the genes lie inside operons and the other half are at the start of an operon or single genes. A Shine-Dalgarno sequence is found upstream of the genes inside operons, but not for the first gene in an operon or isolated genes; this indicates that two different mechanisms are used for translation initiation in S. solfataricus. A box A transcriptional signal is found for the genes starting an operon or isolated genes, but not for the genes inside an operon. The box A signal is located about 27 nt upstream of the start codon, which implies that little or no upstream sequence is available for translation initiation for this group of genes. This finding is discussed.
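The check described above, whether a Shine-Dalgarno-like motif lies shortly upstream of a start codon, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the motif `GGTGA`, the 15-nt search window, and the function name are assumptions chosen for the example.

```python
from typing import Optional


def find_upstream_motif(genome: str, start: int,
                        motif: str = "GGTGA", window: int = 15) -> Optional[int]:
    """Return the spacing (nt) between the motif's 3' end and the start codon.

    `motif` and `window` are illustrative assumptions, not values from the
    abstract. Returns None when the motif is absent from the window.
    """
    lo = max(0, start - window)
    upstream = genome[lo:start]        # sequence immediately 5' of the start codon
    idx = upstream.rfind(motif)        # occurrence closest to the start codon
    if idx == -1:
        return None
    return len(upstream) - (idx + len(motif))


# Toy sequence: motif, 6-nt spacer, then ATG at index 15.
toy = "AAAA" + "GGTGA" + "TTTTTT" + "ATGAAA"
spacing = find_upstream_motif(toy, start=15)
```

Genes at the start of an operon would be expected to return `None` under this kind of scan, consistent with the short leader implied by a box A signal about 27 nt upstream.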


Subject(s)
Peptide Chain Initiation, Translational , Sulfolobus/genetics , Base Sequence , Codon, Initiator/genetics , DNA, Archaeal/genetics , Genes, Archaeal , Molecular Sequence Data , Operon , Promoter Regions, Genetic , RNA, Archaeal/genetics , RNA, Ribosomal, 16S/genetics
3.
Extremophiles ; 2(3): 305-12, 1998 Aug.
Article in English | MEDLINE | ID: mdl-9783178

ABSTRACT

The Sulfolobus solfataricus P2 genome collaborators are poised to sequence the entire 3-Mbp genome of this crenarchaeote archaeon. About 80% of the genome has been sequenced to date, with the rest of the sequence being assembled fast. In this publication we introduce the genomic sequencing and automated analysis strategy and present initial data derived from the sequence analysis. After an overview of the general sequence features, metabolic pathway studies are explained, using sugar metabolism as an example. The paper closes with an overview of repetitive elements in S. solfataricus.


Subject(s)
Genome , Sulfolobus/genetics , Base Sequence , Carbohydrate Metabolism , Chromosome Mapping , Cloning, Molecular , DNA, Archaeal/genetics , Genes, Archaeal , Phylogeny , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA , Software , Sulfolobus/classification , Sulfolobus/metabolism
4.
Glycoconj J ; 15(2): 115-30, 1998 Feb.
Article in English | MEDLINE | ID: mdl-9557871

ABSTRACT

The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which link the carbohydrate GalNAc to the side-chain of certain serine and threonine residues in mucin type glycoproteins, are presently unknown. The specificity seems to be modulated by sequence context, secondary structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of serine, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. Charged residues were disfavoured at position -1 and +3. A jury of artificial neural networks was trained to recognize the sequence context and surface accessibility of 299 known and verified mucin type O-glycosylation sites extracted from O-GLYCBASE. The cross-validated NetOglyc network system correctly found 83% of the glycosylated and 90% of the non-glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods. Predictions of O-glycosylation sites in the envelope glycoprotein gp120 from the primate lentiviruses HIV-1, HIV-2 and SIV are presented. The most conserved O-glycosylation signals in these evolutionarily related glycoproteins were found in their first hypervariable loop, V1. However, the strain variation for HIV-1 gp120 was significant. A computer server, available through WWW or E-mail, has been developed for prediction of mucin type O-glycosylation sites in proteins based on the amino acid sequence. The server addresses are http://www.cbs.dtu.dk/services/NetOGlyc/ and netOglyc@cbs.dtu.dk.
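A sequence-context predictor of this kind needs, for every serine and threonine, a fixed-width window of neighbouring residues. The sketch below shows one way such windows could be extracted; the flank width, the `-` padding character, and the function name are assumptions for illustration and are not taken from NetOglyc.

```python
from typing import List, Tuple


def candidate_windows(sequence: str, flank: int = 4) -> List[Tuple[int, str, str]]:
    """List (position, residue, window) for every Ser/Thr in a protein sequence.

    `flank` residues are taken on each side; positions running off either end
    are padded with '-'. Both choices are hypothetical, not the published setup.
    """
    padded = "-" * flank + sequence + "-" * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "ST":
            # In `padded`, residue i sits at index i + flank, so the window
            # centred on it spans padded[i : i + 2 * flank + 1].
            windows.append((i, residue, padded[i:i + 2 * flank + 1]))
    return windows


sites = candidate_windows("ASTP", flank=2)
```

Each window would then be encoded numerically and scored by every network in the jury, with the individual scores combined into one prediction per site.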


Subject(s)
Membrane Glycoproteins , Mucins/chemistry , Mucins/metabolism , Neural Networks, Computer , Viral Envelope Proteins , Algorithms , Amino Acid Sequence , Binding Sites , Carbohydrate Conformation , Databases, Factual , Glycosylation , HIV Envelope Protein gp120/chemistry , HIV Envelope Protein gp120/metabolism , N-Acetylglucosaminyltransferases/metabolism , Protein Conformation , Reproducibility of Results , Substrate Specificity
5.
Nucleic Acids Res ; 25(15): 3159-63, 1997 Aug 01.
Article in English | MEDLINE | ID: mdl-9224618

ABSTRACT

Little knowledge exists about branch points in plants; it has even been claimed that plant introns lack conserved branch point sequences similar to those found in vertebrate introns. A putative branch point consensus sequence for Arabidopsis thaliana resembling the well known metazoan consensus sequence has been proposed, but this is based on search of sequences similar to those in yeast and metazoa. Here we present a novel consensus sequence found by a non-circular approach. A hidden Markov model with a fixed A nucleotide was trained on sequences upstream of the acceptor site. The consensus found by the Markov model shares features with the metazoan consensus, but differs in its details from the consensus proposed earlier. Despite the fact that branch point consensus sequences in plants are weak, we show that a prediction scheme incorporating them leads to a substantial improvement in the recognition of true acceptor sites; the false positive rate being reduced by a factor of 2. We take this as an indication that the consensus found here is the genuine one and that the branch point does play a role in the proper recognition of the acceptor site in plants.


Subject(s)
Arabidopsis/genetics , Consensus Sequence , DNA, Plant , Binding Sites , Models, Genetic
6.
Nucleic Acids Res ; 24(17): 3439-52, 1996 Sep 01.
Article in English | MEDLINE | ID: mdl-8811101

ABSTRACT

Artificial neural networks have been combined with a rule based system to predict intron splice sites in the dicot plant Arabidopsis thaliana. A two step prediction scheme, where a global prediction of the coding potential regulates a cutoff level for a local prediction of splice sites, is refined by rules based on splice site confidence values, prediction scores, coding context and distances between potential splice sites. In this approach, the predictions of splice sites mutually affect each other in a non-local manner. The combined approach drastically reduces the large number of false positive splice sites normally haunting splice site prediction. An analysis of the errors made by the networks in the first step of the method revealed a previously unknown feature, a frequent T-tract prolongation containing cryptic acceptor sites in the 5' end of exons. The method presented here has been compared with three other approaches, GeneFinder, Gene-Mark and Grail. Overall the method presented here is an order of magnitude better. We show that the new method is able to find a donor site in the coding sequence for the jellyfish Green Fluorescent Protein, exactly at the position that was experimentally observed in A. thaliana transformants. Predictions for alternatively spliced genes are also presented, together with examples of genes from other dicots, monocots and algae. The method has been made available through electronic mail (NetPlantGene@cbs.dtu.dk), or the WWW at http://www.cbs.dtu.dk/NetPlantGene.html


Subject(s)
Arabidopsis/genetics , Artificial Intelligence , Models, Genetic , RNA Precursors/genetics , RNA Splicing/genetics , RNA, Plant/genetics , Algorithms , DNA, Plant/genetics , Databases, Factual , Exons , Expert Systems , Forecasting , Green Fluorescent Proteins , Introns , Luminescent Proteins/genetics , Molecular Sequence Data , Neural Networks, Computer , Reproducibility of Results
7.
Int J Neural Syst ; 6(1): 31-42, 1995 Mar.
Article in English | MEDLINE | ID: mdl-7670672

ABSTRACT

Optimal Brain Damage (OBD) and Optimal Brain Surgeon (OBS) represent two popular pruning procedures; however, pruning large networks trained on voluminous data sets using these methods easily becomes intractable. We present a number of approximations and discuss practical issues in real-world pruning, and use as an example a network trained to predict protein coding regions in DNA sequences. The efficiency of OBS on large networks is compared to OBD, and it turns out that OBD is preferable to OBS, since more weights can be removed using less computational effort.
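For orientation, the standard OBD criterion ranks each weight by a saliency equal to half the corresponding diagonal Hessian entry times the squared weight, and removes the lowest-ranked weights. The sketch below illustrates only that ranking step; the function names and toy numbers are assumptions, and it does not reproduce the approximations discussed in the paper.

```python
from typing import List, Sequence


def obd_saliencies(weights: Sequence[float], hessian_diag: Sequence[float]) -> List[float]:
    """OBD saliency per weight: 0.5 * h_kk * w_k**2 (diagonal Hessian approximation)."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights: Sequence[float], saliencies: Sequence[float],
                 n_prune: int) -> List[float]:
    """Return a copy of `weights` with the `n_prune` least salient entries set to zero."""
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned


# Toy values (hypothetical): four weights and their diagonal Hessian entries.
w = [0.5, -2.0, 0.1, 1.0]
h = [4.0, 1.0, 10.0, 0.2]
s = obd_saliencies(w, h)
w_pruned = prune_lowest(w, s, n_prune=2)
```

OBS replaces the diagonal approximation with the full inverse Hessian, which is the main source of the extra computational cost the abstract refers to.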


Subject(s)
Brain Injuries/physiopathology , Brain/surgery , Neural Networks, Computer , Neurosurgery , Sequence Analysis , Base Sequence , DNA/analysis , Expert Systems , Molecular Sequence Data
8.
J Mol Biol ; 243(5): 816-20, 1994 Nov 11.
Article in English | MEDLINE | ID: mdl-7966302

ABSTRACT

A neural network trained to classify the 61 nucleotide triplets of the genetic code into 20 amino acid categories develops in its internal representation a pattern matching the relative cost of transferring amino acids with satisfied backbone hydrogen bonds from water to an environment of dielectric constant of roughly 2.0. Such environments are typically found in lipid membranes or in the interior of proteins. In learning the mapping between the codons and the categories, the network groups the amino acids according to the scale of transfer free energies developed by Engelman, Goldman and Steitz. Several other scales based on internal preference statistics also agree reasonably well with the network grouping. The network is able to relate the structure of the genetic code to quantifications of amino acid hydrophobicity-hydrophilicity more systematically than the numerous attempts made earlier. Due to its inherent non-linearity, the code is also shown to impose decisive constraints on algorithmic analysis of the protein coding potential of DNA.
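The classification task itself, 61 sense codons mapped to 20 amino acid classes, can be set up in a few lines. The sketch below builds the standard genetic code and a simple one-hot input encoding; the encoding (four units per base, twelve per codon) is an assumption for illustration, since the abstract does not state how codons were presented to the network.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Drop the three stop codons to obtain the 61 sense codons.
SENSE_CODONS = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}


def one_hot(codon: str) -> list:
    """Encode a codon as 12 binary inputs, 4 per base in TCAG order (assumed encoding)."""
    vec = []
    for base in codon:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


n_codons = len(SENSE_CODONS)                 # number of training examples
n_classes = len(set(SENSE_CODONS.values()))  # number of output categories
```

The resulting 61 input vectors and their amino acid labels form the complete training set for such a network.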


Subject(s)
Amino Acids/chemistry , Energy Transfer/genetics , Neural Networks, Computer , Amino Acid Sequence , Base Sequence , Genetic Code , Models, Genetic , Molecular Sequence Data