Results 1 - 8 of 8
1.
Nucleic Acids Res ; 29(1): 255-9, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125105

ABSTRACT

A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) falls into many small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC-AG pairs (of which one was an error that was corrected to GC-AG), 61 errors corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors corrected to AT-AC), one case produced from a non-existent intron, seven cases found in HTG that were deposited to GenBank, and finally only two other cases of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
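The grouping of splice pairs by their terminal dinucleotides, and the percentages quoted in the abstract, can be illustrated with a minimal Python sketch. The function names and the toy intron sequences are hypothetical and are not taken from SpliceDB; the sketch only assumes that an intron is given 5' to 3', so that its first two bases are the donor dinucleotide and its last two the acceptor dinucleotide.

```python
from collections import Counter


def classify_intron(intron: str) -> str:
    """Label an intron by its donor and acceptor dinucleotides, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both dinucleotides")
    return f"{seq[:2]}-{seq[-2:]}"


def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Hypothetical toy introns, for illustration only.
toy_introns = [
    "GTAAGTCCTTTTCAG",    # canonical GT-AG
    "gtgagtctctcttacag",  # canonical GT-AG, lower case input
    "GCAAGTTTTCCCTAG",    # non-canonical GC-AG
    "ATATCCTTTTTCCAC",    # non-canonical AT-AC
]
group_counts = Counter(classify_intron(s) for s in toy_introns)

# Figures quoted in the abstract, recomputed from the stated counts.
canonical_share = share(22199, 22489)  # GT-AG entries among verified entries
htg_match_share = share(156, 171)      # non-canonical pairs with a clear HTG match
```

Recomputing the two shares from the stated counts reproduces the 98.71% and 91.23% given in the abstract.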


Subject(s)
Databases, Factual , RNA Splicing/genetics , Animals , Base Sequence , Exons , Expressed Sequence Tags , Genes/genetics , Humans , Internet , Introns
2.
Nucleic Acids Res ; 28(21): 4364-75, 2000 Nov 01.
Article in English | MEDLINE | ID: mdl-11058137

ABSTRACT

A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% falls into many small groups (with a maximum size of 0.05%). Studying these groups we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. They can be classified after corrections as: 79 GC-AG pairs (of which one was an error that was corrected to GC-AG), 61 errors that were corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors that were corrected to AT-AC), one case produced from a non-existent intron, seven cases found in HTG that were deposited to GenBank, and finally only two cases of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites. 
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
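How a weight matrix for a splice site group might be built and used for scoring can be sketched as follows. The abstract does not describe the construction, so this is only one common variant, assumed here for illustration: a log-odds matrix with a pseudocount against a uniform background. The function names, the pseudocount, the background frequency and the toy donor sites are all hypothetical.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned, equal-length sites.

    Assumption: uniform background and additive pseudocount; the paper's
    actual procedure is not given in the abstract.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def score(matrix, window):
    """Sum the per-position log-odds for a window of A/C/G/T characters."""
    if len(window) != len(matrix):
        raise ValueError("window length must match matrix length")
    return sum(col[base] for col, base in zip(matrix, window.upper()))


# Hypothetical toy donor sites (3 exonic bases followed by 6 intronic bases).
toy_donor_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]
donor_matrix = build_weight_matrix(toy_donor_sites)

canonical_score = score(donor_matrix, "CAGGTAAGT")
disrupted_score = score(donor_matrix, "CAGCCAAGT")  # GT replaced by CC
```

In a gene prediction setting such a matrix would be slid along a sequence and windows above a chosen threshold reported as candidate sites; the threshold is a further assumption left out of this sketch.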


Subject(s)
Computational Biology , Consensus Sequence/genetics , Genome , RNA Splice Sites/genetics , Animals , Base Sequence , Conserved Sequence/genetics , Databases as Topic , Exons/genetics , Expressed Sequence Tags , Humans , Introns/genetics , RNA Splicing/genetics , Reproducibility of Results , Software
3.
Genetika ; 31(12): 1614-29, 1995 Dec.
Article in Russian | MEDLINE | ID: mdl-8601507

ABSTRACT

Transposons of the gypsy group are assigned to LTR-containing retrotransposons present in the genomes of invertebrates, fungi, and plants. In this work, a theoretical analysis of the potential products of ORFs of these retrotransposons was conducted. Alignments were obtained and trees of similarity were constructed for domains of the POL region. On the basis of the obtained data, two hypothetically monophyletic subgroups of transposons were distinguished within the framework of the gypsy group, settling the genomes of taxonomically related organisms (the subgroup of "true" gypsy of insects and the subgroup of gypsy-like transposons of plants and fungi). A number of peculiarities of the topology of these trees hypothetically indicate cases of genetic conversion and recombination of domains accompanying the evolution of this group. The amino acid substitution fixation rate was evaluated on the basis of comparison of sequences of the protein products of ORFs. Estimates of the time of divergence of subgroups of gypsy-group transposons are significantly less than estimates of the times of divergence of their host species. One explanation for this discrepancy might be the hypothesis of settlement by transposons of the genomes of isolated host species.


Subject(s)
Gene Products, pol/genetics , Phylogeny , Retroelements , Amino Acid Sequence , Animals , Fungi/genetics , Molecular Sequence Data , Plants/genetics , Repetitive Sequences, Nucleic Acid , Retroviridae/genetics , Ribonuclease H/genetics , Sequence Homology, Amino Acid
5.
Biochim Biophys Acta ; 1095(2): 114-6, 1991 Oct 26.
Article in English | MEDLINE | ID: mdl-1932132

ABSTRACT

Using an original computer method to search for potential DNA binding sites for glucocorticoid-receptor complexes (GRC) (Seledtsov, I.A., Solovjev, V.V. and Merkulova, T.I. (1991) Biochim. Biophys. Acta 1089, 367-376), the presence of two such sites in the 5' flanking region of a rat cytochrome CYP2B2 gene has been predicted. This prediction has been confirmed by gel retardation experiments.


Subject(s)
Cytochrome P-450 Enzyme System/genetics , DNA/metabolism , Receptors, Glucocorticoid/genetics , Animals , Base Sequence , Binding Sites/genetics , Chromatography, Gel , Consensus Sequence , Information Systems , Molecular Sequence Data , Plasmids/genetics , Rats , Receptors, Glucocorticoid/metabolism , Regulatory Sequences, Nucleic Acid
6.
Biochim Biophys Acta ; 1089(3): 367-76, 1991 Jul 23.
Article in English | MEDLINE | ID: mdl-1859840

ABSTRACT

The structure of the DNA regions recognized by glucocorticoid-receptor complexes (GIRC) was analyzed using frequency matrices and a modified perceptron method. Some complementary conservative elements which may modulate the efficiency of GIRC binding were found at both sides of the previously established conserved nucleotide sequence (core) (Beato, M. et al. (1987) J. Steroid Biochem. 27, 9-14). A criterion based on the concurrent use of several perceptron matrices to search for the potential GIRC binding site sequences has been worked out. By applying this criterion 73 sites were identified in 28 sequences of glucocorticoid regulated genes and 7 sites were identified in 26 sequences independent from glucocorticoid regulation.
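The idea of searching for sites with several perceptron matrices used concurrently can be sketched in Python. The abstract does not specify the modification of the perceptron or how the matrices are combined, so this sketch makes two labeled assumptions: it uses the classic perceptron update rule on one-hot encoded windows, and it reports a window only when every trained model accepts it. All names and the toy training sequences are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four entries per position."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def train_perceptron(positives, negatives, epochs=100, lr=1.0):
    """Classic perceptron rule (an assumption; the paper's variant is modified)."""
    weights = [0.0] * (4 * len(positives[0]))
    bias = 0.0
    data = [(one_hot(s), 1) for s in positives] + [(one_hot(s), -1) for s in negatives]
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified: move the boundary
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return weights, bias


def predicts_site(model, window):
    """True if the model's activation for this window is positive."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, one_hot(window))) + bias > 0


def scan(models, sequence, width):
    """Start positions of windows accepted by all models (assumed criterion)."""
    return [
        i for i in range(len(sequence) - width + 1)
        if all(predicts_site(m, sequence[i:i + width]) for m in models)
    ]


# Hypothetical toy data: short site-like 6-mers versus unrelated 6-mers.
toy_sites = ["TGTTCT", "TGTCCT", "TGTTCA"]
toy_background_a = ["ACGGAC", "CCATGG", "GATTAC"]
toy_background_b = ["GGGCCC", "ATATAT", "CACACA"]

models = [
    train_perceptron(toy_sites, toy_background_a),
    train_perceptron(toy_sites, toy_background_b),
]
hits = scan(models, "ACGGACTGTTCTCCATGG", width=6)
```

Requiring agreement between several models trades sensitivity for specificity; a majority vote would be an equally plausible reading of "concurrent use".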


Subject(s)
DNA/metabolism , Receptors, Glucocorticoid/metabolism , Amino Acid Sequence , Animals , Base Sequence , Binding Sites , Consensus Sequence , Gene Expression Regulation , Genetic Techniques , Hormones/metabolism , Humans , Molecular Sequence Data
8.
Mol Biol (Mosk) ; 24(3): 716-28, 1990.
Article in Russian | MEDLINE | ID: mdl-2402237

ABSTRACT

An analysis of the structure of DNA sites responsible for binding to the glucocorticoid-receptor complex (GlRC) was carried out. The use of frequency matrices and of a variant of the perceptron method made it possible to establish that in the GlRC binding site, on both sides of the known conserved nucleotide sequence (core), there are additional conserved elements which appear able to modulate the efficiency of GlRC binding. A criterion was worked out for detecting the potential GlRC binding sites in given sequences. It is based on the simultaneous use of several perceptron matrices. The efficiency of detection of GlRC binding sites by means of the proposed criterion is an order of magnitude higher than that achieved with the GlRC binding site consensus (Beato et al. [2]).


Subject(s)
DNA/genetics , Glucocorticoids/physiology , Receptors, Glucocorticoid/genetics , Animals , Base Sequence , Binding Sites , Humans , Mathematics , Molecular Sequence Data , Receptors, Glucocorticoid/metabolism , Species Specificity