Results 1 - 20 of 43
1.
Theor Appl Genet ; 109(1): 10-22, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15085260

ABSTRACT

Poplar has become a model system for functional genomics in woody plants. Here, we report the sequencing and annotation of the first large contiguous stretch of genomic sequence (95 kb) of poplar, corresponding to a bacterial artificial chromosome clone mapped 0.6 centiMorgan from the Melampsora larici-populina resistance locus. The annotation revealed 15 putative genetic objects, of which five were classified as hypothetical genes that were similar only to expressed sequence tags from poplar. Ten putative objects showed similarity to known genes, of which one was similar to a kinase. Three other objects corresponded to the toll/interleukin-1 receptor/nucleotide-binding site/leucine-rich repeat class of plant disease resistance genes, of which two were predicted to encode an amino terminal nuclear localization signal. Four objects were homologous to the Ty1/copia family of class I transposable elements, one of which was designated Retropop and interrupted one of the disease resistance genes. Two other objects constituted a novel Spm-like class II transposable element, which we designated Magali.


Subject(s)
Basidiomycota , Chromosome Mapping , DNA Transposable Elements/genetics , Immunity, Innate/genetics , Plant Diseases/microbiology , Populus/genetics , Amino Acid Sequence , Base Sequence , Chromosomes, Artificial, Bacterial/genetics , Crosses, Genetic , Gene Components , Molecular Sequence Data , Plasmids/genetics , Sequence Alignment , Sequence Analysis, DNA
2.
Bioinformatics ; 17(12): 1113-22, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11751219

ABSTRACT

MOTIVATION: Transcriptome analysis allows detection and clustering of genes that are coexpressed under various biological circumstances. Under the assumption that coregulated genes share cis-acting regulatory elements, it is important to investigate the upstream sequences controlling the transcription of these genes. To improve the robustness of the Gibbs sampling algorithm to noisy data sets we propose an extension of this algorithm for motif finding with a higher-order background model. RESULTS: Simulated data and real biological data sets with well-described regulatory elements are used to test the influence of the different background models on the performance of the motif detection algorithm. We show that the use of a higher-order model considerably enhances the performance of our motif finding algorithm in the presence of noisy data. For Arabidopsis thaliana, a reliable background model based on a set of carefully selected intergenic sequences was constructed. AVAILABILITY: Our implementation of the Gibbs sampler called the Motif Sampler can be used through a web interface: http://www.esat.kuleuven.ac.be/~thijs/Work/MotifSampler.html. CONTACT: gert.thijs@esat.kuleuven.ac.be; yves.moreau@esat.kuleuven.ac.be
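
The idea of a higher-order background model can be illustrated with a minimal Python sketch. It is not the Motif Sampler implementation: the function names, the pseudocount, and the default order are assumptions chosen for illustration. The sketch estimates the probability of each base given the preceding `order` bases from a set of background (for example intergenic) sequences, and uses it to score a candidate sequence under the background.

```python
from collections import defaultdict
import math


def train_background(sequences, order=3, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences.

    Hypothetical helper: returns a function prob(context, base).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        ctx = counts.get(context, {})
        # The pseudocount keeps unseen transitions from having zero probability.
        total = sum(ctx.values()) + 4 * pseudocount
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def background_log_prob(seq, prob, order=3):
    """Log-probability of `seq` under the background, skipping the first `order` bases."""
    seq = seq.upper()
    return sum(
        math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq))
    )
```

In a motif sampler, a score of this kind would be compared with the score of the same segment under the motif model; a background that captures local sequence context makes low-complexity stretches less likely to be mistaken for motifs.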


Subject(s)
Algorithms , Computer Simulation , Models, Genetic , Probability , Promoter Regions, Genetic , Arabidopsis/genetics , DNA, Intergenic , Gene Expression , Transcription, Genetic
3.
Trends Plant Sci ; 5(9): 394-6, 2000 Sep.
Article in English | MEDLINE | ID: mdl-10973095

ABSTRACT

Naturally occurring antisense transcripts are well documented in mammals and prokaryotes but little is known about their existence and effects in plants. Generally, antisense RNAs are believed to control gene expression negatively by annealing to the complementary sequences of the sense transcript. The resulting double-stranded RNAs are thought either to affect RNA stability, transcription and/or translation directly, or to generate a signal for gene silencing and defense against viruses.


Subject(s)
Plants/genetics , RNA, Antisense/metabolism , Gene Expression Regulation, Plant , Molecular Sequence Data , Plants/metabolism , Transcription, Genetic
4.
J Biotechnol ; 78(3): 235-46, 2000 Mar 31.
Article in English | MEDLINE | ID: mdl-10751684

ABSTRACT

PPMdb is a proteome database dedicated to proteins from plant plasma membranes. It provides comprehensive two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) maps, partial amino acid sequences and expression data. All this information is gathered and structured in a relational database, after being analyzed and annotated. PPMdb includes active links to related biological databases (EMBL, GenBank, GenPep, and SWISS-PROT and TrEMBL) as well as to MEDLINE abstracts. Information on specific protein spots can be displayed by clicking on the 2-D maps. In addition, users can query the database by accession number, protein name, pI and MW, and cellular location. Access to PPMdb is available at the following URL: http://sphinx.rug.ac.be:8080.


Subject(s)
Databases, Factual , Plant Proteins/genetics , Plant Proteins/isolation & purification , Amino Acid Sequence , Biotechnology , Cell Membrane/chemistry , Electrophoresis, Gel, Two-Dimensional , Gene Expression , Genome, Plant , Membrane Proteins/genetics , Membrane Proteins/isolation & purification , Molecular Sequence Data , Peptide Mapping , Proteome/genetics , Proteome/isolation & purification , Sequence Alignment
5.
J Biotechnol ; 78(3): 293-9, 2000 Mar 31.
Article in English | MEDLINE | ID: mdl-10751690

ABSTRACT

Gene prediction methods for eukaryotic genomes are still not fully satisfying. One way to improve gene prediction accuracy, proven to be relevant for prokaryotes, is to consider more than one model of genes. Thus, we used our classification of Arabidopsis thaliana genes in two classes (CU(1) and CU(2)), previously delineated according to statistical features, in the GeneMark gene identification program. For each gene class, as well as for the two classes combined, a Markov model was developed (respectively, GM-CU(1), GM-CU(2) and GM-all) and then used on a test set of 168 genes to compare their respective efficiency. We concluded from this analysis that GM-CU(1) is more sensitive than GM-CU(2), which seems to be more specific to a gene type. Besides, GM-all does not give better results than GM-CU(1), and combining results from GM-CU(1) and GM-CU(2) greatly improves prediction efficiency in comparison with predictions made with GM-all only. Thus, this work confirms the necessity to consider more than one gene model for gene prediction in eukaryotic genomes, and to look for gene classes in order to build these models.


Subject(s)
Arabidopsis/genetics , Genes, Plant , Biotechnology , Codon/genetics , DNA, Plant/genetics , Databases, Factual , Exons , Models, Genetic , Software
6.
FEBS Lett ; 452(1-2): 3-6, 1999 Jun 04.
Article in English | MEDLINE | ID: mdl-10376667

ABSTRACT

The rapidity with which genomic sequences of the model plant Arabidopsis thaliana and soon of rice are becoming available has strongly boosted plant molecular biology research. Here, two main genomic fields will be discussed: the progress in different structural genome projects, such as mapping, sequencing, genome organization and comparative genomics, and the so-called functional genomics approaches to analyze the genome using such molecular tools as transcript profiling, micro-arrays, and insertional mutagenesis. In addition a section on bioinformatics is included.


Subject(s)
DNA, Plant/genetics , Genome, Plant , DNA, Plant/chemistry , Genes, Plant , Genetic Techniques
7.
Curr Opin Plant Biol ; 2(2): 90-5, 1999 Apr.
Article in English | MEDLINE | ID: mdl-10322203

ABSTRACT

Genome data have to be converted into knowledge to be useful to biologists. Many valuable computational tools have already been developed to help annotation of plant genome sequences, and these may be improved further, for example by identification of more gene regulatory elements. The lack of a standard computer-assisted annotation platform for eukaryotic genomes remains a major bottleneck.


Subject(s)
Genes, Plant/genetics , Genome, Plant , Arabidopsis/genetics , Databases, Factual , Genes, Plant/physiology , Internet , Sequence Alignment , Software
8.
FEBS Lett ; 445(2-3): 237-45, 1999 Feb 26.
Article in English | MEDLINE | ID: mdl-10094464

ABSTRACT

As part of the European Scientists Sequencing Arabidopsis program, a contiguous region (396607 bp) located on chromosome 4 around the APETALA2 gene was sequenced. Analysis of the sequence and comparison to public databases predicts 103 genes in this area, which represents a gene density of one gene per 3.85 kb. Almost half of the genes show no significant homology to known database entries. In addition, the first 45 kb of the contig, which covers 11 genes, is similar to a region on chromosome 2, as far as coding sequences are concerned. This observation indicates that ancient duplications of large pieces of DNA have occurred in Arabidopsis.
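
The gene density quoted above follows directly from the two figures in the abstract. A short Python check, using only those numbers (the function name is a hypothetical choice for this example):

```python
def kb_per_gene(region_bp, gene_count):
    """Average region length per gene, in kilobases."""
    return region_bp / gene_count / 1000


# Figures from the abstract: a 396607 bp contig with 103 predicted genes.
density = kb_per_gene(396607, 103)
print(round(density, 2))
```

The result rounds to 3.85 kb per gene, in agreement with the value reported in the abstract.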


Subject(s)
Gene Duplication , Genes, Plant , Homeodomain Proteins/genetics , Nuclear Proteins/genetics , Plant Proteins/genetics , Arabidopsis/genetics , Arabidopsis Proteins , Base Sequence , Chromosome Mapping , Contig Mapping , DNA, Plant , Genome, Plant , Introns , Mathematical Computing , Molecular Sequence Data , Multigene Family
9.
J Mol Biol ; 285(5): 1977-91, 1999 Feb 05.
Article in English | MEDLINE | ID: mdl-9925779

ABSTRACT

While genomic sequences are accumulating, finding the location of the genes remains a major issue that can be solved only for about a half of them by homology searches. Prediction methods are thus required, but unfortunately are not fully satisfying. Most prediction methods implicitly assume a unique model for genes. This is an oversimplification as demonstrated by the possibility to group coding sequences into several classes in Escherichia coli and other genomes. As no classification existed for Arabidopsis thaliana, we classified genes according to the statistical features of their coding sequences. A clustering algorithm using a codon usage model was developed and applied to coding sequences from A. thaliana, E. coli, and a mixture of both. By using it, Arabidopsis sequences were clustered into two classes. The CU1 and CU2 classes differed essentially by the choice of pyrimidine bases at the codon silent sites: CU2 genes often use C whereas CU1 genes prefer T. This classification discriminated the Arabidopsis genes according to their expressiveness, highly expressed genes being clustered in CU2 and genes expected to have a lower expression, such as the regulatory genes, in CU1. The algorithm separated the sequences of the Escherichia-Arabidopsis mixed data set into five classes according to the species, except for one class. This mixed class contained 89 % Arabidopsis genes from CU1 and 11 % E. coli genes, mostly horizontally transferred. Interestingly, most genes encoding organelle-targeted proteins, except the photosynthetic and photoassimilatory ones, were clustered in CU1. By tailoring the GeneMark CDS prediction algorithm to the observed coding sequence classes, its quality of prediction was greatly improved. Similar improvement can be expected with other prediction systems.
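
The C-versus-T preference at silent sites that separates CU1 from CU2 can be measured with a simple statistic. The sketch below is not the clustering algorithm described in the abstract; it only computes, for one coding sequence, the fraction of C among third codon positions occupied by a pyrimidine. It relies on the fact that in the standard genetic code NNC and NNT always encode the same amino acid, so this choice is silent. The function name is hypothetical.

```python
def third_position_c_fraction(cds):
    """Fraction of C among third codon positions that are C or T.

    Returns None when no codon ends in a pyrimidine.
    """
    cds = cds.upper()
    c_count = t_count = 0
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        third = cds[i + 2]
        if third == "C":
            c_count += 1
        elif third == "T":
            t_count += 1
    total = c_count + t_count
    return c_count / total if total else None
```

Under the description above, a gene with a value well above 0.5 would look CU2-like and one well below 0.5 would look CU1-like, although the published classification used a full codon usage model rather than this single number.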


Subject(s)
Arabidopsis/genetics , Codon/classification , Genes, Plant , Models, Genetic , Algorithms , Arabidopsis/classification , Cell Nucleus/genetics , Classification/methods , Exons , Gene Expression , Organelles/genetics
10.
Nucleic Acids Res ; 27(1): 295-6, 1999 Jan 01.
Article in English | MEDLINE | ID: mdl-9847207

ABSTRACT

PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Besides the transcription motifs found on a sequence, it also offers a link to the EMBL entry that contains the full gene sequence as well as a description of the conditions in which a motif becomes functional. The information on these sites is given by matrices, consensus and individual site sequences on particular genes, depending on the available information. PlantCARE is a relational database available via the web at the URL: http://sphinx.rug.ac.be:8080/PlantCARE/


Subject(s)
Databases, Factual , Plants/genetics , Regulatory Sequences, Nucleic Acid/genetics , Arabidopsis/genetics , Consensus Sequence/genetics , Databases, Factual/trends , Enhancer Elements, Genetic/genetics , Gene Expression Regulation, Plant , Genes, Plant/genetics , Genome, Plant , Information Storage and Retrieval , Internet , Promoter Regions, Genetic/genetics , Response Elements/genetics , Sequence Homology, Nucleic Acid , Software
11.
Bioinformatics ; 15(11): 887-99, 1999 Nov.
Article in English | MEDLINE | ID: mdl-10743555

ABSTRACT

MOTIVATION: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. RESULTS: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software. AVAILABILITY: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/. CONTACT: Pierre.Rouze@gengenp.rug.ac.be.
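
Exon-level evaluation of a gene predictor can be sketched as follows. This assumes the convention common in the gene prediction literature, where sensitivity is the share of annotated exons predicted exactly and specificity is the share of predicted exons that are exactly correct; the abstract does not spell out its formulas, and the new protein-level and gene-model measures it introduces are not reproduced here. The function name and the coordinate representation are assumptions.

```python
def exon_level_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) for exact exon matches.

    Exons are (start, end) tuples; an exon counts as correct only if
    both boundaries match the annotation.
    """
    annotated, predicted = set(annotated), set(predicted)
    correct = len(annotated & predicted)
    sensitivity = correct / len(annotated) if annotated else None
    specificity = correct / len(predicted) if predicted else None
    return sensitivity, specificity
```

For example, with three annotated exons and two predicted exons of which one matches exactly, the function returns a sensitivity of 1/3 and a specificity of 1/2.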


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Genome, Plant , Sequence Analysis, DNA/methods , Software Validation , Alternative Splicing/genetics , Contig Mapping/methods , Databases, Factual , Evaluation Studies as Topic , Exons/genetics , Models, Genetic , Reproducibility of Results , Sequence Analysis, Protein
12.
Gene ; 215(1): 11-7, 1998 Jul 17.
Article in English | MEDLINE | ID: mdl-9666060

ABSTRACT

As a contribution to the European Scientists Sequencing Arabidopsis (BIOTECH ESSA) project, a contig of almost 40 kb has been sequenced at the extreme top of chromosome 1, around the Arabidopsis thaliana gene coding for a member of the 1-aminocyclopropane-1-carboxylate synthesis gene family. The region contains, besides the ACS1 gene itself, 10 putative genes, all new for Arabidopsis. Among these are three genes encoding kinases, a late embryogenesis-abundant protein, a MADS box-containing protein, a dehydrogenase, and a Myb-related transcription factor. In addition, six cDNAs have been sequenced that correspond to this region.


Subject(s)
Arabidopsis/genetics , Chromosomes/genetics , DNA, Plant/genetics , Proto-Oncogene Proteins c-myb , Arabidopsis/chemistry , Arabidopsis Proteins , Chromosome Mapping , Cloning, Molecular , DNA, Plant/chemistry , DNA-Binding Proteins/genetics , Gene Expression/genetics , Genes, Plant/genetics , Genome, Plant , MADS Domain Proteins , Molecular Sequence Data , Oxidoreductases/genetics , Phosphotransferases/genetics , Plant Proteins/genetics , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Transcription Factors/genetics
13.
Plant Mol Biol ; 36(2): 205-17, 1998 Jan.
Article in English | MEDLINE | ID: mdl-9484433

ABSTRACT

Three random translational beta-glucuronidase (gus) gene fusions were previously obtained in Arabidopsis thaliana, using Agrobacterium-mediated transfer of a gus coding sequence without promoter and ATG initiation site. These were analysed by IPCR amplification of the sequence upstream of gus and nucleotide sequence analysis. In one instance, the gus sequence was fused, in inverse orientation, to the nos promoter sequence of a truncated tandem T-DNA copy and translated from a spurious ATG in this sequence. In the second transgenic line, the gus gene was fused to A. thaliana DNA, 27 bp downstream of an ATG. In this line, a large deletion occurred at the target site of the T-DNA. In the third line, gus is fused in frame to a plant DNA sequence after the eighth codon of an open reading frame encoding a protein of 619 amino acids. This protein has significant homology with animal and plant (receptor) serine/threonine protein kinases. The twelve subdomains essential for kinase activity are conserved. The presence of a potential signal peptide and a membrane-spanning domain suggests that it may be a receptor kinase. These data confirm that plant genes can be tagged as functional translational gene fusions.


Subject(s)
Arabidopsis/metabolism , DNA, Bacterial/metabolism , Glucuronidase/biosynthesis , Protein Biosynthesis , Protein Serine-Threonine Kinases/biosynthesis , Amino Acid Sequence , Artificial Gene Fusion , Base Sequence , Cloning, Molecular , Conserved Sequence , DNA, Plant/chemistry , DNA, Plant/metabolism , DNA, Single-Stranded/metabolism , Escherichia coli , Molecular Sequence Data , Plants, Genetically Modified , Polymerase Chain Reaction , Promoter Regions, Genetic , Protein Serine-Threonine Kinases/chemistry , Protein Serine-Threonine Kinases/genetics , Receptors, Cell Surface/biosynthesis , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/genetics , Recombinant Fusion Proteins/biosynthesis , Recombinant Fusion Proteins/chemistry , Rhizobium , Sequence Alignment , Sequence Homology, Amino Acid , TATA Box , Transfection
14.
Plant J ; 16(5): 633-41, 1998 Dec.
Article in English | MEDLINE | ID: mdl-10036779

ABSTRACT

A plasma membrane (PM) fraction was purified from Arabidopsis thaliana using a standard procedure and analyzed by two-dimensional (2D) gel electrophoresis. The proteins were classified according to their relative abundance in PM or cell membrane supernatant fractions. Eighty-two of the 700 spots detected on the PM 2D gels were microsequenced. More than half showed sequence similarity to proteins of known function. Of these, all the spots in the PM-specific and PM-enriched fractions, together with half of the spots with similar abundance in PM fraction and supernatant, have previously been found at the PM, supporting the validity of this approach. Extrapolation from this analysis indicates that (i) approximately 550 polypeptides found at the PM could be resolved on 2D gels; (ii) that numerous proteins with multiple locations are found at the PM; and (iii) that approximately 80% of PM-specific spots correspond to proteins with unknown function. Among the latter, half are represented by ESTs or cDNAs in databases. In this way, several unknown gene products were potentially localized to the PM. These data are discussed with respect to the efficiency of organelle proteome approaches to link systematically genomic data to genome expression. It is concluded that generalized proteomes can constitute a powerful resource, with future completion of Arabidopsis genome sequencing, for genome-wide exploration of plant function.


Subject(s)
Arabidopsis/metabolism , Cell Membrane/metabolism , Membrane Proteins/metabolism , Plant Proteins/metabolism , Amino Acid Sequence , Arabidopsis/genetics , Cell Fractionation , Electrophoresis, Gel, Two-Dimensional , Expressed Sequence Tags , Membrane Proteins/genetics , Membrane Proteins/isolation & purification , Molecular Sequence Data , Molecular Weight , Plant Proteins/genetics , Plant Proteins/isolation & purification
15.
FEBS Lett ; 416(2): 156-60, 1997 Oct 20.
Article in English | MEDLINE | ID: mdl-9369203

ABSTRACT

As part of the European Union program of European Scientists Sequencing Arabidopsis (ESSA), the DNA sequence of a 24,053-bp insert of cosmid clone CC17J13 was determined. The cosmid is located on chromosome 1 at the PFL locus (position 30 cM). Analysis of the sequence and comparison to public databases predicts seven genes in this area, thus approximately one gene every 3.3 kb. Three cDNAs corresponding to genes in this region were also sequenced. The homologies and/or possible functions of the (putative) genes are discussed. Proteins encoded by genes in this region include a polyadenylate-binding protein (PAB-3) and a GTP-binding protein (Rab7) as well as a novel protein, possibly involved in double-stranded RNA unwinding and apoptosis. Intriguingly, the gene encoding the PAB-3 protein, which is very specifically expressed, is flanked by putative matrix attachment regions.


Subject(s)
Arabidopsis/genetics , Chromosome Mapping , rab GTP-Binding Proteins , Base Sequence , DNA, Complementary , DNA, Plant/chemistry , Databases as Topic , Europe , GTP-Binding Proteins/genetics , Genome, Plant , Molecular Sequence Data , Poly(A)-Binding Proteins , RNA-Binding Proteins/genetics , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , rab7 GTP-Binding Proteins
16.
Nucleic Acids Res ; 25(15): 3159-63, 1997 Aug 01.
Article in English | MEDLINE | ID: mdl-9224618

ABSTRACT

Little knowledge exists about branch points in plants; it has even been claimed that plant introns lack conserved branch point sequences similar to those found in vertebrate introns. A putative branch point consensus sequence for Arabidopsis thaliana resembling the well known metazoan consensus sequence has been proposed, but this is based on search of sequences similar to those in yeast and metazoa. Here we present a novel consensus sequence found by a non-circular approach. A hidden Markov model with a fixed A nucleotide was trained on sequences upstream of the acceptor site. The consensus found by the Markov model shares features with the metazoan consensus, but differs in its details from the consensus proposed earlier. Despite the fact that branch point consensus sequences in plants are weak, we show that a prediction scheme incorporating them leads to a substantial improvement in the recognition of true acceptor sites; the false positive rate being reduced by a factor of 2. We take this as an indication that the consensus found here is the genuine one and that the branch point does play a role in the proper recognition of the acceptor site in plants.


Subject(s)
Arabidopsis/genetics , Consensus Sequence , DNA, Plant , Binding Sites , Models, Genetic
17.
Plant Mol Biol ; 32(6): 1003-18, 1996 Dec.
Article in English | MEDLINE | ID: mdl-9002599

ABSTRACT

The comparative analysis of a large number of plant cyclins of the A/B family has recently revealed that plants possess two distinct B-type groups and three distinct A-type groups of cyclins. Despite earlier uncertainties, this large-scale comparative analysis has allowed an unequivocal definition of plant cyclins into either A or B classes. We present here the most important results obtained in this study, and extend them to the case of plant D-type cyclins, in which three groups are identified. For each of the plant cyclin groups, consensus sequences have been established and a new, rational, plant-wide naming system is proposed in accordance with the guidelines of the Commission on Plant Gene Nomenclature. This nomenclature is based on the animal system indicating cyclin classes by an upper-case roman letter, and distinct groups within these classes by an arabic numeral suffix. The naming of plant cyclin classes is chosen to indicate homology to their closest animal class. The revised nomenclature of all described plant cyclins is presented, with their classification into groups CycA1, CycA2, CycA3, CycB1, CycB2, CycD1, CycD2 and CycD3.


Subject(s)
Cyclins , Plant Proteins , Plants/chemistry , Terminology as Topic , Amino Acid Sequence , Consensus Sequence , Cyclin D , Cyclins/chemistry , Cyclins/classification , Cyclins/genetics , Genes, Plant , Molecular Sequence Data , Phylogeny , Plant Proteins/chemistry , Plant Proteins/classification , Plant Proteins/genetics , Plants/genetics
18.
Nucleic Acids Res ; 24(17): 3439-52, 1996 Sep 01.
Article in English | MEDLINE | ID: mdl-8811101

ABSTRACT

Artificial neural networks have been combined with a rule based system to predict intron splice sites in the dicot plant Arabidopsis thaliana. A two step prediction scheme, where a global prediction of the coding potential regulates a cutoff level for a local prediction of splice sites, is refined by rules based on splice site confidence values, prediction scores, coding context and distances between potential splice sites. In this approach, the predictions of splice sites mutually affect each other in a non-local manner. The combined approach drastically reduces the large amount of false positive splice sites normally haunting splice site prediction. An analysis of the errors made by the networks in the first step of the method revealed a previously unknown feature, a frequent T-tract prolongation containing cryptic acceptor sites in the 5' end of exons. The method presented here has been compared with three other approaches, GeneFinder, GeneMark and Grail. Overall the method presented here is an order of magnitude better. We show that the new method is able to find a donor site in the coding sequence for the jellyfish Green Fluorescent Protein, exactly at the position that was experimentally observed in A.thaliana transformants. Predictions for alternatively spliced genes are also presented, together with examples of genes from other dicots, monocots and algae. The method has been made available through electronic mail (NetPlantGene@cbs.dtu.dk), or the WWW at http://www.cbs.dtu.dk/NetPlantGene.html


Subject(s)
Arabidopsis/genetics , Artificial Intelligence , Models, Genetic , RNA Precursors/genetics , RNA Splicing/genetics , RNA, Plant/genetics , Algorithms , DNA, Plant/genetics , Databases, Factual , Exons , Expert Systems , Forecasting , Green Fluorescent Proteins , Introns , Luminescent Proteins/genetics , Molecular Sequence Data , Neural Networks, Computer , Reproducibility of Results
19.
Nucleic Acids Res ; 24(2): 316-20, 1996 Jan 15.
Article in English | MEDLINE | ID: mdl-8628656

ABSTRACT

Data driven computational biology relies on the large quantities of genomic data stored in international sequence data banks. However, the possibilities are drastically impaired if the stored data is unreliable. During a project aiming to predict splice sites in the dicot Arabidopsis thaliana, we extracted a data set from the A.thaliana entries in GenBank. A number of simple 'sanity' checks, based on the nature of the data, revealed an alarmingly high error rate. More than 15% of the most important entries extracted did contain erroneous information. In addition, a number of entries had directly conflicting assignments of exons and introns, not stemming from alternative splicing. In a few cases the errors are due to mere typographical misprints, which may be corrected by comparison to the original papers, but errors caused by wrong assignments of splice sites from experimental data are the most common. It is proposed that the level of error correction should be increased and that gene structure sanity checks should be incorporated--also at the submitter level--to avoid or reduce the problem in the future. A non-redundant and error corrected subset of the data for A.thaliana is made available through anonymous FTP.
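
The abstract does not list the specific sanity checks that were applied, so the Python sketch below shows only the kind of gene-structure checks that could plausibly be run on a database entry. All of the rules are assumptions made for illustration: canonical GT...AG intron boundaries, an ATG start codon, a terminal stop codon, a coding length divisible by three, and no in-frame stop before the end. The function name and the coordinate convention (0-based, end-exclusive, forward strand) are also hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_structure(genomic, exons):
    """Return a list of problems found in an annotated gene structure.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive,
    sorted along the forward strand and covering the coding region only.
    """
    genomic = genomic.upper()
    problems = []

    # Intron checks: each gap between consecutive exons should be GT...AG.
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        if end1 >= start2:
            problems.append(f"exons overlap or touch at position {end1}")
            continue
        intron = genomic[end1:start2]
        if not intron.startswith("GT"):
            problems.append(f"intron at {end1} does not start with GT")
        if not intron.endswith("AG"):
            problems.append(f"intron ending at {start2} does not end with AG")

    # Coding sequence checks on the spliced exons.
    cds = "".join(genomic[start:end] for start, end in exons)
    if len(cds) % 3 != 0:
        problems.append("coding length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("coding sequence does not start with ATG")
    if cds[-3:] not in STOP_CODONS:
        problems.append("coding sequence does not end with a stop codon")
    if any(cds[i:i + 3] in STOP_CODONS for i in range(0, len(cds) - 3, 3)):
        problems.append("in-frame stop codon before the end")
    return problems
```

An empty list means the entry passed every check. Non-canonical introns and partial coding sequences do exist in real data, so a flagged entry is a candidate for manual review rather than a confirmed error.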


Subject(s)
Arabidopsis/genetics , Databases, Factual , Algorithms , Base Sequence , DNA, Plant/genetics , Genome, Plant , Introns , Molecular Sequence Data , Neural Networks, Computer , RNA Splicing/genetics
20.
Biochimie ; 78(5): 327-34, 1996.
Article in English | MEDLINE | ID: mdl-8905152

ABSTRACT

Two independent computer systems, NetPlantGene and AMELIE, dedicated to the identification of splice sites in plant and human genomes, respectively, are introduced here. Both methods were designed in relation to experimental work; they rely on automatically generated rules involving the nucleotide content of sequences regardless of the coding properties of exons. The specificity of plant sequences as considered in NetPlantGene is shown to enhance the quality of detection as opposed to general methods such as GRAIL. A scanning model of the acceptor site recognition is being simulated by AMELIE leading to a relatively accurate selection process of sites.


Subject(s)
Arabidopsis/genetics , Exons , RNA Splicing , Sequence Analysis/methods , Base Composition , Humans , RNA, Messenger/genetics , Software