ABSTRACT
Spinal muscular atrophy (SMA) is a common autosomal recessive disorder that results in the degeneration of spinal motor neurons. SMA is caused by alterations of the survival motor neuron (SMN) gene, which encodes a novel protein of hitherto unclear function. The SMN protein associates with ribonucleoprotein particles involved in RNA processing and exhibits an RNA-binding capacity. We have isolated the SMN orthologues of the zebrafish Danio rerio and the nematode Caenorhabditis elegans and have found that this RNA-binding capacity is conserved across species. Purified recombinant SMN proteins from both species showed selectivity for poly(G) homopolymer RNA in vitro, similar to that of the human protein. By studying deletions of the zebrafish SMN protein, we defined an RNA-binding element in exon 2a, which is highly conserved across species, and showed that its binding activity is modulated by protein domains encoded by exons 2b and 3. Finally, a recombinant zebrafish deletion mutant mimicking an SMA frameshift mutation showed a dramatic change in the formation of RNA-protein complexes in vitro. These observations indicate that the RNA-binding capacity of SMN is an evolutionarily conserved function and further support the view that defects in RNA metabolism most likely account for the pathogenesis of SMA.
Subject(s)
Autoantigens/genetics , RNA-Binding Proteins/genetics , Zebrafish/genetics , Amino Acid Sequence , Animals , Autoantigens/metabolism , Conserved Sequence , Evolution, Molecular , Frameshift Mutation , Humans , Molecular Sequence Data , RNA-Binding Proteins/metabolism , Ribonucleoproteins, Small Nuclear/genetics , Ribonucleoproteins, Small Nuclear/metabolism , Sequence Deletion , Species Specificity , snRNP Core Proteins
ABSTRACT
We have isolated a new gene encoding a putative 103-kDa protein from the hyperthermophilic archaeon Sulfolobus acidocaldarius. Analysis of the deduced amino-acid sequence shows an extended central domain, predicted to form coiled-coil structures, and two terminal domains that display purine NTPase motifs. These features are reminiscent of mechanochemical motor proteins, which use the energy of ATP hydrolysis to move specific cellular components. Comparative analysis of the amino-acid sequences of the terminal domains and of the predicted structural organization of this putative purine NTPase shows that it is related both to eucaryal proteins of the "SMC family", which are involved in chromosome condensation, and to several bacterial and eucaryal proteins involved in DNA recombination/repair. Further analyses revealed that these proteins all belong to the so-called "UvrA-related NTP-binding proteins superfamily" and form a large subgroup of motor-like NTPases involved in various DNA processing mechanisms. The presence of such proteins in Archaea, Bacteria, and Eucarya suggests an early origin of DNA-motor proteins, which could have emerged and diversified by domain shuffling.
Subject(s)
Acid Anhydride Hydrolases/genetics , Bacterial Proteins/genetics , DNA-Binding Proteins/genetics , Sulfolobus/genetics , Amino Acid Sequence , Cloning, Molecular , Fungal Proteins/genetics , Genes, Bacterial , Molecular Sequence Data , Multigene Family , Nucleoside-Triphosphatase , Protein Conformation , Sequence Alignment , Sequence Homology, Amino Acid , Species Specificity
ABSTRACT
The systematic sequencing of the yeast genome has revealed many potential genes of unknown function. One way to approach their function is to determine which regulatory system controls their transcription, which can be done by detecting an upstream activation sequence (UAS). Such detection can be performed by computer, provided that the definition of a UAS includes sufficient and precise rules. We have established such rules for the UASs of the GAL4, RAP1 (RPG box), GCN4, and HAP2/HAP3/HAP4 regulatory proteins, as well as for a motif (PAC) frequently found upstream of the genes encoding subunits of RNA polymerases A and C. These rules were applied to the DNA sequence of chromosome III and gave precise predictions.
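Computer detection of a UAS can be illustrated with a minimal sketch that scans an upstream sequence for consensus patterns written in IUPAC codes, on both strands. The rule set below is an assumption made for illustration only: it uses bare consensus cores (CGG-N11-CCG for GAL4, TGASTCA for GCN4) and omits the additional constraints that precise rules would need. The names `find_uas` and `EXAMPLE_RULES` are hypothetical and do not reproduce the authors' rules.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement table that works for plain DNA and for IUPAC patterns.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

# Hypothetical rule set: bare consensus cores only, for illustration.
EXAMPLE_RULES = {
    "GAL4": "CGG" + "N" * 11 + "CCG",
    "GCN4": "TGASTCA",
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence or an IUPAC pattern."""
    return seq.translate(IUPAC_COMPLEMENT)[::-1]


def find_uas(upstream: str, rules=EXAMPLE_RULES):
    """Return sorted (rule, start, strand) hits; start is 0-based on the + strand."""
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for name, pattern in rules.items():
        k = len(pattern)
        # A zero-width lookahead also reports overlapping occurrences.
        regex = re.compile("(?=" + "".join(IUPAC_REGEX[c] for c in pattern) + ")")
        # A pattern equal to its own reverse complement gives identical hits on both strands.
        palindromic = revcomp(pattern) == pattern
        for m in regex.finditer(seq):
            hits.append((name, m.start(), "both" if palindromic else "+"))
        if not palindromic:
            # Scan the reverse strand and map each start back to + coordinates.
            for m in regex.finditer(revcomp(seq)):
                hits.append((name, n - m.start() - k, "-"))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For example, `find_uas("AAACGGACGTACGTACGCCGTTTGACTCATT")` returns `[("GAL4", 3, "both"), ("GCN4", 22, "both")]`. Both example consensus cores are their own reverse complements, so each one is scanned on a single strand only.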
Subject(s)
Chromosomes, Fungal/genetics , Gene Expression Regulation, Fungal , Saccharomyces cerevisiae/genetics , Base Sequence , Chromosome Mapping , DNA, Fungal/analysis , Databases, Factual , Transcription, Genetic
ABSTRACT
We present here a codification structure, fully interfaced with the main packages for biomolecular database management, together with a new search algorithm for quickly retrieving a sequence from a database. The system derives from a method previously proposed for homology search in databanks, in which an entire database is codified in advance: every overlapping subsequence of a given length in each sequence is converted into a code and stored in a hash-coding file. The new algorithm makes better use of this codification. It relies on identifying the rarest strings that characterize the query sequence and on intersecting the sorted lists read from the codification structure. The system applies to both nucleic acid and protein sequences and is used to find patterns in databanks or in large sets of sequences. A few examples of applications are given. In addition, a comparison with existing methods shows that this new approach speeds up the search for query patterns in large data sets.
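The "rarest strings plus sorted-list intersection" idea can be sketched as follows, assuming an in-memory index that maps each length-k string to an ascending list of (sequence id, offset) postings. The query k-mers with the fewest postings are taken first. Each posting is shifted back to the query start it implies, and the shifted lists are intersected by a linear merge. Surviving candidates are then checked against the full query. The function names and the `n_rarest` parameter are hypothetical, and this is not the authors' exact procedure. Because it operates on plain strings, it applies equally to nucleic acid and protein sequences.

```python
from collections import defaultdict


def build_index(seqs, k):
    """Map every k-mer to its (seq_id, offset) postings, in ascending order."""
    index = defaultdict(list)
    for sid, s in enumerate(seqs):
        for off in range(len(s) - k + 1):
            index[s[off:off + k]].append((sid, off))
    return index


def intersect_sorted(a, b):
    """Linear merge intersection of two ascending lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(query, seqs, index, k, n_rarest=2):
    """Exact occurrences of query, as (seq_id, start) pairs."""
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    # Each query k-mer with its offset, rarest (fewest postings) first.
    kmers = [(query[o:o + k], o) for o in range(len(query) - k + 1)]
    kmers.sort(key=lambda t: len(index.get(t[0], [])))
    candidates = None
    for kmer, o in kmers[:n_rarest]:
        # Shift each posting to the query start it implies; order is preserved.
        starts = [(sid, p - o) for sid, p in index.get(kmer, []) if p >= o]
        candidates = starts if candidates is None else intersect_sorted(candidates, starts)
        if not candidates:
            return []
    # The rarest k-mers only narrow the field, so confirm the full match.
    return [(sid, st) for sid, st in candidates
            if seqs[sid][st:st + len(query)] == query]
```

Starting from the rarest strings keeps the first list short. Every later intersection therefore costs at most the length of the lists being merged, which is the reason for ordering by posting count.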
Subject(s)
Algorithms , Databases, Factual , Nucleic Acids/genetics , Proteins/genetics , Amino Acid Sequence , Animals , Base Sequence , Evaluation Studies as Topic , Humans , Information Storage and Retrieval , Molecular Sequence Data
ABSTRACT
The systematic sequencing of the yeast genome has revealed many potential genes of unknown function. One way to approach their function is to determine which regulatory system controls their transcription, which can be done by detecting an upstream activation sequence (UAS). Such detection can be performed by computer, provided that the definition of a UAS includes sufficient and precise rules. We have established such rules for the UASs of the GAL4, RAP1 (RPG box), GCN4, and HAP2/HAP3/HAP4 regulatory proteins, as well as for a motif (PAC) frequently found upstream of the genes encoding subunits of RNA polymerases A and C. These rules were applied to the DNA sequence of chromosome III and gave precise predictions.
Subject(s)
CCAAT-Binding Factor , Chromosomes, Fungal , DNA-Binding Proteins , Genes, Fungal , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae/genetics , Base Sequence , Consensus Sequence , DNA, Fungal/genetics , Fungal Proteins/genetics , Gene Expression Regulation, Fungal , Molecular Sequence Data , Open Reading Frames , Protein Kinases/genetics , Ribosomal Proteins/genetics , Transcription Factors/genetics
Subject(s)
Amino Acid Sequence , Nucleic Acid Conformation , Online Systems , France , Local Area Networks
ABSTRACT
We propose a new method for homology search of nucleic acids or proteins in databanks. All possible subsequences of a specific length in a sequence are converted into a code and stored in an indexed file (hash-coding). Codifying an entire bank in this way is a rather lengthy preliminary step, but it gives immediate access to all sequence fragments of a given type. With our method, for example, a strict homology pattern of twenty nucleotides can be found in the Los Alamos bank (GENBANK) in less than 2 seconds. This data storage can also be used to speed up non-strict homology search programs considerably and to write a program that helps select nucleic acid hybridization probes.
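One way to convert every subsequence of length k into a code is to read it as a base-4 number, two bits per nucleotide. The code can then be updated in constant time as the window slides along the sequence. The sketch below makes that assumption; the original coding scheme is not described here. It stores the codes in an in-memory dictionary that stands in for the indexed file, and it answers a strict (exact) pattern search by looking up the code of the pattern's first k bases directly and then verifying the remainder. All names are hypothetical.

```python
from collections import defaultdict

# Two bits per nucleotide (assumed encoding; the original scheme is not given).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq, k):
    """Yield (offset, code) for each k-mer made only of A/C/G/T.

    The code is the k-mer read as a base-4 number, updated in O(1) per base.
    """
    modulus = 4 ** k
    code, run = 0, 0  # run = consecutive valid bases seen so far
    for i, base in enumerate(seq):
        value = BASE_CODE.get(base)
        if value is None:  # ambiguous symbol such as N: restart the window
            code, run = 0, 0
            continue
        code = (code * 4 + value) % modulus  # drop the oldest base, add the new one
        run += 1
        if run >= k:
            yield i - k + 1, code


def build_hash_index(seqs, k):
    """Codify a whole bank once: integer code -> list of (seq_id, offset)."""
    index = defaultdict(list)
    for sid, s in enumerate(seqs):
        for off, code in kmer_codes(s.upper(), k):
            index[code].append((sid, off))
    return index


def find_exact(pattern, seqs, index, k):
    """Look up the pattern's first k-mer directly, then verify the rest."""
    pattern = pattern.upper()
    if len(pattern) < k:
        raise ValueError("pattern must be at least k bases long")
    first = next(kmer_codes(pattern[:k], k), None)
    if first is None:  # non-ACGT symbol within the first k bases
        return []
    _, code = first
    return [(sid, off) for sid, off in index.get(code, [])
            if seqs[sid][off:off + len(pattern)].upper() == pattern]
```

For instance, with `seqs = ["GATTACAGATTACA"]` and `k = 4`, `find_exact("GATTACA", seqs, build_hash_index(seqs, 4), 4)` returns `[(0, 0), (0, 7)]`. Building the index is the costly preliminary pass. Each later query costs one dictionary lookup plus verification of the returned positions.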