1.
Comput Biol Chem ; 28(3): 211-8, 2004 Jul.
Article in English | MEDLINE | ID: mdl-15261151

ABSTRACT

Although the characterization of proteins cannot solely rely upon sequence similarity, it has been widely proved that all-vs-all massive sequence comparisons may be an effective approach and a good basis for the prediction of biochemical functions or for the delineation of common shared properties. The program Cluster-C presented here enables a stand-alone and efficient construction of protein families within whole proteomes. The algorithm, which is based on the detection of cliques, ensures a high level of connectivity within the clusters. As opposed to the single transitive linkage method, Cluster-C allows a large number of sequences to be classified in such a way that the multidomain proteins do not produce a chain-grouping effect resulting in meaningless clusters. Moreover, some proteins can be present in several different but relevant clusters, which is of help in the determination of their functional domains. In the present analysis we used the Z-value, an evaluation of the significance of the similarity score, as the criterion for connecting sequences (the user can freely define the threshold of the similarity criterion). The clusters built with a rather low threshold (Z= 14) include more than 97% of the sequences and are consistent with known protein families and PROSITE patterns.
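The contrast drawn above between clique-based clusters and single transitive linkage can be illustrated with a small sketch. It assumes that two sequences are connected when their Z-value exceeds the threshold, and it uses the textbook Bron-Kerbosch recursion as a stand-in, since the abstract does not say how Cluster-C detects cliques. The function names and the toy Z-values are hypothetical.

```python
def build_graph(z_scores, threshold):
    """Connect two sequences when their Z-value exceeds the threshold (assumed rule)."""
    graph = {}
    for (a, b), z in z_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if z > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: vertices already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Toy data: "AB" is a two-domain protein similar to both the A and the B family.
z_scores = {
    ("A1", "A2"): 30, ("A1", "AB"): 25, ("A2", "AB"): 22,
    ("AB", "B1"): 28, ("AB", "B2"): 26, ("B1", "B2"): 31,
    ("A1", "B1"): 3,
}
graph = build_graph(z_scores, threshold=14)
clusters = sorted(sorted(c) for c in maximal_cliques(graph))
print(clusters)
```

In this toy case the two-domain protein appears in both clusters, whereas transitive linkage would merge all five sequences into a single group.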


Subject(s)
Algorithms , Sequence Alignment/methods , Amino Acid Sequence/genetics , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Cluster Analysis , Computational Biology/methods , Databases, Protein , Fungal Proteins/chemistry , Fungal Proteins/genetics , Proteome/chemistry , Proteome/genetics
2.
Nucleic Acids Res ; 32(Database issue): D351-3, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681432

ABSTRACT

All the protein sequences from plants (including Arabidopsis thaliana) available from SwissProt/TrEMBL have been the subject of an all-by-all systematic comparison and grouped into clusters of related proteins. Within each cluster, the sequences have been submitted to pyramidal classification; in the case where two or several subfamilies have been grouped together, the pyramidal tree helps in finding which sequences make the links between subfamilies. In addition, the 'domains' that are common to two or more sequences within a cluster were determined and displayed à la ProDom. The resulting graphical representations proved to be quite efficient in pinpointing those protein sequences suffering from a probable error in the annotation of their genes. The clusters can be searched through various criteria and their pyramidal classifications and their domain representations can be displayed by querying http://genoplante-info.infobiogen.fr/phytoprot. The user can also launch a BLAST search of a query sequence against all the clusters.


Subject(s)
Computational Biology , Databases, Protein , Plant Proteins/classification , Proteome , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/classification , Cluster Analysis , Internet , Plant Proteins/chemistry , Protein Structure, Tertiary , Proteome/chemistry , Proteome/classification , Proteomics
3.
J Comput Biol ; 8(4): 381-99, 2001.
Article in English | MEDLINE | ID: mdl-11571074

ABSTRACT

We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.
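The prediction stated above can be written out explicitly. This is a sketch under the abstract's own assumptions (a continuous reversible Markov model with a single rate matrix Q, finite-length fluctuations neglected). The symbols P_{ij} for the Markov matrix of the pairwise alignment of sequences i and j, and t_{ij} for their divergence along the tree, are notation introduced here and not taken from the abstract.

```latex
P_{ij} = \exp\!\left(t_{ij}\, Q\right)
\quad\Longrightarrow\quad
L_{ij} = \log P_{ij} = t_{ij}\, Q
```

Every observed logarithm L_{ij} would then be a scalar multiple of the same matrix Q, whatever the tree, which is why the test requires no tree estimation.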


Subject(s)
Computational Biology , Proteins/genetics , Sequence Alignment/statistics & numerical data , Computer Simulation , DNA, Mitochondrial/genetics , Evolution, Molecular , Markov Chains , Phylogeny , Sequence Analysis, Protein/statistics & numerical data , Stochastic Processes
4.
Genome Res ; 11(7): 1296-303, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11435413

ABSTRACT

An all-by-all comparison of all the publicly available protein sequences from plants has been performed, followed by a clustering process. Within each of the 1064 resulting clusters (containing sequences that are orthologous as well as paralogous), the sequences have been submitted to a pyramidal classification and their domains delineated by an automated procedure à la ProDom. This process provides a means for easily checking for any apparent inconsistency in a cluster, for example, whether one sequence is shorter or longer than the others, one domain is missing, etc. In such cases, the alignment of the DNA sequence of the gene with that of a close homologous protein often reveals (in 10% of the clusters) probable sequencing errors (leading to frameshifts) or probable wrong intron/exon predictions. The composition of the clusters, their pyramidal classifications, and domain decomposition, as well as our comments when appropriate, are available from http://chlora.infobiogen.fr:1234/PHYTOPROT.


Subject(s)
Genome, Plant , Multigene Family , Sequence Analysis, DNA/methods , Amino Acid Sequence , Arabidopsis/enzymology , Arabidopsis/genetics , Base Sequence , Computational Biology/methods , L-Lactate Dehydrogenase/genetics , Malate Dehydrogenase/genetics , Molecular Sequence Data , Plant Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid
5.
J Mol Biol ; 306(4): 863-76, 2001 Mar 02.
Article in English | MEDLINE | ID: mdl-11243794

ABSTRACT

Amino acid selection by aminoacyl-tRNA synthetases requires efficient mechanisms to avoid incorrect charging of the cognate tRNAs. A proofreading mechanism prevents Escherichia coli methionyl-tRNA synthetase (EcMet-RS) from activating in vivo L-homocysteine, a natural competitor of L-methionine recognised by the enzyme. The crystal structure of the complex between EcMet-RS and L-methionine solved at 1.8 Å resolution exhibits some conspicuous differences with the recently published free enzyme structure. Thus, the methionine delta-sulphur atom replaces a water molecule H-bonded to Leu13N and Tyr260O(eta) in the free enzyme. Rearrangements of aromatic residues enable the protein to form a hydrophobic pocket around the ligand side-chain. The subsequent formation of an extended water molecule network contributes to relative displacements, up to 3 Å, of several domains of the protein. The structure of this complex supports a plausible mechanism for the selection of L-methionine versus L-homocysteine and suggests the possibility of information transfer between the different functional domains of the enzyme.


Subject(s)
Escherichia coli/enzymology , Methionine-tRNA Ligase/chemistry , Methionine-tRNA Ligase/metabolism , Methionine/metabolism , Allosteric Regulation , Allosteric Site , Amino Acid Sequence , Binding, Competitive , Crystallization , Crystallography, X-Ray , Homocysteine/metabolism , Hydrogen Bonding , Methionine/chemistry , Models, Molecular , Molecular Sequence Data , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Alignment , Substrate Specificity , Water/chemistry , Water/metabolism
6.
Comput Chem ; 23(3-4): 303-15, 1999 Jun 15.
Article in English | MEDLINE | ID: mdl-10404622

ABSTRACT

In conventional hierarchical clustering methods, any object can belong to only one class or cluster. We present here an application of the pyramidal classification method to biological objects, which illustrates the intuitively appealing idea that some objects may belong simultaneously to two classes. In a first step, we performed an all-by-all comparison of all the open reading frames in the genomes from S. cerevisiae, M. jannaschii, E. coli, H. influenzae and Synechocystis. In a second step, a series of connex classes was built, each connex class containing all those sequences that were linked by a Z-value (obtained after 100 sequence shufflings) greater than a given threshold. Finally, each connex class was submitted to a pyramidal classification. Three examples of such classifications are given, concerning two sets of multi-domain protein sequences and a family of aminoacyl-tRNA synthetases. They make it clear that the linear order among the classified objects that results from the pyramidal classification is useful in deciphering the multiple relationships that can exist between the objects under study. A program for calculating and displaying a pyramidal classification from a dissimilarity matrix is available from http://genome.genetique.uvsq.fr/Pyramids. The pyramidal classifications of the connex classes from the five organisms (intra- and inter-genomic comparisons) are available from http://www.gene-it.com under the family item.
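The second step described above, building connex classes, can be sketched as follows, reading a connex class as a connected component of the graph whose edges are the pairs with a Z-value above the threshold. That reading, the union-find approach, and the function name are assumptions made for illustration. The pyramidal classification step itself is not reproduced here.

```python
def connex_classes(z_scores, threshold):
    """Group sequences linked, directly or transitively, by Z-values above threshold."""
    parent = {}

    def find(x):
        # Register unseen sequences, then follow parents with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), z in z_scores.items():
        root_a, root_b = find(a), find(b)
        if z > threshold and root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for seq in parent:
        classes.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical Z-values for five sequences.
example = {("a", "b"): 20, ("b", "c"): 15, ("c", "d"): 5, ("d", "e"): 40}
print(connex_classes(example, threshold=10))
```

Here "a" and "c" end up in the same class through "b" even though they are never compared directly, which is the transitive behaviour that the later pyramidal step is meant to untangle.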


Subject(s)
Cluster Analysis , Algorithms , Amino Acyl-tRNA Synthetases/genetics , Genome, Bacterial , Genome, Fungal , Models, Biological , Open Reading Frames
7.
Nucleic Acids Res ; 27(14): 2848-51, 1999 Jul 15.
Article in English | MEDLINE | ID: mdl-10390524

ABSTRACT

In spite of many efforts, the prediction of the location of proteins in eukaryotic cells (cytoplasm, mitochondrion or chloroplast) is still far from straightforward. In some cases (e.g. ribosomal proteins and aminoacyl-tRNA synthetases) both the cytoplasmic proteins and their organellar counterparts are encoded by the nuclear genome. A factorial correspondence analysis of the codon usage in yeast and Caenorhabditis elegans shows that the codon usage of those nuclear genes encoding ribosomal proteins or aminoacyl-tRNA synthetases is markedly different, depending on the final location of the proteins (cytoplasmic or mitochondrial). As a consequence, the location of such proteins, whose sequences are now frequently determined by systematic genomic sequencing, can be easily and quickly predicted. A WWW interface has been developed, aimed at providing a user-friendly tool for codon usage pattern analysis. It is available from http://www.genetique.uvsq.fr/afc.html
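A codon usage analysis of this kind starts from a table of codon frequencies per gene. The sketch below covers only that input step, not the factorial correspondence analysis itself. The function name is hypothetical, and the sequence is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy coding sequence: Met, Ala, Ala, stop.
print(codon_frequencies("ATGGCTGCTTAA"))
```

One such frequency vector per gene gives the rows of the table that a correspondence analysis would then project onto a few axes.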


Subject(s)
Amino Acyl-tRNA Synthetases/metabolism , Codon/genetics , Eukaryotic Cells/metabolism , Ribosomal Proteins/metabolism , Amino Acyl-tRNA Synthetases/genetics , Animals , Arabidopsis/cytology , Arabidopsis/enzymology , Arabidopsis/genetics , Biological Transport , Caenorhabditis elegans/cytology , Caenorhabditis elegans/enzymology , Caenorhabditis elegans/genetics , Cell Nucleus/enzymology , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cytoplasm/enzymology , Cytoplasm/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/enzymology , Genes, Fungal/genetics , Genes, Helminth/genetics , Genes, Plant/genetics , Genome , Internet , Mitochondria/enzymology , Mitochondria/genetics , Mitochondria/metabolism , Open Reading Frames/genetics , Ribosomal Proteins/genetics , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Software
8.
Genomics ; 57(3): 352-64, 1999 May 01.
Article in English | MEDLINE | ID: mdl-10329001

ABSTRACT

A cloning of hepatic cDNAs associated with the early phase of an acute, systemic inflammation was carried out by differential screening of arrayed cDNA clones from rat livers obtained at 4-8 h postchallenge with Freund's complete adjuvant. End sequencing of 174 selected clones provided three cDNA groups that coded for: (i) 23 known acute-phase proteins, (ii) 31 known proteins whose change in hepatic synthesis during an acute phase was so far unsuspected, and (iii) 36 novel proteins whose cDNAs were completely sequenced. For 16 proteins in the third group the hepatic mRNA could be detected and quantitated by Northern blot hybridization in Freund's adjuvant-challenged animals, and an extrahepatic expression in healthy animals was further investigated. Matching the open reading frames of the 36 novel proteins with general and specialized data libraries indicated the potential relationships of 16 of these proteins with known protein families/superfamilies and/or the presence of functional domains previously described in other proteins. Overall, our search for novel inflammation-associated proteins selected mostly known or as yet undescribed proteins with an intracellular or membrane location, which extends our knowledge of the proteins involved in the intracellular metabolism of hepatic cells during a systemic, acute-phase response. Finally, some of the cDNAs above allowed us to successfully identify hepatic mRNAs that are differentially expressed in acute vs chronic (polyarthritis) inflammatory conditions in rat.


Subject(s)
Acute-Phase Proteins/genetics , Inflammation/genetics , Liver/metabolism , Acute-Phase Proteins/immunology , Animals , Base Sequence , Blotting, Northern , Cloning, Molecular , DNA Probes , DNA, Complementary , Gene Expression , Genetic Markers , Inflammation/metabolism , Intracellular Fluid , Liver/immunology , Male , Molecular Sequence Data , Nucleic Acid Hybridization , RNA, Messenger , Rats , Rats, Inbred Lew , Rats, Sprague-Dawley , Sequence Analysis, DNA
9.
FEBS Lett ; 446(1): 6-8, 1999 Mar 05.
Article in English | MEDLINE | ID: mdl-10100603

ABSTRACT

Poly(ADP-ribose)polymerase is a nuclear NAD-dependent enzyme and an essential nick sensor involved in cellular processes where nicking and rejoining of DNA strands are required. The inter-alpha-inhibitor family is comprised of several plasma proteins that all harbor one or more so-called heavy chains designated H1-H4. The latter originate from precursor polypeptides H1P-H4P whose upper two thirds are highly homologous. We now describe a novel protein that includes (i) a so-called BRCT domain found in many proteins involved in DNA repair, (ii) an area that is homologous to the NAD-dependent catalytic domain of poly(ADP-ribose)polymerase, (iii) an area that is homologous to the upper two thirds of precursor polypeptides H1P-H4P and (iv) a proline-rich region with a potential nuclear localization signal. This protein now designated PH5P points to as yet unsuspected links between poly(ADP-ribose)polymerase and the inter-alpha-inhibitor family and is likely to be involved in DNA repair.


Subject(s)
Alpha-Globulins/metabolism , DNA Repair , Nuclear Proteins/metabolism , Poly(ADP-ribose) Polymerases/metabolism , Alpha-Globulins/genetics , Animals , Humans
10.
Comput Chem ; 23(3-4): 317-31, 1999 Jun 15.
Article in English | MEDLINE | ID: mdl-10627144

ABSTRACT

The Z-value is an attempt to estimate the statistical significance of a Smith-Waterman dynamic alignment score (SW-score) through the use of a Monte-Carlo process. It partly reduces the bias induced by the composition and length of the sequences. This paper is not a theoretical study on the distribution of SW-scores and Z-values. Rather, it presents a statistical analysis of Z-values on large datasets of protein sequences, leading to a law of probability that the experimental Z-values follow. First, we determine the relationships between the computed Z-value, an estimation of its variance and the number of randomizations in the Monte-Carlo process. Then, we illustrate that Z-values are less correlated to sequence lengths than SW-scores. Then we show that pairwise alignments, performed on 'quasi-real' sequences (i.e., randomly shuffled sequences of the same length and amino acid composition as the real ones) lead to Z-value distributions that statistically fit the extreme value distribution, more precisely the Gumbel distribution (global EVD, Extreme Value Distribution). However, for real protein sequences, we observe an over-representation of high Z-values. We determine first a cutoff value which separates these overestimated Z-values from those which follow the global EVD. We then show that the interesting part of the tail of distribution of Z-values can be approximated by another EVD (i.e., an EVD which differs from the global EVD) or by a Pareto law. This has been confirmed for all proteins analysed so far, whether extracted from individual genomes, or from the ensemble of five complete microbial genomes comprising altogether 16956 protein sequences.
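The Monte-Carlo estimate described above can be sketched as follows, assuming the usual definition of the Z-value as the observed score minus the mean score over shuffled sequences, divided by their standard deviation. The scoring scheme (match +2, mismatch -1, linear gap -2) is a toy assumption. A real analysis would use a substitution matrix and affine gap penalties. Function names are hypothetical.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Z = (observed score - mean of shuffled scores) / their standard deviation."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        # Shuffling keeps the length and composition of b unchanged.
        rng.shuffle(letters)
        shuffled_scores.append(sw_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; Z-value is undefined")
    return (observed - mean) / sd


seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(z_value(seq, seq))
```

Because the shuffled sequences keep the length and amino acid composition of the original, the resulting value is less sensitive to those two factors than the raw score, which is the property the abstract examines.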


Subject(s)
Genome, Bacterial , Genome, Fungal , Sequence Alignment , Computing Methodologies , Escherichia coli/genetics , Mathematics , Monte Carlo Method , Saccharomyces cerevisiae/genetics
11.
Biochem Biophys Res Commun ; 243(2): 522-30, 1998 Feb 13.
Article in English | MEDLINE | ID: mdl-9480842

ABSTRACT

The family of plasma proteins collectively referred to as Inter-alpha-Inhibitor (I alpha I) family is comprised of a set of multi-polypeptide molecules and a single-chain molecule designated I alpha IH4P. Although the 4 heavy chain precursors H1P to H4P that lead to these molecules are evolutionarily related, only H4P harbours a Pro-rich region (PRR) in its C-terminal third. A comparison of hepatic H4P cDNAs in human and rat has now unraveled an extensive variability of this PRR. Within the rat PRR, 6 repeats of a Gly-X-Pro motif participate in a collagen-like pattern that is absent in human. Within the human PRR, a domain that is absent in rat can be transcribed or deleted by alternative splicing which results in two variant forms of human H4P. In rat liver, the single mRNA is up-regulated by an acute, systemic inflammation whereas neither mRNA is up-regulated in human liver. Finally the shortest human mRNA is also transcribed in peripheral blood mononuclear cells where it is down-regulated by bacterial lipopolysaccharides. Therefore, in contrast to what is seen for the ITIH1 to -3 genes, the rat and human ITIH4 gene transcriptions and products thereof present marked differences, which suggests species-specific functions for I alpha IH4P.


Subject(s)
Alpha-Globulins/biosynthesis , Alpha-Globulins/chemistry , Liver/metabolism , Proline/chemistry , Alpha-Globulins/physiology , Alternative Splicing/genetics , Amino Acid Sequence , Animals , Cloning, Molecular , Gene Expression Regulation/genetics , Humans , Inflammation/metabolism , Lipopolysaccharides/pharmacology , Molecular Sequence Data , Monocytes/drug effects , RNA, Messenger/metabolism , Rats , Sequence Alignment , Sequence Analysis, DNA
12.
Mol Biol Evol ; 15(11): 1548-61, 1998 Nov.
Article in English | MEDLINE | ID: mdl-12572618

ABSTRACT

All of the aminoacyl-tRNA synthetase (aaRS) sequences currently available in the data banks have been subjected to a systematic analysis aimed at finding gene duplications, genetic recombinations, and horizontal transfers. Evidence is provided for the occurrence (or probable occurrence) of such phenomena within this class of enzymes. In particular, it is suggested that the monomeric PheRS from the yeast mitochondrion is a chimera of the alpha and beta chains of the standard tetrameric protein. In addition, it is proposed that the dimeric and tetrameric forms of GlyRS are the result of a double and independent acquisition of the same specificity within two different subclasses of aaRS. The phylogenetic reconstructions of the evolutionary histories of the genes encoding aaRS are shown to be extremely diverse. While large segments of the population are consistent with the broad grouping into the three Woesian domains, some phylogenetic reconstructions do not place the Archaea and the Eucarya as sister groups but, rather, show a gram-negative bacteria/eukaryote clustering. In addition, many individual genes pose difficulties that preclude any simple evolutionary scheme. Thus, aaRS's are clearly a paradigm of F. Jacob's "odd jobs of evolution" but, on the whole, do not call into question the evolutionary scenario originally proposed by Woese and subsequently refined by others.


Subject(s)
Amino Acyl-tRNA Synthetases/genetics , Evolution, Molecular , Genes/genetics , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/classification , Animals , Cattle , Cricetinae , Genes, Archaeal/genetics , Genes, Bacterial/genetics , Genes, Fungal/genetics , Genes, Helminth/genetics , Glycine-tRNA Ligase/classification , Glycine-tRNA Ligase/genetics , Humans , Mice , Mitochondrial Proteins/genetics , Molecular Sequence Data , RNA, Transfer, Amino Acid-Specific/classification , RNA, Transfer, Amino Acid-Specific/genetics , Rabbits , Sequence Alignment/methods , Species Specificity , Tryptophan-tRNA Ligase/classification , Tryptophan-tRNA Ligase/genetics , Tyrosine-tRNA Ligase/classification , Tyrosine-tRNA Ligase/genetics
13.
DNA Res ; 4(4): 257-65, 1997 Aug 31.
Article in English | MEDLINE | ID: mdl-9405933

ABSTRACT

Analysis of the codon usage of genes coding for the structural components of the outer membrane in Escherichia coli is consistent with the requirement for high expression of these genes. Because porins (which constitute the major protein component of the outer membrane) and LPS (which constitutes the major outermost constituent of the outer membrane) are synthesized from genes displaying widely different codon usage, it is possible to investigate the origin of the outer membrane. The analysis predicts that the outer membrane might originate from a genome other than the genome coding for the major part of the cell. Such a special origin would explain, in structural terms, the likely lethality of porins if they were inadvertently inserted within the inner membrane, giving rise to the Gram-negative bacterial type, having an envelope comprising two membranes, instead of a single cytoplasmic membrane and a murein sacculus.


Subject(s)
Bacterial Outer Membrane Proteins/genetics , Codon , Escherichia coli/genetics , Genome, Bacterial , RNA, Transfer/genetics
14.
Biochimie ; 78(5): 311-4, 1996.
Article in English | MEDLINE | ID: mdl-8905149

ABSTRACT

A significant proportion of coding sequences or open reading frames discovered in the course of sequencing projects do not show any similarity with other sequences deposited with the protein databanks. In such cases the search for similarities must be performed with as many comparison algorithms as possible, so as to increase the chance of finding weak relationships. A specialised parallel hardware (SAMBA) implementing the Smith & Waterman algorithm has been developed at the 'Institut de Recherche en Informatique et Systèmes Aléatoires' (IRISA). It makes it possible to scan protein databanks at a speed comparable with that of BLAST or FASTA. We report here a study performed with SAMBA on 814 orphan sequences from S. cerevisiae and compare the results with those from BLAST and FASTA.


Subject(s)
DNA, Fungal/genetics , Genes, Fungal , Open Reading Frames , Sequence Homology, Amino Acid , Algorithms , Amino Acid Sequence , Molecular Sequence Data , Multigene Family
15.
J Mol Biol ; 250(2): 123-7, 1995 Jul 07.
Article in English | MEDLINE | ID: mdl-7608964

ABSTRACT

The availability of specialized sequence databanks for Escherichia coli, Saccharomyces cerevisiae and Bacillus subtilis made it possible to build a set of 105 protein-coding genes that are homologous in these three species. An analysis of the triplets at both the nucleotide and amino acid level revealed that the codon bias of some amino acids is significantly higher at conserved rather than at non-conserved positions. Comparisons of homologous genes in E. coli and Salmonella typhimurium, and in S. cerevisiae and Drosophila melanogaster, led to the same conclusion. A special case was made for serine in E. coli, whose major codon is AGC for non-conserved and TCC for conserved residues. We interpret this observation as evidence that the primordial codons for serine were TCN, while codons AGY appeared later. This conclusion is substantiated by an analysis of the codon usage of catalytic serine residues in ancient, ubiquitous and essential proteins (ATP synthases and topoisomerases). It is shown that in these proteins the proportion of the catalytic serine residues coded by TCN is significantly higher than the one expected from the overall codon usage of serine residues.


Subject(s)
Biological Evolution , Codon/genetics , Conserved Sequence/genetics , Genetic Code/genetics , Serine/genetics , Amino Acid Sequence , Bacillus subtilis/genetics , Base Sequence , Escherichia coli/genetics , Saccharomyces cerevisiae/genetics
16.
Biochem J ; 306 ( Pt 2): 505-12, 1995 Mar 01.
Article in English | MEDLINE | ID: mdl-7534067

ABSTRACT

The inter-alpha-inhibitor (I alpha I) family is comprised of the plasma protease inhibitors I alpha I, inter-alpha-like inhibitor (I alpha LI), pre-alpha-inhibitor (P alpha I) and bikunin. I alpha I, I alpha LI and P alpha I are distinct assemblies of bikunin with one of three heavy (H) chains designated H1, H2 and H3. These H chains and bikunin are respectively encoded by a set of three H genes and an alpha 1-microglobulin/bikunin precursor (AMBP) gene. All four gene products undergo maturation steps from precursor polypeptides. The full-length cDNAs for the H1-, H2- and H3-chain precursors were cloned from a mouse liver cDNA library and sequenced. Extensive searches of amino acid sequence similarities to other proteins in databanks revealed (i) a highly significant similarity of the C-terminal sequence in the three H-chain precursors to the multicopper-binding domain in the group of multicopper oxidase proteins and (ii) the presence of von Willebrand type-A domains in the mature H chains. Amino acid sequence comparisons between the three mouse H1-, H2- and H3-chain precursors and their human counterparts allowed us to appraise the timing and order of occurrence of the three H-chain genes from a shared ancestor during mammalian evolution. Owing to a multiple alignment of the six mouse and human nucleotide sequences for these H-chain precursors, a reverse transcriptase PCR assay with degenerate oligonucleotides was designed, allowing us to (i) present evidence that no mRNAs for further H genes exist in mouse liver and (ii) demonstrate a previously undescribed transcription of the H2- and H3-chain mRNAs in mouse brain, which contrasts with the expression of all four, H1, H2, H3 and AMBP, mRNAs in liver.


Subject(s)
Alpha-Globulins/genetics , Brain/metabolism , Liver/metabolism , Oxidoreductases/chemistry , Protein Precursors/genetics , Transcription, Genetic , Alpha-Globulins/chemistry , Amino Acid Sequence , Animals , Base Sequence , Brain Chemistry , DNA, Complementary/chemistry , Humans , Liver/chemistry , Mice , Molecular Sequence Data , Polymerase Chain Reaction , Protein Precursors/chemistry , RNA, Messenger/analysis , RNA-Directed DNA Polymerase , Sequence Homology , von Willebrand Factor/chemistry
17.
Biochimie ; 77(3): 194-203, 1995.
Article in English | MEDLINE | ID: mdl-7647112

ABSTRACT

The superimposable dinucleotide fold domains of MetRS, GlnRS and TyrRS define structurally equivalent amino acids which have been used to constrain the sequence alignments of the 10 class I aminoacyl-tRNA synthetases (aaRS). The conservation of those residues which have been shown to be critical in some aaRS makes it possible to predict their location and function in the other synthetases, particularly: i) a conserved negatively-charged residue which binds the alpha-amino group of the amino acid substrate; ii) conserved residues within the inserted domain bridging the two halves of the dinucleotide-binding fold; and iii) conserved residues in the second half of the fold which bind the amino acid and ATP substrate. The alignments also indicate that the class I synthetases may be partitioned into two subgroups: a) MetRS, IleRS, LeuRS, ValRS, CysRS and ArgRS; b) GlnRS, GluRS, TyrRS and TrpRS.


Subject(s)
Amino Acyl-tRNA Synthetases/chemistry , Sequence Alignment/classification , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/classification , Escherichia coli/chemistry , Escherichia coli/enzymology , Methionine-tRNA Ligase/chemistry , Models, Chemical , Molecular Sequence Data , Protein Conformation , Sequence Homology, Amino Acid
18.
Comput Appl Biosci ; 10(4): 453-4, 1994 Jul.
Article in English | MEDLINE | ID: mdl-7804879

ABSTRACT

Fast sequence databank search algorithms generally make use of hash tables and look for exactly matching words. An increased sensitivity, at the expense of a decreased selectivity, can be attained in the case of proteins by using a reduced amino acid alphabet. We propose here an alphabet reduced to 10 symbols, that we used in modified versions of the FASTP and SCAN programs. An application to the aminoacyl-tRNA synthetases shows that this technique may be useful in detecting distant relationships between proteins.
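The idea of matching exact words over a reduced alphabet can be sketched as follows. The abstract does not list the 10 symbols, so the grouping below is a hypothetical one based on broad physico-chemical similarity, not the authors' alphabet. The function names and the word length k=3 are likewise assumptions.

```python
# Hypothetical 10-class grouping of the 20 amino acids (NOT the published alphabet).
GROUPS = ["LIVM", "FYW", "KR", "DE", "NQ", "ST", "AG", "C", "H", "P"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet; unknown letters become X."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def word_index(seq, k=3):
    """Hash table mapping each reduced k-letter word to its start positions."""
    index = {}
    reduced = reduce_sequence(seq)
    for i in range(len(reduced) - k + 1):
        index.setdefault(reduced[i:i + k], []).append(i)
    return index


def shared_words(query, target, k=3):
    """Pairs (query position, target position) of identical reduced words."""
    index = word_index(target, k)
    reduced = reduce_sequence(query)
    return [(i, j)
            for i in range(len(reduced) - k + 1)
            for j in index.get(reduced[i:i + k], [])]


# "MKD" and "LRE" share no exact word, but match once the alphabet is reduced.
print(shared_words("MKD", "LRE"))
```

The example shows the trade-off named in the abstract: words that differ letter by letter become identical after reduction, which raises sensitivity and also admits more chance matches.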


Subject(s)
Databases, Factual , Proteins/genetics , Software , Algorithms , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/genetics , Escherichia coli/enzymology , Escherichia coli/genetics , Molecular Sequence Data , Oligopeptides/genetics , Sequence Alignment/methods , Terminology as Topic