Results 1 - 16 of 16
1.
Nucleic Acids Res ; 38(Database issue): D190-5, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19900971

ABSTRACT

The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224,847 OGs, including 9,724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2,242,035 proteins (built from 2,590,259 proteins) and provides a broad functional description for at least 1,966,709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de.
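
The abstract names two steps, reciprocal best BLAST matches and triangular linkage clustering, without giving details. The following is a minimal Python sketch of the general idea only, assuming the COG-style convention that triangles of reciprocal best hits are merged when they share an edge. It is not the eggNOG pipeline; the function names `reciprocal_best_pairs` and `triangle_clusters` and the input layout (`best_hit`, `genome_of`) are hypothetical and chosen for illustration.

```python
from itertools import combinations


def reciprocal_best_pairs(best_hit, genome_of):
    """Return unordered protein pairs that are each other's best hit.

    best_hit:  dict mapping (protein, target_genome) -> best-scoring protein
               found in target_genome (assumed to differ from the query's genome).
    genome_of: dict mapping protein -> genome it belongs to.
    """
    pairs = set()
    for (query, _target_genome), hit in best_hit.items():
        # The hit must point back to the query when searched against
        # the query's own genome.
        if best_hit.get((hit, genome_of[query])) == query:
            pairs.add(frozenset((query, hit)))
    return pairs


def triangle_clusters(pairs):
    """Merge triangles of reciprocal best hits that share an edge (assumed rule)."""
    neighbours = {}
    for pair in pairs:
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # A triangle is three proteins that are pairwise reciprocal best hits.
    triangles = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))

    # Union-find over edges: the three edges of a triangle join one set,
    # so triangles sharing an edge end up in the same cluster.
    parent = {}

    def find(edge):
        parent.setdefault(edge, edge)
        while parent[edge] != edge:
            parent[edge] = parent[parent[edge]]
            edge = parent[edge]
        return edge

    for triangle in triangles:
        edges = [frozenset(e) for e in combinations(sorted(triangle), 2)]
        for edge in edges[1:]:
            parent[find(edge)] = find(edges[0])

    groups = {}
    for edge in list(parent):
        groups.setdefault(find(edge), set()).update(edge)
    return sorted(sorted(group) for group in groups.values())
```

In this sketch a pair of reciprocal best hits that does not take part in any triangle is left unclustered; how a real pipeline treats such pairs is not stated in the abstract.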


Subject(s)
Amino Acid Motifs/genetics , Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Animals , Archaea , Computational Biology/trends , Databases, Protein , Fishes , Genome, Bacterial , Humans , Information Storage and Retrieval/methods , Internet , Primates , Protein Structure, Tertiary , Rats , Software
2.
Proc Natl Acad Sci U S A ; 104(35): 13913-8, 2007 Aug 28.
Article in English | MEDLINE | ID: mdl-17717083

ABSTRACT

To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.


Subject(s)
Genome, Bacterial , Genome , Genomic Library , Proteins/genetics , Animals , Biofilms , Databases, Factual , Genetic Variation , Models, Genetic , Open Reading Frames , Proteins/metabolism , Sequence Homology, Amino Acid
3.
Science ; 315(5815): 1126-30, 2007 Feb 23.
Article in English | MEDLINE | ID: mdl-17272687

ABSTRACT

The taxonomic composition of environmental communities is an important indicator of their ecology and function. We used a set of protein-coding marker genes, extracted from large-scale environmental shotgun sequencing data, to provide a more direct, quantitative, and accurate picture of community composition than that provided by traditional ribosomal RNA-based approaches depending on the polymerase chain reaction. Mapping marker genes from four diverse environmental data sets onto a reference species phylogeny shows that certain communities evolve faster than others. The method also enables determination of preferred habitats for entire microbial clades and provides evidence that such habitat preferences are often remarkably stable over time.


Subject(s)
Bacteria/classification , Ecosystem , Environmental Microbiology , Genomics , Phylogeny , Animals , Bacteria/genetics , Biological Evolution , Bone and Bones/microbiology , Genes, Bacterial , Genes, rRNA , Genetic Markers , Likelihood Functions , Mining , Seawater/microbiology , Soil Microbiology , Water Microbiology , Whales/microbiology
4.
Bioinformatics ; 18 Suppl 2: S161-71, 2002.
Article in English | MEDLINE | ID: mdl-12385999

ABSTRACT

MOTIVATION: Even for the amino acid motifs collected in the Prosite database there may be chance occurrences as opposed to those occurrences where the motif is involved in fold or function of a protein. With recent mathematical advances in assessing the significance of observing such a motif a particular number of times, we can now study the over- or under-representation of particular motifs in a complete genome and attempt to make functional deductions. RESULTS: We demonstrate that statistical over- or under-representation of motifs in complete proteomes may be an indicator of whether, in that organism, we are looking at chance occurrences of the motif or whether the occurrences are sufficiently numerous to suggest a systematic, and thus functionally important occurrence. This has important implications for databank annotations. AVAILABILITY: The complete dataset comprising the plotted statistics of 266 Prosite motifs on 42 proteomes is available at http://algo.inria.fr/nicodeme/proteomes/proteocomp.html. The software used to compute this data has been described by Nicodème (2000, 2001). They are available either by web access as mentioned in these articles or by direct request from Pierre Nicodème.


Subject(s)
Chromosome Mapping/methods , Databases, Protein , Models, Chemical , Proteome/analysis , Proteome/chemistry , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Computer Simulation , Data Interpretation, Statistical , Models, Genetic , Models, Statistical , Molecular Sequence Data , Proteome/genetics , Sequence Homology, Amino Acid
5.
Curr Biol ; 11(24): 1963-8, 2001 Dec 11.
Article in English | MEDLINE | ID: mdl-11747823

ABSTRACT

The p150-Spir protein, which was discovered as a phosphorylation target of the Jun N-terminal kinase, is an essential regulator of the polarization of the Drosophila oocyte. Spir proteins are highly conserved between species and belong to the family of Wiskott-Aldrich homology region 2 (WH2) proteins involved in actin organization. The C-terminal region of Spir encodes a zinc finger structure highly homologous to FYVE motifs. A region with high homology between the Spir family proteins is located adjacent (N-terminal) to the modified FYVE domain and is designated as "Spir-box." The Spir-box has sequence similarity to a region of rabphilin-3A, which mediates interaction with the small GTPase Rab3A. Coexpression of p150-Spir and green fluorescent protein-tagged Rab GTPases in NIH 3T3 cells revealed that the Spir protein colocalized specifically with the Rab11 GTPase, which is localized at the trans-Golgi network (TGN), post-Golgi vesicles, and the recycling endosome. The distinct Spir localization pattern was dependent on the integrity of the modified FYVE finger motif and the Spir-box. Overexpression of a mouse Spir-1 dominant interfering mutant strongly inhibited the transport of the vesicular stomatitis virus G (VSV G) protein to the plasma membrane. The viral protein was arrested in membrane structures, largely colocalizing with the TGN marker TGN46. Our findings that the Spir actin organizer is targeted to intracellular membrane structures by its modified FYVE zinc finger and is involved in vesicle transport processes provide a novel link between actin organization and intracellular transport.


Subject(s)
Actins/metabolism , Drosophila Proteins , Microfilament Proteins/metabolism , 3T3 Cells , Actins/chemistry , Amino Acid Sequence , Animals , Biological Transport , Drosophila , Mice , Microfilament Proteins/chemistry , Molecular Sequence Data , Sequence Homology, Amino Acid
6.
Trends Biochem Sci ; 26(3): 145-6, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11246006

ABSTRACT

Homology-based sequence analyses have revealed the presence of a novel domain (DDT) in bromodomain PHD finger transcription factors (BPTFs), chromatin remodeling factors of the BAZ-family and other putative nuclear proteins. This domain is characterized by a number of conserved aromatic and charged residues and is predicted to consist of three alpha helices. Recent studies indicate a likely DNA-binding function for the DDT domain.


Subject(s)
Chromosomes , Transcription Factors/chemistry , Amino Acid Sequence , Homeodomain Proteins/chemistry , Molecular Sequence Data , Sequence Homology, Amino Acid
8.
Nucleic Acids Res ; 28(17): 3278-88, 2000 Sep 01.
Article in English | MEDLINE | ID: mdl-10954595

ABSTRACT

Four years after the original sequence submission, we have re-annotated the genome of Mycoplasma pneumoniae to incorporate novel data. The total number of ORFs has been increased from 677 to 688 (10 new proteins were predicted in intergenic regions, two further were newly identified by mass spectrometry and one protein ORF was dismissed) and the number of RNAs from 39 to 42 genes. For 19 of the now 35 tRNAs and for six other functional RNAs the exact genome positions were re-annotated and two new tRNA(Leu) and a small 200 nt RNA were identified. Sixteen protein reading frames were extended and eight shortened. For each ORF a consistent annotation vocabulary has been introduced. Annotation reasoning, annotation categories and comparisons to other published data on M. pneumoniae functional assignments are given. Experimental evidence includes 2-dimensional gel electrophoresis in combination with mass spectrometry as well as gene expression data from this study. Compared to the original annotation, we increased the number of proteins with predicted functional features from 349 to 458. The increase includes 36 new predictions and 73 protein assignments confirmed by the published literature. Furthermore, there are 23 reductions and 30 additions with respect to the previous annotation. mRNA expression data support transcription of 184 of the functionally unassigned reading frames.


Subject(s)
Genes, Bacterial/genetics , Genome, Bacterial , Mycoplasma pneumoniae/genetics , Open Reading Frames/genetics , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Computational Biology , Mass Spectrometry , Molecular Sequence Data , Mycoplasma pneumoniae/chemistry , Oligonucleotide Array Sequence Analysis , Phylogeny , RNA, Bacterial/analysis , RNA, Bacterial/genetics , RNA, Messenger/analysis , RNA, Messenger/genetics , Sequence Alignment
9.
Nat Genet ; 25(2): 201-4, 2000 Jun.
Article in English | MEDLINE | ID: mdl-10835637

ABSTRACT

Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.


Subject(s)
Computational Biology/methods , Expressed Sequence Tags , Proteins/genetics , Proteins/metabolism , Signal Transduction , Amino Acid Sequence , Automation , Catalytic Domain , Cloning, Molecular/methods , Databases, Factual , Genome, Human , Humans , Internet , Molecular Sequence Data , Monomeric GTP-Binding Proteins/chemistry , Monomeric GTP-Binding Proteins/genetics , Monomeric GTP-Binding Proteins/metabolism , Protein Structure, Tertiary , Proteins/chemistry , Sequence Alignment , Sequence Homology, Amino Acid , Software
11.
RNA ; 6(4): 638-50, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10786854

ABSTRACT

Vertebrate TAP and its yeast ortholog Mex67p are involved in the export of messenger RNAs from the nucleus. TAP has also been implicated in the export of simian type D viral RNAs bearing the constitutive transport element (CTE). Although TAP directly interacts with CTE-bearing RNAs, the mode of interaction of TAP/Mex67p with cellular mRNAs is different from that with the CTE RNA and is likely to be mediated by protein-protein interactions. Here we show that Mex67p directly interacts with Yra1p, an essential yeast hnRNP-like protein. This interaction is evolutionarily conserved as Yra1p also interacts with TAP. Conditional expression in yeast cells implicates Yra1p in the export of cellular mRNAs. Database searches revealed that Yra1p belongs to an evolutionarily conserved family of hnRNP-like proteins having more than one member in Mus musculus, Xenopus laevis, Caenorhabditis elegans, and Schizosaccharomyces pombe and at least one member in several species including plants. The murine members of the family directly interact with TAP. Because members of this protein family are characterized by the presence of one RNP-motif RNA-binding domain and exhibit RNA-binding activity, we called these proteins REF-bps for RNA and export factor binding proteins. Thus, Yra1p and members of the REF family of hnRNP-like proteins may facilitate the interaction of TAP/Mex67p with cellular mRNAs.


Subject(s)
Conserved Sequence/genetics , Fungal Proteins/metabolism , Hyaluronan Receptors , Membrane Glycoproteins , Nuclear Proteins/metabolism , Nucleocytoplasmic Transport Proteins , RNA, Messenger/metabolism , RNA-Binding Proteins/metabolism , Receptors, Complement/metabolism , Ribonucleoproteins/chemistry , Saccharomyces cerevisiae Proteins , Transcription Factors/metabolism , Amino Acid Sequence , Animals , Biological Transport , Carrier Proteins , Cell Nucleus/chemistry , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cloning, Molecular , Cytoplasm/chemistry , Cytoplasm/genetics , Cytoplasm/metabolism , Fungal Proteins/chemistry , Fungal Proteins/genetics , Genes, Fungal , Heterogeneous-Nuclear Ribonucleoproteins , Humans , Mice , Mitochondrial Proteins , Molecular Sequence Data , Multigene Family , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Protein Binding , RNA, Messenger/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , Receptors, Complement/chemistry , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Ribonucleoproteins/genetics , Ribonucleoproteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Sequence Alignment , Transcription Factors/chemistry , Transcription Factors/genetics
12.
EMBO Rep ; 1(1): 53-8, 2000 Jul.
Article in English | MEDLINE | ID: mdl-11256625

ABSTRACT

Vertebrate TAP is a nuclear mRNA export factor homologous to yeast Mex67p. The middle domain of TAP binds directly to p15, a protein related to the nuclear transport factor 2 (NTF2), whereas its C-terminal domain interacts with various nucleoporins, the components of the nuclear pore complex (NPC). Here, we report that the middle domain of TAP is also similar to NTF2, as well as to regions in Ras-GAP SH3 domain binding protein (G3BP) and some plant protein kinases. Based on the known three-dimensional structure of NTF2 homodimer, a heterodimerization model of TAP and p15 could be inferred. This model was confirmed by site-directed mutagenesis of residues located at the dimer interface. Furthermore, the C-terminus of TAP was found to contain a ubiquitin-associated (UBA) domain. By site-directed mutagenesis we show that a conserved loop in this domain plays an essential role in mediating TAP-nucleoporin interaction.


Subject(s)
Carrier Proteins/metabolism , Membrane Proteins/metabolism , Nuclear Proteins/chemistry , Nuclear Proteins/metabolism , Nucleocytoplasmic Transport Proteins , RNA-Binding Proteins/chemistry , Amino Acid Sequence , Animals , Carrier Proteins/genetics , Dimerization , Humans , Models, Molecular , Molecular Sequence Data , Mutagenesis, Site-Directed , Nuclear Proteins/genetics , Protein Structure, Tertiary , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Sequence Alignment
13.
Nucleic Acids Res ; 28(1): 231-4, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592234

ABSTRACT

SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures (http://SMART.embl-heidelberg.de). More than 400 domain families found in signalling, extra-cellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.


Subject(s)
Database Management Systems , Internet , Sequence Alignment , Information Storage and Retrieval , Proteins/chemistry
15.
J Mol Biol ; 280(3): 323-6, 1998 Jul 17.
Article in English | MEDLINE | ID: mdl-9665839

ABSTRACT

Homology search techniques based on the iterative PSI-BLAST method in combination with various filters for low sequence complexity are applied to assign folds to all Mycoplasma genitalium proteins. The resulting procedure (implemented as a web server) is able to predict at least one domain in 37% of these proteins automatically, with an estimated accuracy higher than 98%. Taking structural features such as coiled coil or transmembrane regions aside, folds can be assigned to more than half of the globular proteins in a bacterium just by iterative sequence comparison.


Subject(s)
Bacterial Proteins/chemistry , Mycoplasma/chemistry , Protein Folding , Protein Conformation , Sequence Homology