1.
Methods Mol Biol ; 2259: 309-322, 2021.
Article in English | MEDLINE | ID: mdl-33687724

ABSTRACT

In recent years, mass spectrometry-based proteomics has made significant progress, and the number of datasets related to various proteomics projects has increased worldwide. To promote the sharing and reuse of promising datasets, it is important to build an appropriate, high-quality public data repository. For this purpose, several repositories have already been created. The jPOST repository that we developed in 2016 has successfully implemented several unique features, such as fast file upload, flexible file management, and an easy-to-use interface. In addition, this repository is an official member of the ProteomeXchange Consortium established to facilitate standard data submission and global dissemination of mass spectrometry proteomics data. Our repository contributes to the global partnership for sharing and storing all the datasets related to various proteomics experiments.


Subject(s)
Databases, Protein , Proteins/analysis , Proteomics/methods , Animals , Humans , Mass Spectrometry/methods , Software
2.
Genome Biol ; 22(1): 9, 2021 01 04.
Article in English | MEDLINE | ID: mdl-33397462

ABSTRACT

BACKGROUND: Long-read sequencing of full-length cDNAs enables the detection of structures of aberrant splicing isoforms in cancer cells. These isoforms are occasionally translated, presented by HLA molecules, and recognized as neoantigens. This study used a long-read sequencer (MinION) to construct a comprehensive catalog of aberrant splicing isoforms in non-small-cell lung cancers, by which novel isoforms and potential neoantigens are identified. RESULTS: Full-length cDNA sequencing is performed using 22 cell lines, and a total of 2021 novel splicing isoforms are identified. The protein expression of some of these isoforms is then validated by proteome analysis. Ablations of a nonsense-mediated mRNA decay (NMD) factor, UPF1, and a splicing factor, SF3B1, are found to increase the proportion of aberrant transcripts. NetMHC evaluation of the binding affinities to each type of HLA molecule reveals that some of the isoforms potentially generate neoantigen candidates. We also identify aberrant splicing isoforms in seven non-small-cell lung cancer specimens. An enzyme-linked immune absorbent spot assay indicates that approximately half the peptide candidates have the potential to activate T cell responses through their interaction with HLA molecules. Finally, we estimate the number of isoforms in The Cancer Genome Atlas (TCGA) datasets by referring to the constructed catalog and find that disruption of NMD factors is significantly correlated with the number of splicing isoforms found in the TCGA-Lung Adenocarcinoma data collection. CONCLUSIONS: Our results indicate that long-read sequencing of full-length cDNAs is essential for the precise identification of aberrant transcript structures in cancer cells.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Protein Isoforms/genetics , RNA Splicing , Transcriptome , Cell Line, Tumor , DNA, Complementary , Gene Expression Profiling , Humans , Nonsense Mediated mRNA Decay , Phosphoproteins/genetics , RNA Helicases/genetics , RNA Splicing Factors/genetics , Trans-Activators/genetics
3.
Nucleic Acids Res ; 47(D1): D1218-D1224, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30295851

ABSTRACT

Rapid progress is being made in mass spectrometry (MS)-based proteomics, yielding an increasing number of larger datasets with higher quality and higher throughput. To integrate proteomics datasets generated from various projects and institutions, we launched a project named jPOST (Japan ProteOme STandard Repository/Database, https://jpostdb.org/) in 2015. Its proteomics data repository, jPOSTrepo, began operations in 2016 and has accepted more than 10 TB of MS-based proteomics datasets in the past two years. In addition, we have developed a new proteomics database named jPOSTdb in which the published raw datasets in jPOSTrepo are reanalyzed using a standardized protocol. jPOSTdb provides viewers showing the frequency of detected post-translational modifications, the co-occurrence of phosphorylation sites on a peptide, and peptide sharing among proteoforms. jPOSTdb also provides basic statistical analysis tools to compare proteomics datasets.


Subject(s)
Computational Biology/methods , Databases, Protein , Proteome/metabolism , Proteomics/methods , Data Management/methods , Humans , Information Storage and Retrieval/methods , Internet , Japan , Mass Spectrometry/methods , Protein Processing, Post-Translational , User-Computer Interface
4.
Nucleic Acids Res ; 45(D1): D1107-D1111, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899654

ABSTRACT

Major advancements have recently been made in mass spectrometry-based proteomics, yielding an increasing number of datasets from various proteomics projects worldwide. In order to facilitate the sharing and reuse of promising datasets, it is important to construct appropriate, high-quality public data repositories. jPOSTrepo (https://repository.jpostdb.org/) has successfully implemented several unique features, including high-speed file uploading, flexible file management and easy-to-use interfaces. This repository has been launched as a public repository containing various proteomic datasets and is available for researchers worldwide. In addition, our repository has joined the ProteomeXchange consortium, which includes the most popular public repositories such as PRIDE in Europe for MS/MS datasets and PASSEL for SRM datasets in the USA. Later MassIVE was introduced in the USA and accepted into the ProteomeXchange, as was our repository in July 2016, providing important datasets from Asia/Oceania. Accordingly, this repository contributes to a global alliance to share and store all datasets from a wide variety of proteomics experiments. Thus, the repository is expected to become a major repository, particularly for data collected in the Asia/Oceania region.


Subject(s)
Databases, Protein , Proteome , Proteomics , Search Engine , Computational Biology/methods , Humans , Mass Spectrometry , Proteomics/methods , Software , Web Browser
5.
J Pharm Biomed Anal ; 112: 116-25, 2015 Aug 10.
Article in English | MEDLINE | ID: mdl-25978494

ABSTRACT

Human basic fetoprotein (BFP), found in fetal serum and tissue extracts as well as in extracts of various cancer tissues, has long been known as a marker protein for cancers; however, the primary sequence has not yet been reported. This paper describes the identification of BFP using the N- and C-terminal amino acid sequence tags (Ac-AALTRDPQFQ and QQREARVQ, respectively) clarified by mass spectrometry-based methods, and a terminal tag database (ProteinCarta). In this study, BFP was identified as glucose-6-phosphate isomerase (G6PI_HUMAN).


Subject(s)
Fetal Proteins/chemistry , Glucose-6-Phosphate/chemistry , Isomerases/chemistry , Amino Acid Sequence , Databases, Protein , Humans , Molecular Sequence Data , Sequence Analysis, Protein/methods
6.
Bioinformatics ; 31(13): 2217-9, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25712693

ABSTRACT

UNLABELLED: Tandem mass spectrometry (MS/MS or MS(n)) is a potent technique for characterizing N-glycan structures. GlycanAnalysis searches a glycan database to support the identification of glycan structures from MS/MS spectra. It also calculates diagnostic ions of glycan structures registered in a glycan database (GlycomeDB or KEGG GLYCAN) and searches for MS/MS spectra of N-glycans that match diagnostic ions to determine the structures. This program functions as a plug-in for Mass++, a freeware mass spectrum visualization and analysis program. AVAILABILITY AND IMPLEMENTATION: The executable files of Mass++ are available for free at http://www.first-ms3d.jp/english/. The GlycanAnalysis plug-in is included in the standard package of Mass++ for Windows. CONTACT: k-morimt@shimadzu.co.jp or nishikaz@shimadzu.co.jp or acyshzw@shimadzu.co.jp SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subject(s)
Databases, Factual , Glycopeptides/analysis , Polysaccharides/analysis , Search Engine , Software , Tandem Mass Spectrometry/methods , Glycopeptides/chemistry , Glycosylation , Humans , Mass Spectrometry , Polysaccharides/chemistry , Proteomics/methods
7.
J Proteome Res ; 14(2): 756-67, 2015 Feb 06.
Article in English | MEDLINE | ID: mdl-25393771

ABSTRACT

In 1998, Wilkins et al. (J. Mol. Biol. 1998, 278, 599-608) reported high specificity in terminal regions (terminal tags) of 15 519 proteins from five organisms and proposed a methodology for identifying proteins by terminal tags. However, their examined sequence data were not based on complete genome sequences. Here, we examined current proteome data (217 249 entries from UniProt 2013_6 complete/reference proteome for nine organisms including human) in terms of the specificity of terminal tags and their computational annotation. One example from the results indicated that the specificity of N-terminal tags plateaued at 28% at a length of six residues for human; even when using both N- and C-terminal tags, specificity was merely 66%. In order to determine the cause of these low specificities, the annotation of proteins sharing terminal tags with other proteins was examined. The results suggested that a large majority were phylogenetically or functionally related, whereas nonrelated proteins sharing terminal tags made up less than 1% of human proteome data. On the basis of these findings, we constructed the terminal tag sequence database ProteinCarta (http://ms3d.jp/software/proteincarta/), which includes all terminal tags of proteomes from the nine organisms analyzed here, in order to confirm the specificity of terminal tags and to identify the parent protein.
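The specificity figures quoted above can be illustrated with a minimal sketch. It assumes that "specificity" means the fraction of entries whose terminal tag (or N-/C-terminal tag pair) is unique within the set; the abstract does not give the exact definition, and the function name and toy sequences below are hypothetical, not taken from ProteinCarta.

```python
from collections import Counter


def terminal_tag_specificity(sequences, tag_length=6, use_c_terminal=False):
    """Return the fraction of sequences whose terminal tag is unique in the set.

    Assumption: specificity = (entries with a unique tag) / (all entries).
    """
    def tag(seq):
        n_tag = seq[:tag_length]
        if use_c_terminal:
            # Combine the N-terminal and C-terminal tags into one key.
            return (n_tag, seq[-tag_length:])
        return n_tag

    tags = [tag(s) for s in sequences]
    counts = Counter(tags)
    unique = sum(1 for t in tags if counts[t] == 1)
    return unique / len(tags) if tags else 0.0


# Hypothetical toy sequences: two share the same six-residue N-terminal tag.
toy_proteins = ["MKTAYIAKQR", "MKTAYIGGGG", "MSLLTEVETP"]
n_only = terminal_tag_specificity(toy_proteins)
n_and_c = terminal_tag_specificity(toy_proteins, use_c_terminal=True)
```

In this toy set only one of three N-terminal tags is unique, while adding the C-terminal tag separates all three entries, which mirrors the direction of the effect the abstract reports for human data.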


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Animals , Humans , Proteome
8.
BMC Bioinformatics ; 15: 376, 2014 Nov 25.
Article in English | MEDLINE | ID: mdl-25420746

ABSTRACT

BACKGROUND: Label-free quantitation of mass spectrometric data is one of the simplest and least expensive methods for differential expression profiling of proteins and metabolites. The need for high accuracy and performance computational label-free quantitation methods is still high in the biomarker and drug discovery research field. However, recent most advanced types of LC-MS generate huge amounts of analytical data with high scan speed, high accuracy and resolution, which is often impossible to interpret manually. Moreover, there are still issues to be improved for recent label-free methods, such as how to reduce false positive/negatives of the candidate peaks, how to expand scalability and how to enhance and automate data processing. AB3D (A simple label-free quantitation algorithm for Biomarker Discovery in Diagnostics and Drug discovery using LC-MS) has addressed these issues and has the capability to perform label-free quantitation using MS1 for proteomics study. RESULTS: We developed an algorithm called AB3D, a label free peak detection and quantitative algorithm using MS1 spectral data. To test our algorithm, practical applications of AB3D for LC-MS data sets were evaluated using 3 datasets. Comparisons were then carried out between widely used software tools such as MZmine 2, MSight, SuperHirn, OpenMS and our algorithm AB3D, using the same LC-MS datasets. All quantitative results were confirmed manually, and we found that AB3D could properly identify and quantify known peptides with fewer false positives and false negatives compared to four other existing software tools using either the standard peptide mixture or the real complex biological samples of Bartonella quintana (strain JK31). Moreover, AB3D showed the best reliability by comparing the variability between two technical replicates using a complex peptide mixture of HeLa and BSA samples. For performance, the AB3D algorithm is about 1.2 - 15 times faster than the four other existing software tools. 
CONCLUSIONS: AB3D is a simple and fast algorithm for label-free quantitation using MS1 mass spectrometry data for large-scale LC-MS data analysis, with higher true positive and reasonable false positive rates. Furthermore, AB3D demonstrated the best reproducibility and is about 1.2-15 times faster than the four existing software tools.


Subject(s)
Algorithms , Chromatography, Liquid/methods , Databases, Protein , Mass Spectrometry/methods , Peptide Fragments/analysis , Proteins/analysis , Proteome/analysis , Software , Animals , Cattle , HeLa Cells , Humans , Proteomics/methods , Serum Albumin, Bovine/analysis
9.
J Proteome Res ; 13(8): 3846-3853, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24965016

ABSTRACT

We have developed Mass++, a plug-in style visualization and analysis tool for mass spectrometry. Its plug-in style enables users to customize it and to develop original functions. Mass++ has several kinds of plug-ins, including rich viewers and analysis methods for proteomics and metabolomics. Plug-ins for supporting vendors' raw data are currently available; hence, Mass++ can read several data formats. Mass++ is both a desktop tool and a software development platform. Original functions can be developed without editing the Mass++ source code. Here, we present this tool's capability to rapidly analyze MS data and develop functions by providing examples of label-free quantitation and implementing plug-ins or scripts. Mass++ is freely available at http://www.first-ms3d.jp/english/ .

10.
Mol Cell Proteomics ; 12(5): 1377-94, 2013 May.
Article in English | MEDLINE | ID: mdl-23358504

ABSTRACT

Neurofibromatosis type 1 (NF1) tumor suppressor gene product, neurofibromin, functions in part as a Ras-GAP, and though its loss is implicated in the neuronal abnormality of NF1 patients, its precise cellular function remains unclear. To study the molecular mechanism of NF1 pathogenesis, we prepared NF1 gene knockdown (KD) PC12 cells, as a NF1 disease model, and analyzed their molecular (gene and protein) expression profiles with a unique integrated proteomics approach, comprising iTRAQ, 2D-DIGE, and DNA microarrays, using an integrated protein and gene expression analysis chart (iPEACH). In NF1-KD PC12 cells showing abnormal neuronal differentiation after NGF treatment, of 3198 molecules quantitatively identified and listed in iPEACH, 97 molecules continuously up- or down-regulated over time were extracted. Pathway and network analysis further revealed overrepresentation of calcium signaling and transcriptional regulation by glucocorticoid receptor (GR) in the up-regulated protein set, whereas nerve system development was overrepresented in the down-regulated protein set. The novel up-regulated network we discovered, "dynein IC2-GR-COX-1 signaling," was then examined in NF1-KD cells. Validation studies confirmed that NF1 knockdown induces altered splicing and phosphorylation patterns of dynein IC2 isomers, up-regulation and accumulation of nuclear GR, and increased COX-1 expression in NGF-treated cells. Moreover, the neurite retraction phenotype observed in NF1-KD cells was significantly recovered by knockdown of the dynein IC2-C isoform and COX-1. In addition, dynein IC2 siRNA significantly inhibited nuclear translocation and accumulation of GR and up-regulation of COX-1 expression. These results suggest that dynein IC2 up-regulates GR nuclear translocation and accumulation, and subsequently causes increased COX-1 expression, in this NF1 disease model. 
Our integrated proteomics strategy, which combines multiple approaches, demonstrates that NF1-related neural abnormalities are, in part, caused by up-regulation of dynein IC2-GR-COX-1 signaling, which may be a novel therapeutic target for NF1.


Subject(s)
Cyclooxygenase 1/metabolism , Cytoplasmic Dyneins/metabolism , Membrane Proteins/metabolism , Receptors, Glucocorticoid/metabolism , Signal Transduction , Active Transport, Cell Nucleus , Animals , Cyclooxygenase 1/genetics , Cytoplasmic Dyneins/genetics , Gene Regulatory Networks , Membrane Proteins/genetics , Nerve Growth Factor/physiology , Neurites/metabolism , Neurofibromatosis 1/metabolism , Neurofibromin 1/genetics , Neurofibromin 1/metabolism , Oligonucleotide Array Sequence Analysis , PC12 Cells , Phosphorylation , Protein Processing, Post-Translational , Proteome/genetics , Proteome/metabolism , Proteomics , RNA Splicing , Rats , Receptors, Glucocorticoid/genetics , Transcriptome , Up-Regulation
11.
Nucleic Acids Res ; 41(Database issue): D353-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193276

ABSTRACT

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. Calculating OCs is computationally efficient, which enables the contents to be updated regularly. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.


Subject(s)
Databases, Genetic , Genes, Archaeal , Genes, Bacterial , Genes , Algorithms , Classification/methods , Cluster Analysis , Eukaryota/genetics , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Internet , Sequence Homology, Amino Acid
12.
Nucleic Acids Res ; 39(Database issue): D552-5, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21051344

ABSTRACT

ODB (Operon DataBase) aims to collect data of all known and conserved operons in completely sequenced genomes. Three newly updated features of this database have been added as follows: (i) Data from included operons were updated. The genome-wide analysis of transcription and transcriptional units has become popular recently and ODB successfully integrates these high-throughput operon data, including genome-wide transcriptional units of five prokaryotes and two eukaryotes. The current version of our database contains information from about 10,000 known operons in more than 50 genomes, and more than 400,000 conserved operons obtained from more than 1000 bacterial genomes. (ii) ODB proposes the idea of reference operons as a new operon prediction tool. A reference operon, a set of possible orthologous genes that organize operons, is defined by clustering all known operons. A large number of known operons, including the recently added genome-wide analysis of operons, allowed us to define more reliable reference operons. (iii) ODB also provides new graphical interfaces. One is for comparative analyses of operon structures in multiple genomes. The other is for visualization of possible operons in multiple genomes obtained from the reference operons. The 2011 updated version of ODB is now available at http://operondb.jp/.


Subject(s)
Databases, Nucleic Acid , Operon , Genomics , User-Computer Interface
13.
Nucleic Acids Res ; 39(Database issue): D807-14, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21071393

ABSTRACT

The Ciona intestinalis protein database (CIPRO) is an integrated protein database for the tunicate species C. intestinalis. The database is unique in two respects: first, because of its phylogenetic position, Ciona is a suitable model for understanding vertebrate evolution; and second, the database includes original large-scale transcriptomic and proteomic data. Ciona intestinalis has also been a favorite of developmental biologists. Therefore, large amounts of data exist on its development and morphology, along with a recent genome sequence and gene expression data. The CIPRO database is aimed at collecting those published data as well as providing unique information from unpublished experimental data, such as 3D expression profiling, 2D-PAGE and mass spectrometry-based large-scale analyses at various developmental stages, curated annotation data and various bioinformatic data, to facilitate research in diverse areas, including developmental, comparative and evolutionary biology. For medical and evolutionary research, homologs in humans and major model organisms are intentionally included. The current database is based on a recently developed KH model containing 36,034 unique sequences, but for higher usability it covers all 89,683 known and predicted proteins from all gene models for this species. Of these sequences, more than 10,000 proteins have been manually annotated. Furthermore, to establish a community-supported protein database, these annotations are open to evaluation by users through the CIPRO website. CIPRO 2.5 is freely accessible at http://cipro.ibio.jp/2.5.


Subject(s)
Ciona intestinalis/metabolism , Databases, Protein , Proteome/metabolism , Amino Acid Sequence , Animals , Ciona intestinalis/genetics , Ciona intestinalis/growth & development , Computational Biology , Computer Graphics , Gene Expression Profiling , Genomics , Molecular Sequence Annotation , Proteome/chemistry , Proteome/genetics , Proteomics , Systems Integration , User-Computer Interface
14.
J Biol Chem ; 285(24): 18684-92, 2010 Jun 11.
Article in English | MEDLINE | ID: mdl-20375021

ABSTRACT

Aminomethyltransferase, a component of the glycine cleavage system termed T-protein, reversibly catalyzes the degradation of the aminomethyl moiety of glycine attached to the lipoate cofactor of H-protein, resulting in the production of ammonia, 5,10-methylenetetrahydrofolate, and dihydrolipoate-bearing H-protein in the presence of tetrahydrofolate. Several mutations in the human T-protein gene are known to cause nonketotic hyperglycinemia. Here, we report the crystal structure of Escherichia coli T-protein in complex with dihydrolipoate-bearing H-protein and 5-methyltetrahydrofolate, a complex mimicking the ternary complex in the reverse reaction. The structure of the complex shows a highly interacting intermolecular interface limited to a small area and the protein-bound dihydrolipoyllysine arm inserted into the active site cavity of the T-protein. Invariant Arg(292) of the T-protein is essential for complex assembly. The structure also provides novel insights in understanding the disease-causing mutations, in addition to the disease-related impairment in the cofactor-enzyme interactions reported previously. Furthermore, structural and mutational analyses suggest that the reversible transfer of the methylene group between the lipoate and tetrahydrofolate should proceed through the electron relay-assisted iminium intermediate formation.


Subject(s)
Aminomethyltransferase/chemistry , Bacterial Proteins/chemistry , DNA-Binding Proteins/chemistry , Mutation , Arginine/chemistry , Catalysis , Catalytic Domain , Crystallography, X-Ray/methods , DNA Mutational Analysis , Dimerization , Escherichia coli/metabolism , Folic Acid/chemistry , Glycine/chemistry , Hyperglycemia/metabolism , Imines/chemistry , Models, Molecular
15.
Proc Natl Acad Sci U S A ; 106(19): 7921-6, 2009 May 12.
Article in English | MEDLINE | ID: mdl-19416882

ABSTRACT

Regulation of age-related changes in gene expression underlies many diseases. We previously discovered the first puberty-onset gene switch, the age-related stability element (ASE)/age-related increase element (AIE)-mediated genetic mechanism for age-related gene regulation. Here, we report that this mechanism underlies the mysterious puberty-onset amelioration of abnormal bleeding seen in hemophilia B Leyden. Transgenic mice robustly mimicking the Leyden phenotype were constructed. Analysis of these animals indicated that ASE plays a central role in the puberty-onset amelioration of the disease. Human factor IX expression in these animals was reproducibly nullified by hypophysectomy, but nearly fully restored by administration of growth hormone, being consistent with the observed sex-independent recovery of factor IX expression. Ets1 was identified as the specific liver nuclear protein binding only to the functional ASE, G/CAGGAAG, and not to other Ets consensus elements. This study demonstrates the clinical relevance of the first discovered puberty-onset gene switch, the ASE/AIE-mediated regulatory mechanism.


Subject(s)
Aging , Factor IX/genetics , Hemophilia A/genetics , Hemophilia A/therapy , Homeostasis , Animals , Female , Gene Expression Regulation , Growth Hormone/metabolism , Humans , Male , Mice , Mice, Transgenic , Protein Binding , Sex Factors , Time Factors
16.
Carbohydr Res ; 344(7): 881-7, 2009 May 12.
Article in English | MEDLINE | ID: mdl-19327755

ABSTRACT

Glycosyltransferases comprise highly divergent groups of enzymes, which play a central role in the synthesis of complex glycans. Because the repertoire of glycosyltransferases in the genome determines the range of synthesizable glycans, and because the increasing amount of genome sequence data is now available, it is essential to examine these enzymes across organisms to explore possible structures and functions of the glycoconjugates. In this study, we systematically investigated 36 eukaryotic genomes and obtained 3426 glycosyltransferase homologs for biosynthesis of major glycans, classified into 53 families based on sequence similarity. The families were further grouped into six functional categories based on the biosynthetic pathways, which revealed characteristic patterns among organism groups in the degree of conservation and in the number of paralogs. The results also revealed a strong correlation between the number of glycosyltransferases and the number of coding genes in each genome. We then predicted the ability to synthesize major glycan structures including N-glycan precursors and GPI-anchors in each organism from the combination of the glycosyltransferase families. This indicates that not only parasitic protists but also some algae are likely to synthesize smaller structures than the structures known to be conserved among a wide range of eukaryotes. Finally we discuss the functions of two large families, sialyltransferases and beta 4-glycosyltransferases, by performing finer classifications into subfamilies. Our findings suggest that universality and diversity of glycans originate from two types of evolution of glycosyltransferase families, namely conserved families with few paralogs and diverged families with many paralogs.


Subject(s)
Eukaryotic Cells/enzymology , Eukaryotic Cells/metabolism , Genome/genetics , Glycosyltransferases/classification , Glycosyltransferases/genetics , Polysaccharides/biosynthesis , Polysaccharides/chemistry , Animals , Glycosyltransferases/metabolism , Humans , Models, Molecular , Sialyltransferases/classification , Sialyltransferases/genetics , Sialyltransferases/metabolism
18.
J Lipid Res ; 49(1): 183-91, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17921532

ABSTRACT

The repertoire of biosynthetic enzymes found in an organism is an important clue for elucidating the chemical structural variations of various compounds. In the case of fatty acids, it is essential to examine the key enzymes, desaturases and elongases, whose combination determines the range of fatty acid structures. We systematically investigated 56 eukaryotic genomes to obtain 275 desaturase and 265 elongase homologs. Phylogenetic and motif analysis indicated that the desaturases consisted of four functionally distinct subfamilies and the elongases consisted of two subfamilies. From the combination of the subfamilies, we then predicted the ability to synthesize six types of fatty acids. Consequently, we found that the ranges of synthesizable fatty acids were often different even between closely related organisms. The reason is that, as well as diverging into subfamilies, the enzymes have functionally diverged within the individual subfamilies. Finally, we discuss how the adaptation to individual environments and the ability to synthesize specific metabolites provide some explanation for the diversity of enzyme functions. This study provides an example of a potent strategy to bridge the gap from genomic knowledge to chemical knowledge.


Subject(s)
Acetyltransferases/chemistry , Fatty Acid Desaturases/chemistry , Fatty Acids, Unsaturated/biosynthesis , Acetyltransferases/classification , Acetyltransferases/genetics , Acetyltransferases/metabolism , Amino Acid Sequence , Animals , Base Sequence , Evolution, Molecular , Fatty Acid Desaturases/classification , Fatty Acid Desaturases/genetics , Fatty Acid Desaturases/metabolism , Fatty Acid Elongases , Fatty Acids, Unsaturated/chemistry , Fungi/enzymology , Genome , Humans , Molecular Sequence Data , Phylogeny , Plants/enzymology
19.
Nucleic Acids Res ; 35(Web Server issue): W182-5, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17526522

ABSTRACT

The number of complete and draft genomes has grown rapidly in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith-Waterman scores as well as manual curation. Each K number represents an ortholog group of genes, and it is directly linked to an object in the KEGG pathway map or the BRITE functional hierarchy. Here, we have developed a web-based server called KAAS (KEGG Automatic Annotation Server: http://www.genome.jp/kegg/kaas/), an implementation of a rapid method to automatically assign K numbers to genes in the genome, enabling reconstruction of KEGG pathways and BRITE hierarchies. The method is based on sequence similarities, bi-directional best hit information and some heuristics, and has achieved a high degree of accuracy when compared with the manually curated KEGG GENES database.
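The bi-directional best hit idea mentioned above can be sketched generically as follows. This is not the KAAS implementation, which also applies heuristics not described in the abstract; the function names, identifiers and scores are hypothetical, and the similarity scores are assumed to have been computed beforehand by some alignment tool.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict of (query, target) -> similarity score (assumed precomputed).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(forward, reverse):
    """Keep only pairs that are each other's best hit in both directions."""
    fwd = best_hits(forward)   # query gene -> best reference gene
    rev = best_hits(reverse)   # reference gene -> best query gene
    return {q: r for q, r in fwd.items() if rev.get(r) == q}


# Hypothetical scores: q1 and r1 are mutual best hits; q2's best hit r2 prefers q1.
forward_scores = {("q1", "r1"): 100, ("q1", "r2"): 80, ("q2", "r2"): 50}
reverse_scores = {("r1", "q1"): 100, ("r2", "q1"): 80, ("r2", "q2"): 50}
pairs = bidirectional_best_hits(forward_scores, reverse_scores)
```

Here only q1 and r1 are retained, because r2's best hit in the reverse direction is q1 rather than q2. In an annotation setting, the ortholog identifier attached to the retained reference gene would then be a candidate assignment for the query gene.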


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Documentation/methods , Genome , Proteome/classification , Proteome/metabolism , Sequence Analysis/methods , Signal Transduction/physiology , Vocabulary, Controlled , Animals , Artificial Intelligence , Automation , Database Management Systems , Humans , Information Storage and Retrieval/methods , Internet
20.
Traffic ; 7(8): 1104-18, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16882042

ABSTRACT

The SNARE proteins are required for membrane fusion during intracellular vesicular transport and for its specificity. Only the unique combination of SNARE proteins (cognates) can be bound and can lead to membrane fusion, although the characteristics of the possible specificity of the binding combinations encoded in the SNARE sequences have not yet been determined. We discovered by whole genome sequence analysis that sequence motifs (conserved sequences) in the SNARE motif domains for each protein group correspond to localization sites or transport pathways. We claim that these motifs reflect the specificity of the binding combinations of SNARE motif domains. Using these motifs, we could classify SNARE proteins from 48 organisms into their localization sites or transport pathways. The classification result shows that more than 10 SNARE subgroups are kingdom specific and that the SNARE paralogs involved in the plasma membrane-related transport pathways have developed greater variations in higher animals and higher plants than those involved in the endoplasmic reticulum-related transport pathways throughout eukaryotic evolution.


Subject(s)
Amino Acid Motifs , Phylogeny , SNARE Proteins/physiology , Cluster Analysis , Protein Transport , SNARE Proteins/chemistry