Search | BVS Regional Portal

DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu.

DNA Res ; 20(4): 383-90, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23657089

ABSTRACT

High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge for molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing on NIG supercomputers, currently free of charge. The pipeline consists of two analysis components: basic analysis, covering reference genome mapping and de novo assembly, and subsequent high-level analysis for structural and functional annotation. Users can switch smoothly between the two components, which makes web-based operation of a supercomputer practical for high-throughput data analysis. Moreover, public NGS reads in the DDBJ Sequence Read Archive, which is located on the same supercomputer, can be imported into the pipeline simply by entering an accession number. By applying unified analytical workflows to NGS data, the pipeline will facilitate research. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.

Subjects

Genomics; Molecular Sequence Annotation/methods; Sequence Analysis, DNA/methods; Software; High-Throughput Nucleotide Sequencing; Internet

Biological databases at DNA Data Bank of Japan in the era of next-generation sequencing technologies.

Kodama, Yuichi; Kaminuma, Eli; Saruhashi, Satoshi; Ikeo, Kazuho; Sugawara, Hideaki; Tateno, Yoshio; Nakamura, Yasukazu.

Adv Exp Med Biol ; 680: 125-35, 2010.

Article in English | MEDLINE | ID: mdl-20865494

ABSTRACT

The Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ) has operated biological databases since 1987 in collaboration with NCBI and EBI. As one of the three major public databases, CIB-DDBJ has run four primary databases, DDBJ, CIBEX, the DDBJ Trace Archive (DTA), and the DDBJ Read Archive (DRA), to collect, archive, and provide various kinds of biological data. As massively parallel sequencing platforms come into wider use, huge amounts of raw data are being produced. To archive these raw data, we at CIB-DDBJ began operating a new repository, the DDBJ Read Archive (DRA). To accommodate the processed data efficiently as well, we have developed a new pipeline, the DDBJ Read Annotation Pipeline, which handles both data submission and analysis. For data produced by next-generation platforms, the three archives DRA, DDBJ, and CIBEX, which are interconnected by the pipeline, collect the raw data, processed sequences, and quantitative data, respectively. Through data sharing, the public biological databases at CIB-DDBJ, EBI, and NCBI will together build worldwide archives of biological data to accelerate life-science research in the era of next-generation sequencing technologies.

Subjects

Databases, Nucleic Acid/statistics & numerical data; Sequence Analysis, DNA/statistics & numerical data; Computational Biology; Databases, Nucleic Acid/trends; Japan; Models, Statistical; Sequence Analysis, DNA/trends

Phylogenetic construction of 17 bacterial phyla by new method and carefully selected orthologs.

Horiike, Tokumasa; Miyata, Daisuke; Hamada, Kazuo; Saruhashi, Satoshi; Shinozawa, Takao; Kumar, Sudhir; Chakraborty, Ranajit; Komiyama, Tomoyoshi; Tateno, Yoshio.

Gene ; 429(1-2): 59-64, 2009 Jan 15.

Article in English | MEDLINE | ID: mdl-19000750

ABSTRACT

Here, we constructed a phylogenetic tree of 17 bacterial phyla covering eubacteria and archaea, using a new method and 102 carefully selected orthologs from their genomes. One serious confounding factor in phylogeny construction is the presence of out-paralogs, which are difficult to detect and discard. In our method, out-paralogs are detected and removed by constructing a phylogenetic tree of the genes in question and examining how the genes cluster in that tree. We also developed ComTree, a method for comparing two tree topologies (shapes). Applying ComTree to the constructed tree, we computed the relative number of orthologs that support each node of the tree. This number, called the Positive Ortholog Ratio (POR), differs conceptually and methodologically from the frequently used bootstrap value, and our study concretely demonstrates drawbacks of the bootstrap test. Our bacterial phylogeny is consistent with previous analyses showing that hyperthermophilic bacteria such as Thermotogae and Aquificae diverged earlier than the other eubacteria studied. Notably, our results are the same whether thermophilic or mesophilic archaea are used to root the tree. The earliest divergence of hyperthermophilic eubacteria is supported by genes involved in fundamental metabolic processes such as glycolysis and nucleotide and amino acid synthesis.
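The POR idea, the share of ortholog trees that support a given node, can be sketched in Python. This is a minimal illustration under stated assumptions, not ComTree itself, whose comparison procedure the abstract does not describe: each ortholog tree is assumed to be rooted and written as nested tuples of taxon names, and a node counts as supported when exactly the same clade (set of descendant taxa) appears in an ortholog tree. The names `clades` and `positive_ortholog_ratio` and the toy trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Union

# Hypothetical tree encoding: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, tuple]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the set of taxa below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def positive_ortholog_ratio(node_taxa: Iterable[str], ortholog_trees: Iterable[Tree]) -> float:
    """Fraction of ortholog trees that contain the clade `node_taxa` (assumed POR definition)."""
    target = frozenset(node_taxa)
    trees = list(ortholog_trees)
    supporting = sum(target in clades(tree) for tree in trees)
    return supporting / len(trees)


# Three toy ortholog trees (hypothetical, not data from the study).
trees = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
print(positive_ortholog_ratio({"A", "B"}, trees))  # 2 of the 3 trees contain the clade {A, B}
```

The ratio counts whole genes whose trees agree with a node, whereas a standard bootstrap value resamples alignment columns of a single dataset.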

Subjects

Bacteria/classification; Bacteria/genetics; Phylogeny; Sequence Analysis/methods; Sequence Homology, Nucleic Acid

Comprehensive analysis of the origin of eukaryotic genomes.

Saruhashi, Satoshi; Hamada, Kazuo; Miyata, Daisuke; Horiike, Tokumasa; Shinozawa, Takao.

Genes Genet Syst ; 83(4): 285-91, 2008 Aug.

Article in English | MEDLINE | ID: mdl-18931454

ABSTRACT

There is currently no consensus on the evolutionary origin of eukaryotes. In search of the ancestors of eukaryotes, we analyzed the phylogeny of 46 genomes: 2 eukaryotes, 8 archaea, and 36 eubacteria. To avoid the effects of gene duplication, we used inparalog pairs of genes with orthologous relationships. First, we grouped these inparalogs into three functional categories: nucleus, cytoplasm, and mitochondria. Next, we counted the sister groups of eukaryotes among the prokaryotic phyla and plotted them on a standard phylogenetic tree. Finally, we used Pearson's chi-square test to assess whether the genomes originated from specific prokaryotic ancestors. The results suggest that the eukaryotic nuclear genome descends from an archaeon belonging to neither Euryarchaeota nor Crenarchaeota, and that the mitochondrial genome descends from alpha-proteobacteria. In contrast, genes related to the cytoplasm do not appear to originate from any specific group of prokaryotes.
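The final step, testing whether eukaryotic sister groups are over-represented in particular prokaryotic groups, can be illustrated with Pearson's chi-square goodness-of-fit statistic. This sketch assumes, for illustration only, that the expected count for each group is proportional to the number of genomes sampled from it; the abstract does not say how the expected counts were derived, and the function name and counts below are hypothetical.

```python
from typing import Sequence, Tuple


def pearson_chi_square(observed: Sequence[int], genomes_per_group: Sequence[int]) -> Tuple[float, int]:
    """Pearson goodness-of-fit statistic and its degrees of freedom.

    Expected counts are assumed to be proportional to the number of genomes per group.
    """
    if len(observed) != len(genomes_per_group):
        raise ValueError("one observed count per group is required")
    total_obs = sum(observed)
    total_genomes = sum(genomes_per_group)
    stat = 0.0
    for obs, genomes in zip(observed, genomes_per_group):
        expected = total_obs * genomes / total_genomes
        stat += (obs - expected) ** 2 / expected
    return stat, len(observed) - 1


# Hypothetical sister-group counts for three prokaryotic groups sampled equally.
stat, df = pearson_chi_square([30, 10, 10], [10, 10, 10])
print(stat, df)
```

The statistic is then compared against a chi-square distribution with `df` degrees of freedom; a large value indicates that sister groups are not distributed in proportion to sampling.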

Subjects

Evolution, Molecular; Genome; Animals; Eukaryotic Cells/classification; Eukaryotic Cells/metabolism; Eukaryotic Cells/physiology; Genome/physiology; Humans; Phylogeny; Prokaryotic Cells/metabolism

Determination of whole prokaryotic phylogeny by the development of a random extraction method.

Saruhashi, Satoshi; Hamada, Kazuo; Horiike, Tokumasa; Shinozawa, Takao.

Gene ; 392(1-2): 157-63, 2007 May 01.

Article in English | MEDLINE | ID: mdl-17275216

ABSTRACT

The construction of an accurate prokaryotic phylogeny is important not only in evolutionary biology but also in microbiology and pathology. However, when a phylogenetic tree is constructed to trace prokaryotic evolution, the inferred relationships often change depending on which species are chosen. To estimate prokaryotic lineages accurately, we developed a new method, the "random extraction method". In this method, 16S rRNA sequences were randomly extracted 1000 times from each group of closely related taxa (seven eubacterial phyla and the archaeal domain), and a phylogenetic tree was constructed from each extraction to clarify the relationships among these groups. The resulting tree topologies were then counted, and the most frequently supported topology was taken as the most plausible phylogenetic tree. To evaluate the reliability of each node, we defined a "Branching rate" (BR) and calculated it for every tree. Computer simulations were also carried out to validate these methods. Assuming that the root of life lies between Archaea and Eubacteria, the inferred relationships among the phyla are as follows. First, Archaea (Euryarchaeota, Crenarchaeota and Korarchaeota) diverged; then Thermotogales, Cyanobacteria and Chlamydiales diverged, in that order; then Firmicutes (Actinobacteria and the Bacillus/Clostridium group cluster) and Proteobacteria (alpha and beta/gamma cluster) diverged. In addition, the BR showed that the position of the node for Firmicutes Actinobacteria and Firmicutes Bacillus/Clostridium varied from one extraction to another. This suggests that the differences among prokaryotic phylogenetic trees are caused by the influence of these phyla.
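The sampling-and-counting loop of the random extraction method can be sketched as follows. The abstract does not name the tree-building method, so UPGMA clustering on p-distances of pre-aligned sequences is swapped in here purely for illustration; the group names, toy sequences, and function names are hypothetical, and the Branching rate is not reproduced because the abstract does not define it.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List, Union

Node = Union[str, tuple]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def canonical(node: Node) -> Node:
    """Sort children recursively so that identical rooted topologies compare equal."""
    if isinstance(node, str):
        return node
    return tuple(sorted((canonical(child) for child in node), key=repr))


def upgma_topology(seqs: Dict[str, str]) -> Node:
    """Rooted topology from UPGMA (average-linkage) clustering of {group: sequence}."""
    members: Dict[Node, List[str]] = {label: [label] for label in seqs}

    def dist(c1: Node, c2: Node) -> float:
        pairs = [(x, y) for x in members[c1] for y in members[c2]]
        return sum(p_distance(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(members) > 1:
        # Merge the closest pair of clusters into a new internal node.
        c1, c2 = min(combinations(members, 2), key=lambda pair: dist(*pair))
        members[(c1, c2)] = members.pop(c1) + members.pop(c2)
    (root,) = members
    return canonical(root)


def random_extraction(groups: Dict[str, List[str]], n_iter: int = 1000, seed: int = 0) -> Counter:
    """Draw one sequence per group n_iter times and count the resulting topologies."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_iter):
        sample = {group: rng.choice(seqs) for group, seqs in groups.items()}
        counts[upgma_topology(sample)] += 1
    return counts


# Toy aligned sequences (hypothetical): groups A and B are similar, C is distant.
groups = {
    "A": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "B": ["AAAAAAAACC", "AAAAAAAAGC"],
    "C": ["TTTTTTTTTT", "TTTTTTTTTA"],
}
counts = random_extraction(groups, n_iter=1000)
best, n_best = counts.most_common(1)[0]
print(best, n_best / 1000)  # most supported topology and its share of extractions
```

Canonicalizing each topology before counting is what lets the `Counter` treat mirror-image trees as the same topology.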

Subjects

Data Collection/methods; Phylogeny; Prokaryotic Cells; Computer Simulation; Databases, Genetic; RNA, Ribosomal, 16S/genetics

Association of an intronic polymorphism in the midkine (MK) gene with human sporadic colorectal cancer.

Ahmed, Kazi Mokim; Shitara, Yoshinori; Takenoshita, Seiichi; Kuwano, Hiroyuki; Saruhashi, Satoshi; Shinozawa, Takao.

Cancer Lett ; 180(2): 159-63, 2002 Jun 28.

Article in English | MEDLINE | ID: mdl-12175547

ABSTRACT

Midkine (MK) is a heparin-binding growth factor encoded by a retinoic acid-responsive gene. It plays important roles in development and carcinogenesis. In humans, the MK gene is located on chromosome 11p11.2. A heterozygous G-to-T transversion at the 62nd base of intron 3 of this gene has been identified in sporadic colorectal and gastric cancers (Int. J. Mol. Med. 6 (2000) 281). To clarify whether this polymorphism is associated with cancer risk, a case-control study was conducted. We examined 98 colorectal, 60 gastric, 59 esophageal, 32 lung and 37 breast cancer tissue specimens and their corresponding non-neoplastic tissues. In addition, 86 specimens from unaffected controls were examined. The G/T genotype frequency in colorectal cancers was higher than in normal samples (11.2% versus 2.3%; P = 0.017). Therefore, this genotype could represent a risk factor for tumorigenesis in the colon and rectum in the Japanese population.
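The reported genotype frequencies can be checked, and a 2×2 association test sketched, in Python. The G/T counts (11 of 98 colorectal cancers, 2 of 86 controls) are back-calculated from the percentages and are an assumption, as is the choice of Pearson's chi-square test without continuity correction; the abstract does not say which test produced P = 0.017, so the p-value computed here need not match it exactly. The function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Back-calculated counts (assumption): rows = cases/controls, columns = G/T vs other genotypes.
cases_gt, cases_n = 11, 98
controls_gt, controls_n = 2, 86

print(f"cases: {100 * cases_gt / cases_n:.1f}% G/T")           # 11.2% G/T
print(f"controls: {100 * controls_gt / controls_n:.1f}% G/T")  # 2.3% G/T
stat, p = chi_square_2x2(cases_gt, cases_n - cases_gt, controls_gt, controls_n - controls_gt)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With only two G/T controls, an exact test such as Fisher's is often preferred for a table this sparse, and it can give a somewhat different p-value.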

Subjects

Carrier Proteins/genetics; Colorectal Neoplasms/genetics; Cytokines; Introns; Colorectal Neoplasms/etiology; Female; Humans; Male; Midkine; Polymorphism, Genetic; Risk Factors
