Search | VHL Regional Portal

DDBJ update in 2023: the MetaboBank for metabolomics data and associated metadata.

Ara, Takeshi; Kodama, Yuichi; Tokimatsu, Toshiaki; Fukuda, Asami; Kosuge, Takehide; Mashima, Jun; Tanizawa, Yasuhiro; Tanjo, Tomoya; Ogasawara, Osamu; Fujisawa, Takatomo; Nakamura, Yasukazu; Arita, Masanori.

Nucleic Acids Res ; 52(D1): D67-D71, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971299

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data, together with their study and sample information, in collaboration with the National Center for Biotechnology Information (NCBI) in the United States and the European Bioinformatics Institute (EBI). Besides the INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open to domestic life science researchers for analyzing large-scale sequence data. This paper reports recent updates on the archival databases and services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample for its metadata descriptions, making it suitable for large multi-omics studies. Its collaboration with MetaboLights at EBI makes public metabolomics data easier to locate and reuse.

Subject(s)

Databases, Nucleic Acid , Metabolomics , Metadata , Humans , Computational Biology , Genomics , Internet , Japan , Multiomics/methods
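The MetaboBank abstract notes that describing metadata with BioProject and BioSample is what makes the archive suitable for large multi-omics studies. A minimal sketch of why shared sample accessions help, assuming records from several DDBJ archives have already been retrieved as plain dictionaries: grouping them by BioSample accession shows which samples have, for example, both sequence and metabolomics data. The field names (`archive`, `biosample`, `bioproject`, `run`), the helper functions and every accession value are hypothetical placeholders, not the real DDBJ or MetaboBank schema.

```python
from collections import defaultdict


def group_by_biosample(records):
    """Map each BioSample accession to {archive name: [run IDs]}."""
    linked = defaultdict(dict)
    for rec in records:
        linked[rec["biosample"]].setdefault(rec["archive"], []).append(rec["run"])
    return dict(linked)


def multi_omics_samples(linked, required_archives):
    """Return, sorted, the BioSample accessions that have data in every required archive."""
    return sorted(
        sample
        for sample, archives in linked.items()
        if set(required_archives).issubset(archives)  # iterating a dict yields its keys
    )


# Hypothetical records: all accessions and run IDs are illustrative placeholders.
records = [
    {"archive": "DRA", "biosample": "SAMD_A", "bioproject": "PRJDB_X", "run": "seq-001"},
    {"archive": "MetaboBank", "biosample": "SAMD_A", "bioproject": "PRJDB_X", "run": "ms-001"},
    {"archive": "GEA", "biosample": "SAMD_B", "bioproject": "PRJDB_X", "run": "expr-001"},
    {"archive": "MetaboBank", "biosample": "SAMD_B", "bioproject": "PRJDB_X", "run": "ms-002"},
]

linked = group_by_biosample(records)
print(multi_omics_samples(linked, {"DRA", "MetaboBank"}))  # ['SAMD_A']
```

Because every record carries the same kind of sample accession, the join is a simple dictionary lookup rather than a fuzzy match on free-text sample descriptions.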

DNA Data Bank of Japan (DDBJ) update report 2022.

Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kodama, Yuichi; Kosuge, Takehide; Mashima, Jun; Tanjo, Tomoya; Nakamura, Yasukazu.

Nucleic Acids Res ; 51(D1): D101-D105, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36420889

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) maintains database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), our primary mission is to collect and distribute nucleotide sequence data, as well as their study and sample information, in collaboration with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute. In addition to INSDC resources, the Center operates databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank), and human genetic and phenotypic data (JGA: Japanese Genotype-Phenotype Archive). These databases are built on the supercomputer of the National Institute of Genetics, whose remaining computational capacity is actively utilized by domestic researchers for large-scale biological data analyses. Here, we report our recent updates and the activities of our services.

Subject(s)

Databases, Nucleic Acid , Genomics , Humans , United States , Computational Biology , Computers , Base Sequence , Japan , Internet

Cloud service checklist for academic communities and customization for genome medical research.

Kobayashi, Kumiko; Yoshida, Hiroshi; Tanjo, Tomoya; Aida, Kento.

Hum Genome Var ; 9(1): 36, 2022 Oct 17.

Article in English | MEDLINE | ID: mdl-36253343

ABSTRACT

In this paper, we present a cloud service checklist designed to help IT administrators or researchers in academic organizations select the most suitable cloud services. This checklist, which comprises items that we believe such administrators and researchers should consider when adopting cloud services, comprehensively covers issues related to a variety of cloud services, including security, functionality, performance, and law. In response to the increasing demands for storage and computing resources in genome medical science communities, various guidelines for using resources operated by external organizations, such as cloud services, have been published by academic funding agencies and the Japanese government. However, it is sometimes difficult to identify the checklist items that satisfy the genome medical science community's guidelines, and some of these requirements are not included in existing checklists. This gap motivated us to create a cloud service checklist customized for genome medical research communities. The resulting customized checklist is designed to help researchers easily find information about cloud services that satisfy the guidelines in genome medical science communities. Additionally, by evaluating survey responses from cloud service providers, we examine the extent to which they satisfy the requirements and checklist items in the cloud service checklist for genome medical research.
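The evaluation described in this abstract, comparing provider survey responses against the checklist items required by genome medical guidelines, can be illustrated with a small set-based sketch. The item identifiers, provider names and responses below are hypothetical; the published checklist's actual items and any scoring it uses are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyResponse:
    provider: str
    # Checklist item IDs the provider reports meeting (hypothetical IDs).
    satisfied: set = field(default_factory=set)


def missing_items(response, required):
    """Required items this provider does not report meeting, sorted."""
    return sorted(required - response.satisfied)


def providers_meeting_all(responses, required):
    """Providers whose responses cover every required item."""
    return [r.provider for r in responses if required <= r.satisfied]


def item_coverage(responses, required):
    """Fraction of providers that satisfy each required item."""
    if not responses:
        return {}
    n = len(responses)
    return {item: sum(item in r.satisfied for r in responses) / n for item in sorted(required)}


# Hypothetical required items and survey responses.
REQUIRED = {"data-location-disclosed", "encryption-at-rest", "access-logging"}
responses = [
    SurveyResponse("Provider A", {"data-location-disclosed", "encryption-at-rest", "access-logging"}),
    SurveyResponse("Provider B", {"encryption-at-rest", "access-logging"}),
    SurveyResponse("Provider C", {"data-location-disclosed", "encryption-at-rest"}),
]

print(providers_meeting_all(responses, REQUIRED))  # ['Provider A']
print(missing_items(responses[1], REQUIRED))       # ['data-location-disclosed']
```

Per-item coverage answers "how many providers meet this requirement", while the per-provider view answers "which requirements does this provider miss"; both questions fall out of the same set representation.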

Practical guide for managing large-scale human genome data in research.

Tanjo, Tomoya; Kawai, Yosuke; Tokunaga, Katsushi; Ogasawara, Osamu; Nagasaki, Masao.

J Hum Genet ; 66(1): 39-52, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33097812

ABSTRACT

Studies in human genetics deal with a plethora of human genome sequencing data that are generated from specimens as well as available in the public domain. With the development of various bioinformatics applications, maintaining research productivity, managing human genome data, and analyzing downstream data are essential. This review aims to guide struggling researchers in processing and analyzing these large-scale genomic data to extract relevant information for improved downstream analyses. Here, we discuss worldwide human genome projects whose data could be integrated with other datasets for improved analysis. Obtaining, storing, and processing human whole-genome sequencing data is costly; therefore, we focus on the development of data formats and software that manipulate whole-genome sequencing data. Once sequencing is complete and the data format and processing tools are selected, a computational platform is required. For the platform, we describe a multi-cloud strategy that balances cost, performance, and customizability. High-quality published research relies on data reproducibility to ensure quality results, reusability for application to other datasets, and scalability for future increases in dataset size. To address these requirements, we describe several key technologies developed in computer science, including workflow engines. We also discuss the ethical guidelines that are indispensable for human genomic data analysis and that differ from those for model organisms. Finally, we summarize an ideal future perspective on data processing and analysis.

Subject(s)

Computational Biology/methods , Genome, Human/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Human Genome Project , Whole Genome Sequencing/methods , Humans , Information Storage and Retrieval/methods , Reproducibility of Results , Software

Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection.

Ohta, Tazro; Tanjo, Tomoya; Ogasawara, Osamu.

Gigascience ; 8(4), 2019 04 01.

Article in English | MEDLINE | ID: mdl-31222199

ABSTRACT

BACKGROUND: Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Together with software packaged in containers, standardized workflow descriptions written in the Common Workflow Language (CWL) allow data to be analyzed easily on multiple computing environments. These technologies accelerate the use of on-demand cloud computing platforms, which can be scaled according to the quantity of data. However, to work within the time and budget constraints of cloud usage, users must select an instance type that matches the resource requirements of their workflows. RESULTS: We developed CWL-metrics, a utility tool for cwltool (the reference implementation of CWL), that collects runtime metrics of Docker containers and workflow metadata to analyze workflow resource requirements. To demonstrate the tool, we analyzed 7 transcriptome quantification workflows on 6 instance types. The results revealed that choosing an instance type matched to the required computational resources can deliver lower financial costs and faster execution times. CONCLUSIONS: CWL-metrics can generate a summary of resource requirements for workflow executions, which can help users optimize their use of cloud computing by selecting appropriate instances. The runtime metrics data generated by CWL-metrics can also help users share workflows between different workflow management frameworks.

Subject(s)

Cloud Computing , Computational Biology/methods , Genomics/methods , Software , High-Throughput Nucleotide Sequencing , Workflow
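The instance-selection idea in the CWL-metrics abstract can be sketched as follows, assuming per-run metrics (elapsed time and peak memory) of the kind the tool collects have already been exported for the same workflow on several instance types. The instance names, prices, the `headroom` factor, per-second billing and the record layout are all hypothetical; this is not CWL-metrics' output format or API, only an illustration of trading cost against runtime once measurements exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    usd_per_hour: float  # hypothetical on-demand price


@dataclass(frozen=True)
class RunMetrics:
    instance: str           # instance type the workflow ran on
    elapsed_sec: float      # wall-clock time of the whole workflow run
    peak_memory_gib: float  # peak memory observed during the run


def run_cost(run, catalog):
    """Cost of one run, assuming billing per second at the hourly rate."""
    return catalog[run.instance].usd_per_hour * run.elapsed_sec / 3600.0


def adequate_runs(runs, catalog, headroom=1.2):
    """Keep runs whose instance memory covers the observed peak times `headroom`."""
    ok = [r for r in runs if catalog[r.instance].memory_gib >= r.peak_memory_gib * headroom]
    if not ok:
        raise ValueError("no measured instance type meets the memory requirement")
    return ok


def cheapest(runs, catalog, headroom=1.2):
    """Instance with the lowest measured cost among adequate runs, and that cost."""
    best = min(adequate_runs(runs, catalog, headroom), key=lambda r: run_cost(r, catalog))
    return best.instance, run_cost(best, catalog)


def fastest(runs, catalog, headroom=1.2):
    """Instance with the shortest measured runtime among adequate runs."""
    return min(adequate_runs(runs, catalog, headroom), key=lambda r: r.elapsed_sec).instance


# Hypothetical catalog and measurements for one workflow.
catalog = {i.name: i for i in [
    InstanceType("small", 2, 8.0, 0.10),
    InstanceType("medium", 4, 16.0, 0.20),
    InstanceType("large", 8, 32.0, 0.40),
]}
runs = [
    RunMetrics("small", 7200, 7.5),
    RunMetrics("medium", 3000, 7.5),
    RunMetrics("large", 1800, 7.6),
]

inst, cost = cheapest(runs, catalog)
print(inst, f"{cost:.3f}")     # medium 0.167
print(fastest(runs, catalog))  # large
```

In this made-up example the "small" instance is excluded because its memory leaves too little headroom, and the cheapest and fastest choices differ, which is the kind of trade-off measured resource usage makes visible.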
