1.
Article in English | MEDLINE | ID: mdl-38862433

ABSTRACT

During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have created major challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet need for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. KoNA is available at https://www.kobic.re.kr/kona/.


Subject(s)
Databases, Nucleic Acid , Republic of Korea , Humans , High-Throughput Nucleotide Sequencing/methods
2.
PLoS One ; 15(10): e0240191, 2020.
Article in English | MEDLINE | ID: mdl-33112870

ABSTRACT

Functional analyses of genes are crucial for unveiling biological responses, genetic engineering, and developing new medicines. However, functional analyses have largely been restricted to model organisms, representing a major hurdle for functional studies and industrial applications. To resolve this, comparative genome analyses can be used to provide clues to gene functions as well as their evolutionary history. To this end, we present Prometheus, a web-based omics portal that contains more than 17,215 sequences from prokaryotic and eukaryotic genomes. This portal supports interkingdom comparative analyses via a domain architecture-based gene identification system and Gene Search, and users can easily and rapidly identify single or entire gene sets in specific pathways. Bioinformatics tools for further analyses are provided in Prometheus or through Bio-Express, a cloud-based bioinformatics analysis platform. Prometheus is a new paradigm for comparative analyses of large amounts of genomic information.


Subject(s)
Genomics/methods , Software , Animals , Archaea/genetics , Bacteria/genetics , Fungi/genetics , Humans , Metabolomics/methods , Plants/genetics , Sequence Alignment/methods
3.
Nat Genet ; 52(6): 594-603, 2020 06.
Article in English | MEDLINE | ID: mdl-32451460

ABSTRACT

Immunotherapy for metastatic colorectal cancer is effective only for mismatch repair-deficient tumors with high microsatellite instability that demonstrate immune infiltration, suggesting that tumor cells can determine their immune microenvironment. To understand this cross-talk, we analyzed the transcriptome of 91,103 unsorted single cells from 23 Korean and 6 Belgian patients. Cancer cells displayed transcriptional features reminiscent of normal differentiation programs, and genetic alterations that apparently fostered immunosuppressive microenvironments directed by regulatory T cells, myofibroblasts and myeloid cells. Intercellular network reconstruction supported the association between cancer cell signatures and specific stromal or immune cell populations. Our collective view of the cellular landscape and intercellular interactions in colorectal cancer provides mechanistic information for the design of efficient immuno-oncology treatment strategies.


Subject(s)
Cell Lineage , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Gene Expression Regulation, Neoplastic/immunology , Colorectal Neoplasms/pathology , Gastric Mucosa/immunology , Gastric Mucosa/pathology , Humans , Sequence Analysis, RNA , Single-Cell Analysis , Stromal Cells/pathology , T-Lymphocytes/immunology , T-Lymphocytes/pathology , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
4.
Genomics Inform ; 18(1): e8, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32224841

ABSTRACT

The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

5.
Bioinformatics ; 36(5): 1584-1589, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31599923

ABSTRACT

MOTIVATION: Owing to advanced DNA sequencing and genome assembly technology, the number of species with sequenced genomes is rapidly increasing. The aim of the recently launched Earth BioGenome Project is to sequence genomes of all eukaryotic species on Earth over the next 10 years, making it feasible to obtain genomic blueprints of the majority of animal and plant species by this time. Genetic models of the sequenced species will later be subject to functional annotation, and a comprehensive molecular network should facilitate functional analysis of individual genes and pathways. However, network databases are lagging behind genome sequencing projects as even the largest network database provides gene networks for less than 10% of sequenced eukaryotic genomes, and the knowledge gap between genomes and interactomes continues to widen. RESULTS: We present BiomeNet, a database of 95 scored networks comprising over 8 million co-functional links, which can build and analyze gene networks for any species with a sequenced genome. BiomeNet transfers functional interactions between orthologous proteins from source networks to the target species within minutes and automatically constructs gene networks with quality comparable to that of existing networks. BiomeNet enables assembly of first-in-species gene networks not available through other databases, which are highly predictive of diverse biological processes, and can also provide network analysis by extracting subnetworks for individual biological processes and network-based gene prioritizations. These data indicate that BiomeNet could enhance the benefits of decoding the genomes of various species, thus improving our understanding of Earth's biodiversity. AVAILABILITY AND IMPLEMENTATION: BiomeNet is freely available at http://kobic.re.kr/biomenet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Genetic , Genome , Animals , Gene Regulatory Networks , Genomics , Sequence Analysis, DNA
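
The BiomeNet abstract above describes transferring functional links between orthologous proteins from source networks to a target species. The sketch below illustrates that general idea only; it is not BiomeNet's implementation. The function name `transfer_links`, the input shapes, and the rule of keeping the highest score per target pair are all assumptions made for illustration.

```python
def transfer_links(source_links, orthologs):
    """Project scored gene-gene links from a source species onto a target species.

    source_links: iterable of (gene_a, gene_b, score) tuples in the source species.
    orthologs:    dict mapping a source gene to a set of target-species genes.
    Returns a dict {(target_a, target_b): score} keyed by sorted gene pairs.
    """
    target_links = {}
    for gene_a, gene_b, score in source_links:
        # Every ortholog of gene_a is paired with every ortholog of gene_b.
        for t_a in orthologs.get(gene_a, ()):
            for t_b in orthologs.get(gene_b, ()):
                if t_a == t_b:
                    continue  # both source genes map to one target gene: no link
                pair = tuple(sorted((t_a, t_b)))
                # Assumption: when several source links land on the same
                # target pair, keep the highest score.
                if score > target_links.get(pair, float("-inf")):
                    target_links[pair] = score
    return target_links


# Toy data (hypothetical gene names and scores).
example_links = [("s1", "s2", 0.9), ("s2", "s3", 0.4), ("s1", "s3", 0.7)]
example_orthologs = {"s1": {"t1"}, "s2": {"t2"}, "s3": {"t2"}}
example_result = transfer_links(example_links, example_orthologs)
```

In this toy case the s2-s3 link is dropped because both genes map to `t2`, and the two remaining links collapse onto the single pair (`t1`, `t2`).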
6.
BMC Bioinformatics ; 19(Suppl 1): 43, 2018 02 19.
Article in English | MEDLINE | ID: mdl-29504905

ABSTRACT

BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. 
The Closha cloud server is freely available at http://closha.kobic.re.kr/.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Cloud Computing , Genomics/methods , Workflow
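
The Closha abstract above describes multi-step analyses assembled into a workflow. One common way to model such a workflow is as a dependency graph whose steps run in topological order. The sketch below shows that generic pattern with Python's standard `graphlib` module; it is not Closha's code, and the step names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step after all of the steps it depends on.

    steps:        dict mapping step name -> callable taking a dict of upstream results.
    dependencies: dict mapping step name -> set of step names it depends on.
    Returns a dict mapping step name -> that step's result.
    """
    # Give every step an entry so that steps without dependencies also run.
    graph = {name: dependencies.get(name, set()) for name in steps}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](upstream)
    return results


# Hypothetical three-step pipeline; strings stand in for real analysis tools.
example_steps = {
    "trim": lambda upstream: "trimmed_reads",
    "align": lambda upstream: f"aligned({upstream['trim']})",
    "call": lambda upstream: f"variants({upstream['align']})",
}
example_dependencies = {"align": {"trim"}, "call": {"align"}}
example_results = run_pipeline(example_steps, example_dependencies)
```

Steps with no mutual dependency could be dispatched in parallel, which is the kind of speed-up the abstract attributes to parallelized execution; this sketch runs them sequentially for simplicity.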
7.
DNA Res ; 24(1): 71-80, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28011721

ABSTRACT

Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprising a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Predicted gene modeling detected 87,603 genes, mostly supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication (WGD) were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicate that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology.


Subject(s)
Flowers/growth & development , Genome, Plant , Hibiscus/genetics , Polyploidy , DNA-Binding Proteins/genetics , Hibiscus/physiology , Multigene Family , RNA-Binding Proteins/genetics , Transcriptome
8.
Nucleic Acids Res ; 39(Database issue): D939-44, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21051351

ABSTRACT

Numerous genetic variations have been found to be related to human diseases. A significant portion of these also affect drug response by changing protein structure and function. Therefore, it is crucial to understand the trilateral relationship among genomic variations, diseases and drugs. We present Variations and Drugs (VnD), a consolidated database containing information on diseases, related genes and genetic variations, protein structures and drug information. VnD was built in three steps. First, we integrated various resources systematically to deduce catalogs of disease-related genes, single nucleotide polymorphisms (SNPs), protein mutations and relevant drugs. VnD contains 137,195 disease-related gene records (13,940 distinct genes) and 16,586 genetic variation records (1790 distinct variations). Next, we carried out structure modeling and docking simulation for wild-type and mutant proteins to examine the structural and functional consequences of non-synonymous SNPs in the drug-related genes. Conformational changes in 590 wild-type and 4437 mutant proteins from drug-related genes were included in our database. Finally, we investigated the structural and biochemical properties relevant to drug binding, such as the distribution of SNPs in proximal protein pockets, thermo-chemical stability, interactions with drugs and physico-chemical properties. The VnD database, available at http://vnd.kobic.re.kr:8080/VnD/ or vandd.org, would be a useful platform for researchers studying the mechanisms underlying associations among genetic variations, diseases and drugs.


Subject(s)
Databases, Genetic , Disease/genetics , Pharmaceutical Preparations/chemistry , Polymorphism, Single Nucleotide , Protein Conformation , Proteins/genetics , Humans , Mutation , Proteins/chemistry , User-Computer Interface