Results 1 - 20 of 114
1.
Front Plant Sci ; 15: 1371222, 2024.
Article in English | MEDLINE | ID: mdl-38567138

ABSTRACT

Pan-genome studies, which capture the full genomic diversity of a species, are important for understanding plant evolution and guiding crop breeding. Three short-read-based strategies for plant pan-genome construction are iterative individual, iterative pooling, and map-to-pan. Their performance varies considerably across conditions, yet no comprehensive evaluation has been conducted. Here, we evaluate the performance of these three pan-genome construction strategies for plants under different sequencing depths and sample sizes. We also examine how the length and repeat content percentage of novel sequences influence the three strategies, and we compare their computational resource consumption. Our findings indicate that map-to-pan has the highest recall but the lowest precision, whereas both iterative strategies have higher precision but lower recall. Sample number, novel sequence length, and the repeat content percentage of novel sequences adversely affect the performance of all three strategies. Increased sequencing depth improves map-to-pan's performance but does not affect the two iterative strategies. Map-to-pan also demands considerably more computational resources than the two iterative strategies. Overall, the iterative strategies, especially iterative pooling, are optimal when the sequencing depth is below 20X, whereas map-to-pan is preferable when the sequencing depth exceeds 20X despite its higher computational resource consumption.
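Precision asks how many of the reported novel sequences are real; recall asks how many of the real novel sequences were recovered. A minimal Python sketch of both metrics, assuming each predicted sequence has already been matched (e.g. by alignment) to a simulated truth ID; the function names and toy IDs are hypothetical and this is not the authors' benchmarking code. The last function only encodes the 20X recommendation stated in the abstract, which does not say what to do at exactly 20X:

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    precision: float
    recall: float
    f1: float


def evaluate_novel_sequences(truth: set[str], predicted: set[str]) -> Evaluation:
    """Score predicted novel sequences against a simulated truth set.

    Assumes predictions were already matched to truth IDs upstream; unmatched
    predictions keep their own IDs and therefore count as false positives.
    """
    tp = len(truth & predicted)          # real novel sequences that were found
    fp = len(predicted - truth)          # reported sequences that are not real
    fn = len(truth - predicted)          # real novel sequences that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return Evaluation(precision, recall, f1)


def suggest_strategy(depth_x: float) -> str:
    """Rule of thumb from the abstract: iterative pooling below 20X, map-to-pan above 20X.

    Exactly 20X is not covered by the abstract; this sketch assigns it to iterative pooling.
    """
    return "map-to-pan" if depth_x > 20 else "iterative pooling"
```

With four true novel sequences and three predictions of which two are correct, precision is 2/3 and recall is 1/2, the high-precision, lower-recall profile the abstract reports for the iterative strategies.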

2.
aBIOTECH ; 5(1): 94-106, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38576435

ABSTRACT

Genomic data serve as an invaluable resource for unraveling the intricacies of higher plant systems, including the constituent elements within and among species. Through various efforts in genomic data archiving, integrative analysis and value-added curation, the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), has established and currently maintains a large collection of database resources. This initiative fosters a data-rich ecosystem that greatly strengthens and supports genomic research. Here, we present a comprehensive overview of central repositories dedicated to archiving, presenting, and sharing plant omics data, introduce knowledgebases focused on variants or gene-based functional insights, highlight species-specific multi-omics database resources, and briefly review online application tools. We intend this review to serve as a guide for plant researchers wishing to select effective data resources from the NGDC for their specific areas of study. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00134-4.

3.
Curr Microbiol ; 81(5): 122, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38530471

ABSTRACT

The chromosomes of different bacteria have unique organization patterns, which play an important role in maintaining the spatial relationships between genes and in regulating gene expression. Conversely, transcription also plays a global role in regulating the three-dimensional structure of bacterial chromosomes. Therefore, we combined RNA-Seq and Hi-C to explore the relationship between chromosome structure changes and transcriptional regulation in E. coli at different growth stages. Transcriptome analysis indicates that E. coli produces abundant ribosomes and peptidoglycan in the exponential phase. In contrast, E. coli undergoes more transcriptional regulation and catabolism during the stationary phase, reflecting its adaptability to changing environmental conditions during growth. The Hi-C data show that E. coli has a higher frequency of global chromosomal interactions and more defined chromosomal interaction domains (CIDs) in the exponential phase, although long-distance interactions at the replication termination region are less frequent than in the stationary phase. Combining the transcriptome and Hi-C data, we conclude that highly expressed genes are more likely to be located in CID boundary regions during the exponential phase. Most of these highly expressed boundary genes are ribosomal gene clusters, which form clearer CID boundaries during the exponential phase. Both the three-dimensional structure of the chromosome and the gene expression pattern change as E. coli grows from the exponential phase to the stationary phase, clarifying the synergy between these two regulatory aspects.


Subject(s)
Escherichia coli Proteins , Escherichia coli , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Transcriptome , Chromosomes, Bacterial/metabolism , Chromosome Structures/metabolism , Gene Expression Regulation, Bacterial
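CID boundaries are typically called from dips in a local contact score along a Hi-C matrix. The abstract does not state which boundary caller was used. As an assumption, the Python sketch below uses a generic insulation-style score computed at each junction between adjacent bins, with wrap-around indexing because the E. coli chromosome is circular. Function names and the toy matrix are hypothetical:

```python
def junction_insulation(contacts: list[list[float]], window: int) -> list[float]:
    """Insulation-style score for the junction between bin i and bin i + 1.

    Averages contacts between the `window` bins just upstream of the junction
    (i - window + 1 .. i) and the `window` bins just downstream (i + 1 .. i + window).
    Indices wrap around because a bacterial chromosome is circular.
    """
    n = len(contacts)
    if window < 1 or 2 * window > n:
        raise ValueError("window must be >= 1 and at most half the number of bins")
    scores = []
    for i in range(n):
        upstream = [(i - k) % n for k in range(window)]
        downstream = [(i + 1 + k) % n for k in range(window)]
        total = sum(contacts[a][b] for a in upstream for b in downstream)
        scores.append(total / (window * window))
    return scores


def boundary_junctions(scores: list[float]) -> list[int]:
    """Junctions whose score is a strict local minimum (neighbours wrap around)."""
    n = len(scores)
    return [i for i in range(n)
            if scores[i] < scores[i - 1] and scores[i] < scores[(i + 1) % n]]
```

For an 8-bin toy chromosome made of two 4-bin domains (10 contacts within a domain, 1 across), the junctions between bins 3 and 4 and between bins 7 and 0 are reported as boundaries.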
4.
Nucleic Acids Res ; 52(D1): D1315-D1326, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37870452

ABSTRACT

Human endogenous retroviruses (HERVs), remnants of ancient exogenous retroviruses that infected and integrated into germ cells, comprise ∼8% of the human genome. HERVs have been implicated in numerous diseases, and extensive research has been conducted to uncover their specific roles. Despite these efforts, a comprehensive resource of HERV-disease associations is still lacking. To address this gap, we introduce the HervD Atlas (https://ngdc.cncb.ac.cn/hervd/), an integrated knowledgebase of HERV-disease associations manually curated from all related published literature. In the current version, HervD Atlas collects 60 726 HERV-disease associations from 254 publications (out of 4692 screened publications), covering 21 790 HERVs (21 049 HERV-Terms and 741 HERV-Elements) belonging to six types, 149 diseases and 610 related/affected genes. Notably, an interactive knowledge graph that systematically integrates all HERV-disease associations and the corresponding affected genes into a comprehensive network provides a powerful tool to uncover and deduce the complex interplay between HERVs and diseases. The HervD Atlas also features a user-friendly web interface that allows efficient browsing, searching, and downloading of all association information, research metadata, and annotation information. Overall, the HervD Atlas is an essential resource for comprehensive, up-to-date knowledge on HERV-disease research, potentially facilitating the development of novel HERV-associated diagnostic and therapeutic strategies.


Subject(s)
Endogenous Retroviruses , Knowledge Bases , Virus Diseases , Humans , Virus Diseases/genetics , Virus Diseases/virology , Atlases as Topic , Internet Use
5.
Nucleic Acids Res ; 52(D1): D1651-D1660, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843152

ABSTRACT

Tropical crops, characterized by resource scarcity, functional diversity and extensive market demand, are vital to tropical agriculture and provide considerable economic benefits for the world's tropical agriculture-producing countries. The rapid development of sequencing technology has marked a milestone in tropical crop research, generating massive amounts of data that urgently need an effective platform for integration and sharing. However, existing databases cannot fully satisfy researchers' requirements because of their relatively limited level of integration and infrequent updates. Here, we present the Tropical Crop Omics Database (TCOD, https://ngdc.cncb.ac.cn/tcod), a comprehensive multi-omics data platform for tropical crops. TCOD integrates diverse omics data from 15 species, encompassing 34 chromosome-level de novo assemblies, 1 255 004 genes with functional annotations, 282 436 992 unique variants from 2048 WGS samples, 88 transcriptomic profiles from 1997 RNA-Seq samples and 13 381 germplasm items. Additionally, TCOD uses genes as a bridge to interconnect multi-omics data, enabling cross-species comparisons based on homology relationships, and offers user-friendly online tools for efficient data mining and visualization. In short, TCOD integrates multi-species, multi-omics data and online tools, which will facilitate research on genomic selective breeding and trait biology of tropical crops.


Subject(s)
Crops, Agricultural , Databases, Genetic , Crops, Agricultural/genetics , Transcriptome , Genome, Plant
6.
Sci Bull (Beijing) ; 68(22): 2806-2816, 2023 11 30.
Article in English | MEDLINE | ID: mdl-37919157

ABSTRACT

It is difficult to infer causality from high-dimensional metagenomic data because of interference from numerous confounders. Imitating twin studies in genetic research, we developed a straightforward method, virtual twins (VTwins), that eliminates confounder effects by transforming the original cohort into a paired cohort of "twin" samples with distinct phenotypes but matched taxonomic profiles. Tests on simulated and empirical data show that VTwins is more sensitive than the conventional approach in identifying causative features and requires a 10-fold smaller sample size to recall disease-associated microbes or pathways. Benchmarking against 16 other software tools further validates the power and applicability of VTwins for handling high-dimensional compositional datasets and mining causalities in metagenomic research. In conclusion, VTwins is straightforward and effective in handling high-diversity, high-dimensional compositional data, with promising applications in mining causalities in metagenomic and potentially other omics data. VTwins is open access and available at https://github.com/mengqingren/VTwins.


Subject(s)
Algorithms , Metagenome , Humans , Metagenome/genetics , Software , Metagenomics/methods
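The core idea is to turn an unpaired cohort into phenotype-discordant pairs with matched taxonomic profiles, then compare features within each pair. A minimal sketch of that idea, assuming relative-abundance profiles, greedy nearest-neighbour matching on Bray-Curtis dissimilarity, and a simple within-pair sign count. These choices and all function names are illustrative and are not the VTwins algorithm; the repository linked above holds the actual implementation:

```python
def bray_curtis(x: list[float], y: list[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    denom = sum(x) + sum(y)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


def pair_twins(cases: list[list[float]], controls: list[list[float]]) -> list[tuple[int, int]]:
    """Greedily pair each case with the most similar control not yet used.

    Returns (case_index, control_index) pairs; cases left over once the
    controls run out stay unpaired. Ties go to the lowest control index.
    """
    available = set(range(len(controls)))
    pairs = []
    for i, case in enumerate(cases):
        if not available:
            break
        j = min(sorted(available), key=lambda k: bray_curtis(case, controls[k]))
        available.remove(j)
        pairs.append((i, j))
    return pairs


def case_higher_fraction(cases: list[list[float]], controls: list[list[float]],
                         pairs: list[tuple[int, int]], feature: int) -> float:
    """Fraction of twin pairs in which the case has the higher abundance of `feature`."""
    if not pairs:
        return 0.0
    higher = sum(1 for i, j in pairs if cases[i][feature] > controls[j][feature])
    return higher / len(pairs)
```

A feature that is higher in the case member of nearly every pair is a candidate phenotype-associated feature, because the pairing has already balanced the rest of the taxonomic profile.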
7.
Comput Struct Biotechnol J ; 21: 4675-4682, 2023.
Article in English | MEDLINE | ID: mdl-37841327

ABSTRACT

Cancer cell lines are essential in cancer research, yet accurately authenticating them can be challenging, particularly for consanguineous cell lines with close genetic similarities. We introduce a new method, Cancer Cell Line Hunter (CCLHunter), to tackle this challenge. This approach uses single nucleotide polymorphisms, expression profiles, and kindred topology to accurately authenticate 1389 human cancer cell lines. CCLHunter can precisely and efficiently authenticate cell lines from consanguineous lineages and those derived from other tissues of the same individual. Our evaluation indicates that CCLHunter achieves an overall accuracy of 93.27%, and 89.28% even for consanguineous cell lines, outperforming existing methods. CCLHunter is available as standalone software and as a web server at https://ngdc.cncb.ac.cn/cclhunter.
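Genotype concordance against a reference panel is the simplest form of the SNP signal mentioned above. A minimal sketch, assuming genotypes coded as 0/1/2 alternate-allele counts and a hypothetical 0.8 concordance cutoff; the names are illustrative, not CCLHunter's API. It covers only the SNP component, which by itself cannot separate lines derived from the same individual, one reason the abstract pairs it with expression profiles and kindred topology:

```python
from typing import Optional


def snp_concordance(query: dict[str, int], reference: dict[str, int]) -> float:
    """Fraction of SNPs typed in both samples whose 0/1/2 genotypes agree."""
    shared = query.keys() & reference.keys()
    if not shared:
        return 0.0
    return sum(query[s] == reference[s] for s in shared) / len(shared)


def best_match(query: dict[str, int], panel: dict[str, dict[str, int]],
               threshold: float = 0.8) -> tuple[Optional[str], float]:
    """Return the most concordant reference line and its score.

    The name is None when even the best score falls below `threshold`
    (a hypothetical cutoff chosen for illustration).
    """
    if not panel:
        return None, 0.0
    scores = {name: snp_concordance(query, ref) for name, ref in panel.items()}
    name = max(scores, key=scores.get)
    return (name if scores[name] >= threshold else None), scores[name]
```

A query matching "LineA" at 3 of 4 shared SNPs scores 0.75, so it is left unassigned at the default cutoff but assigned when the cutoff is lowered to 0.7.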

8.
Commun Biol ; 6(1): 899, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37658226

ABSTRACT

Genome-wide association studies (GWAS) have identified numerous variants affecting heritable traits. Nevertheless, identifying the critical genes underlying those significant variants remains a major challenge. Transcriptome-wide association study (TWAS) is an instrumental post-GWAS analysis that detects significant gene-trait associations by modeling transcription-level regulation, and it has advanced considerably in recent years. By leveraging expression quantitative trait loci (eQTL) information, TWAS can detect functional genes regulated by disease-associated variants, thus providing insight into the mechanisms of diseases and other phenotypes. Considering its vast potential, this review comprehensively summarizes TWAS, including its methodology, applications and available resources.


Subject(s)
Genome-Wide Association Study , Transcriptome , Databases, Factual , Fruit , Phenotype
9.
Article in English | MEDLINE | ID: mdl-37742994
10.
Nat Aging ; 3(6): 705-721, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37118553

ABSTRACT

How N6-methyladenosine (m6A), the most abundant mRNA modification, contributes to primate tissue homeostasis and physiological aging remains elusive. Here, we characterize the m6A epitranscriptome across the liver, heart and skeletal muscle in young and old nonhuman primates. Our data reveal a positive correlation between m6A modifications and gene expression homeostasis across tissues as well as tissue-type-specific aging-associated m6A dynamics. Among these tissues, skeletal muscle is the most susceptible to m6A loss in aging and shows a reduction in the m6A methyltransferase METTL3. We further show that METTL3 deficiency in human pluripotent stem cell-derived myotubes leads to senescence and apoptosis, and identify NPNT as a key element downstream of METTL3 involved in myotube homeostasis, whose expression and m6A levels are both decreased in senescent myotubes. Our study provides a resource for elucidating m6A-mediated mechanisms of tissue aging and reveals a METTL3-m6A-NPNT axis counteracting aging-associated skeletal muscle degeneration.


Subject(s)
Liver , Primates , Animals , Humans , Primates/genetics , Aging/genetics , Homeostasis/genetics , Methyltransferases/genetics
11.
Nucleic Acids Res ; 51(D1): D853-D860, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36161321

ABSTRACT

Single-cell studies have delineated cellular diversity and uncovered increasing numbers of previously uncharacterized cell types in complex tissues. Synthesizing the growing knowledge of cellular characteristics is therefore critical for dissecting cellular heterogeneity, developmental processes and tumorigenesis at single-cell resolution. Here, we present Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Through literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ∼3.5 million cells, Cell Taxonomy features multifaceted characterization of cell types and cell markers, including quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and marker-based cellular similarity. Taken together, Cell Taxonomy is a fundamentally useful reference for systematically and accurately characterizing cell types and thus lays an important foundation for understanding and exploring cellular biology in diverse species.
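Marker collections like this are often used to label clusters from single-cell data. A minimal sketch of marker-overlap annotation, assuming each cluster is summarized by a set of top-expressed genes. The scoring rule is a simple illustration rather than Cell Taxonomy's quality assessment, and the marker sets in the example are canonical textbook markers, not entries taken from the database:

```python
from typing import Optional


def annotate_cluster(cluster_genes: set[str],
                     markers: dict[str, set[str]]) -> tuple[Optional[str], float]:
    """Assign the cell type whose marker set is best covered by the cluster's genes.

    Score = fraction of a cell type's markers found among the cluster's genes.
    Returns (None, 0.0) when no marker of any cell type is present.
    """
    best_type: Optional[str] = None
    best_score = 0.0
    for cell_type, genes in markers.items():
        if not genes:
            continue  # skip cell types with no recorded markers
        score = len(cluster_genes & genes) / len(genes)
        if score > best_score:
            best_type, best_score = cell_type, score
    return best_type, best_score
```

A cluster whose top genes include CD3D and CD3E would be labelled a T cell with a score of 2/3 against the marker set {CD3D, CD3E, CD2}.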

12.
Nucleic Acids Res ; 51(D1): D186-D191, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36330950

ABSTRACT

LncBook, a comprehensive resource of human long non-coding RNAs (lncRNAs), has been used in a wide range of lncRNA studies across various biological contexts. Here, we present LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook), with significant updates and enhancements as follows: (i) incorporation of 119 722 new transcripts, 9632 new genes, and gene structure update of 21 305 lncRNAs; (ii) characterization of conservation features of human lncRNA genes across 40 vertebrates; (iii) integration of lncRNA-encoded small proteins; (iv) enrichment of expression and DNA methylation profiles across more biological contexts and (v) identification of lncRNA-protein interactions and improved prediction of lncRNA-miRNA interactions. Collectively, LncBook 2.0 accommodates a high-quality collection of 95 243 lncRNA genes and 323 950 transcripts and incorporates their abundant annotations at different omics levels, thereby enabling users to decipher the functional significance of lncRNAs in different biological contexts.


Subject(s)
Molecular Sequence Annotation , Multiomics , RNA, Long Noncoding , Animals , Humans , MicroRNAs/genetics , RNA, Long Noncoding/metabolism
13.
Nucleic Acids Res ; 51(D1): D994-D1002, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36318261

ABSTRACT

Homology is fundamental to inferring the evolutionary history of genes and their relationships through shared ancestry. Existing homologous gene resources differ in inference methods, types of homologous relationships and identifiers, which makes it difficult both to choose among them and to map homology results from one resource to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homolog resource that integrates multi-species, multi-resource and multi-omics data and complements existing resources by providing a public, one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. HGD also integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.


Subject(s)
Databases, Genetic , Animals , Molecular Sequence Annotation
14.
Nucleic Acids Res ; 51(D1): D767-D776, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36169225

ABSTRACT

Compared with conventional comparative genomics, recent pan-genomic studies have provided further insights into species genomic dynamics, taxonomy and identification, pathogenicity and environmental adaptation. To better understand the genome characteristics of species of interest and to fully identify key metabolic and resistance genes and their conservation and variation, we present ProPan (https://ngdc.cncb.ac.cn/propan), a public database covering 23 archaeal species and 1,481 bacterial species (51,882 strains in total) for comprehensively profiling prokaryotic pan-genome dynamics. By analyzing and integrating these massive datasets, ProPan offers three major aspects of the pan-genome dynamics of species of interest: 1) evaluation of each species' characteristics and composition in pan-genome dynamics; 2) visualization of map associations, functional annotation and presence/absence variation for all gene clusters of each species; 3) typical characteristics of environmental adaptation, including prediction of resistance genes for 126 substances (biocides, antimicrobial drugs and metals) and evaluation of 31 metabolic cycle processes. ProPan also provides a user-friendly interface, flexible retrieval and multi-level real-time statistical visualization. Taken together, ProPan will serve as a valuable resource for studies of prokaryotic pan-genome dynamics, taxonomy and identification as well as environmental adaptation.


Subject(s)
Databases, Genetic , Genome , Prokaryotic Cells , Archaea/genetics , Bacteria/genetics , Genome, Bacterial , Genomics
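Pan-genome dynamics are usually summarized by splitting gene clusters into core, accessory and unique (strain-specific) sets and by tracking how pan- and core-genome sizes change as genomes are added. A minimal Python sketch of both, assuming a presence/absence table already produced by some gene-clustering step. The strict 100% core definition, the function names and the toy data are assumptions, not ProPan's exact criteria:

```python
from collections import Counter
from typing import Optional


def classify_clusters(genomes: dict[str, set[str]]) -> dict[str, str]:
    """Label each gene cluster by how many genomes carry it.

    core: present in every genome; unique: present in exactly one; accessory: the rest.
    """
    n = len(genomes)
    counts = Counter(c for clusters in genomes.values() for c in clusters)
    labels = {}
    for cluster, k in counts.items():
        if k == n:
            labels[cluster] = "core"
        elif k == 1:
            labels[cluster] = "unique"
        else:
            labels[cluster] = "accessory"
    return labels


def pan_core_curve(genomes: dict[str, set[str]]) -> list[tuple[int, int]]:
    """(pan-genome size, core-genome size) after adding genomes one at a time, in dict order."""
    pan: set[str] = set()
    core: Optional[set[str]] = None
    curve = []
    for clusters in genomes.values():
        pan |= clusters                                   # union grows the pan-genome
        core = set(clusters) if core is None else core & clusters  # intersection shrinks the core
        curve.append((len(pan), len(core)))
    return curve
```

A pan-genome curve that keeps rising as strains are added suggests an open pan-genome, while a core curve that levels off marks the stably shared gene set.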
15.
Nucleic Acids Res ; 51(D1): D1179-D1187, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36243959

ABSTRACT

Transcriptome-wide association studies (TWASs), a practical and prevalent approach for detecting associations between genetically regulated genes and traits, are leading to a better understanding of the complex mechanisms by which genetic variants regulate various diseases and traits. Despite ever-increasing TWAS outputs, there is still a lack of databases curating massive public TWAS information and knowledge. To fill this gap, here we present TWAS Atlas (https://ngdc.cncb.ac.cn/twas/), an integrated knowledgebase of TWAS findings manually curated from extensive literature. In the current implementation, TWAS Atlas collects 401,266 high-quality human gene-trait associations from 200 publications, covering 22,247 genes and 257 traits across 135 tissue types. In particular, an interactive knowledge graph of the collected gene-trait associations is constructed together with single nucleotide polymorphism (SNP)-gene associations to build comprehensive regulatory networks at multi-omics levels. In addition, TWAS Atlas provides a user-friendly web interface that enables users to efficiently browse, search and download all association information, relevant research metadata and annotation information of interest. Taken together, TWAS Atlas is of great value for promoting the utility and availability of TWAS results in explaining complex genetic bases, as well as for providing new insights into human health and disease research.


Subject(s)
Quantitative Trait Loci , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Phenotype , Knowledge Bases , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
16.
Article in English | MEDLINE | ID: mdl-36572336

ABSTRACT

Biological databases serve as a fundamental infrastructure for the worldwide scientific community, greatly aiding the transformation of big data into knowledge discovery and driving significant innovations in a wide range of research fields. With rapid data production, biological databases continue to increase in size and importance. To build a catalog of worldwide biological databases, we curate a total of 5825 biological databases from 8931 publications, which are geographically distributed in 72 countries/regions and developed by 1975 institutions (as of September 20, 2022). We further devise the z-index, a novel index to characterize the scientific impact of a database, and rank all these biological databases, as well as their hosting institutions and countries, in terms of citations and z-index. We present a series of statistics and trends of worldwide biological databases, yielding a global perspective for better understanding their status and impact on the life and health sciences. An up-to-date catalog of worldwide biological databases, together with their curated meta-information and derived statistics, is publicly available at Database Commons (https://ngdc.cncb.ac.cn/databasecommons/).

17.
Biology (Basel) ; 11(7)2022 Jul 05.
Article in English | MEDLINE | ID: mdl-36101391

ABSTRACT

Erysipelothrix rhusiopathiae is a causative agent of erysipelas in animals and erysipeloid in humans. However, current knowledge of E. rhusiopathiae pathogenesis remains limited. We previously identified two E. rhusiopathiae strains, SE38 and G4T10, which were virulent and avirulent in pigs, respectively. Here, to further study the pathogenic mechanism of E. rhusiopathiae, we sequenced and assembled the genomes of strains SE38 and G4T10 and performed a comparative genomic analysis to identify differences or mutations in virulence-associated genes. We then compared 25 E. rhusiopathiae virulence-associated genes between SE38 and G4T10. Compared with that of SE38, the spaA gene of strain G4T10 lacked 120 bp encoding repeat units at the C-terminus of SpaA. To examine whether this deletion influences E. rhusiopathiae virulence, the same 120 bp were deleted from the spaA gene in strain SE38 by homologous recombination. The mutant strain ΔspaA displayed attenuated virulence in mice and decreased adhesion to porcine iliac artery endothelial cells, which was also observed using the corresponding mutant protein SpaA'. Our results demonstrate that SpaA-mediated adhesion between E. rhusiopathiae and host cells depends on its C-terminal repeat units.

18.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36088550

ABSTRACT

Somatic variants are critical players in cancer occurrence and development. Thus, an accurate and robust method for identifying them is the foundation of cutting-edge cancer genome research. However, because somatic variants in tumor samples have low accessibility and high individual/sample specificity, their detection remains challenging, particularly when no paired normal sample is available as a control. To address this issue, we developed TSomVar, a tumor-only somatic and germline variant identification method that uses a random forest algorithm built on sample-specific variant datasets derived from genotype imputation, read-mapping-level annotation and functional annotation. We trained TSomVar on genomic variant datasets from three major cancer types: colorectal cancer, hepatocellular carcinoma and skin cutaneous melanoma. Compared with existing tumor-only somatic variant identification tools, TSomVar shows excellent performance in somatic variant detection, with higher accuracy and better recall on test datasets from colorectal cancer and skin cutaneous melanoma. In addition, TSomVar can accurately identify germline variants in tumor samples. Taken together, TSomVar will facilitate somatic variant exploration in cancer research.


Subject(s)
Colorectal Neoplasms , Melanoma , Neoplasms , Skin Neoplasms , High-Throughput Nucleotide Sequencing/methods , Humans , Melanoma/genetics , Neoplasms/genetics , Skin Neoplasms/genetics , Melanoma, Cutaneous Malignant
19.
Front Genet ; 13: 956781, 2022.
Article in English | MEDLINE | ID: mdl-36035123

ABSTRACT

Owing to the explosion of cancer genome data and the urgent need for cancer treatment, it is increasingly important to analyze and annotate cancer genomes easily and in a timely manner. However, tumor heterogeneity is recognized as a serious barrier to annotating cancer genomes at the individual patient level. In addition, the interpretation and analysis of cancer multi-omics data rely heavily on existing database resources that are often located in different data centers or research institutions, which poses a major challenge for data parsing. Here we present CCAS (Cancer genome Consensus Annotation System, https://ngdc.cncb.ac.cn/ccas/#/home), a one-stop, comprehensive annotation system for individual patients at the multi-omics level. CCAS integrates 20 widely recognized resources in the field to support data annotation of 10 categories of cancers covering 395 subtypes. Data from each resource are manually curated and standardized using ontology frameworks. CCAS accepts single nucleotide variant/insertion or deletion, expression, copy number variation, and methylation data as input files to build a consensus annotation. Outputs are presented as tables or figures that can be searched, sorted, and downloaded. Expandable panels hold additional information to keep the display concise, and most figures are interactive. Moreover, CCAS offers multidimensional annotation information, including mutation signature patterns, gene set enrichment analysis, pathways and clinical trial information, which helps users intuitively understand the molecular mechanisms of tumors and discover key functional genes.

20.
Nucleic Acids Res ; 50(D1): D1016-D1024, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34591957

ABSTRACT

Transcriptomic profiling is critical to uncovering functional elements from transcriptional and post-transcriptional aspects. Here, we present Gene Expression Nebulas (GEN, https://ngdc.cncb.ac.cn/gen/), an open-access data portal integrating transcriptomic profiles under various biological contexts. GEN features a curated collection of high-quality bulk and single-cell RNA sequencing datasets, processed with standardized data processing pipelines and organized under a structured curation model. Currently, GEN houses a large number of gene expression profiles from 323 datasets (157 bulk and 166 single-cell), covering 50 500 samples and 15 540 169 cells across 30 species, which are further categorized into six biological contexts. Moreover, GEN integrates a full range of transcriptomic profiles on expression, RNA editing and alternative splicing for 10 bulk datasets, enabling users to conduct integrative analysis at both transcriptional and post-transcriptional levels. In addition, GEN provides abundant gene annotations based on value-added curation of transcriptomic profiles and delivers online services for data analysis and visualization. Collectively, GEN presents a comprehensive collection of transcriptomic profiles across multiple species, thus serving as a fundamental resource for better understanding genetic regulatory architecture and functional mechanisms from tissues to cells.


Subject(s)
Databases, Genetic , Gene Expression Regulation/genetics , Molecular Sequence Annotation , Transcriptome/genetics , Animals , Gene Expression Profiling , Humans , Single-Cell Analysis