1.
J Mol Biol ; : 168655, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38878855

ABSTRACT

Nucleosome dynamics play important roles in many biological processes, such as DNA replication and gene expression. NucMap (https://ngdc.cncb.ac.cn/nucmap) is the first database of genome-wide nucleosome positioning maps across species. Here, we present an updated version, NucMap 2.0, which incorporates more species and MNase-seq samples. In addition, we integrate other related omics data for each MNase-seq sample, such as gene expression, transcription factor binding sites, histone modifications and DNA methylation, to provide a comprehensive view of nucleosome positioning. In particular, NucMap 2.0 integrates and pre-analyzes RNA-seq and ChIP-seq data from human-related samples, which facilitates the interpretation of nucleosome positioning in humans. All processed data are integrated into a built-in genome browser, so users can make comprehensive side-by-side analyses. In addition, more online analytical functions have been developed, which allow researchers to identify differential nucleosome regions and explore potential gene regulatory regions. All resources are open access with a user-friendly web interface.

2.
Article in English | MEDLINE | ID: mdl-38913867

ABSTRACT

The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and on time. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC), of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Out of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.

3.
aBIOTECH ; 5(1): 94-106, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38576435

ABSTRACT

Genomic data serve as an invaluable resource for unraveling the intricacies of the higher plant systems, including the constituent elements within and among species. Through various efforts in genomic data archiving, integrative analysis and value-added curation, the National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), has successfully established and currently maintains a vast amount of database resources. This dedicated initiative of the NGDC facilitates a data-rich ecosystem that greatly strengthens and supports genomic research efforts. Here, we present a comprehensive overview of central repositories dedicated to archiving, presenting, and sharing plant omics data, introduce knowledgebases focused on variants or gene-based functional insights, highlight species-specific multiple omics database resources, and briefly review the online application tools. We intend that this review can be used as a guide map for plant researchers wishing to select effective data resources from the NGDC for their specific areas of study. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00134-4.

4.
Front Plant Sci ; 15: 1371222, 2024.
Article in English | MEDLINE | ID: mdl-38567138

ABSTRACT

Pan-genome studies capture the full genomic diversity of a species and are therefore important for understanding plant evolution and guiding crop breeding. Three short-read-based strategies for plant pan-genome construction are iterative individual, iterative pooling, and map-to-pan. Their performance differs considerably under various conditions, yet comprehensive evaluations have not been conducted. Here, we evaluate the performance of these three pan-genome construction strategies for plants under different sequencing depths and sample sizes. We also assess the influence of the length and repeat content of novel sequences on the three strategies, and we compare their computational resource consumption. Our findings indicate that map-to-pan has the greatest recall but the lowest precision. In contrast, the two iterative strategies have superior precision but lower recall. Sample number, novel sequence length, and the percentage of repeat content in novel sequences adversely affect the performance of all three strategies. Increased sequencing depth improves map-to-pan's performance but does not affect the two iterative strategies. Map-to-pan demands considerably more computational resources than the two iterative strategies. Overall, the iterative strategies, especially iterative pooling, are optimal when the sequencing depth is less than 20X. Map-to-pan is preferable when the sequencing depth exceeds 20X, despite its higher computational resource consumption.
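
The precision/recall trade-off described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the sequence IDs are made up, the function names `precision_recall` and `suggest_strategy` are hypothetical, and treating a depth of exactly 20X as "low depth" is an assumption, since the abstract only covers depths below and above 20X.

```python
def precision_recall(predicted: set, truth: set) -> tuple:
    """Return (precision, recall) for a set of predicted novel sequences."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def suggest_strategy(depth: float, threshold: float = 20.0) -> str:
    """Rule of thumb from the abstract's conclusion (20X cut-off).

    Assumption: a depth exactly equal to the threshold is treated as low depth.
    """
    return "map-to-pan" if depth > threshold else "iterative pooling"


# Toy, made-up sequence IDs purely for illustration.
truth = {"seq1", "seq2", "seq3", "seq4"}
map_to_pan_calls = {"seq1", "seq2", "seq3", "seq4", "seqX", "seqY"}  # finds everything, plus false positives
iterative_calls = {"seq1", "seq2"}  # misses half, but makes no false calls

print(precision_recall(map_to_pan_calls, truth))
print(precision_recall(iterative_calls, truth))
print(suggest_strategy(30))
```

In this toy setup the map-to-pan call set has full recall but lower precision, while the iterative call set has full precision but lower recall, mirroring the qualitative pattern the abstract reports.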

5.
Curr Microbiol ; 81(5): 122, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38530471

ABSTRACT

The chromosome structure of different bacteria has its unique organization pattern, which plays an important role in maintaining the spatial location relationship between genes and regulating gene expression. Conversely, transcription also plays a global role in regulating the three-dimensional structure of bacterial chromosomes. Therefore, we combine RNA-Seq and Hi-C technology to explore the relationship between chromosome structure changes and transcriptional regulation in E. coli at different growth stages. Transcriptome analysis indicates that E. coli synthesizes many ribosomes and peptidoglycan in the exponential phase. In contrast, E. coli undergoes more transcriptional regulation and catabolism during the stationary phase, reflecting its adaptability to changes in environmental conditions during growth. Analyzing the Hi-C data shows that E. coli has a higher frequency of global chromosomal interaction in the exponential phase and more defined chromosomal interaction domains (CIDs). Still, the long-distance interactions at the replication termination region are lower than in the stationary phase. Combining transcriptome and Hi-C data analysis, we conclude that highly expressed genes are more likely to be distributed in CID boundary regions during the exponential phase. At the same time, most high-expression genes distributed in the CID boundary regions are ribosomal gene clusters, forming clearer CID boundaries during the exponential phase. The three-dimensional structure of chromosome and expression pattern is altered during the growth of E. coli from the exponential phase to the stationary phase, clarifying the synergy between the two regulatory aspects.


Subjects
Escherichia coli Proteins; Escherichia coli; Escherichia coli/genetics; Escherichia coli Proteins/genetics; Transcriptome; Chromosomes, Bacterial/metabolism; Chromosome Structures/metabolism; Gene Expression Regulation, Bacterial
6.
Nucleic Acids Res ; 52(D1): D1315-D1326, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37870452

ABSTRACT

Human endogenous retroviruses (HERVs), remnants of ancient exogenous retroviruses that infected and integrated into germ cells, comprise ∼8% of the human genome. These HERVs have been implicated in numerous diseases, and extensive research has been conducted to uncover their specific roles. Despite these efforts, a comprehensive source of HERV-disease associations is still lacking. To address this gap, we introduce the HervD Atlas (https://ngdc.cncb.ac.cn/hervd/), an integrated knowledgebase of HERV-disease associations manually curated from all related published literature. In the current version, HervD Atlas collects 60 726 HERV-disease associations from 254 publications (out of 4692 screened), covering 21 790 HERVs (21 049 HERV-Terms and 741 HERV-Elements) belonging to six types, 149 diseases and 610 related/affected genes. Notably, an interactive knowledge graph that systematically integrates all the HERV-disease associations and corresponding affected genes into a comprehensive network provides a powerful tool to uncover and deduce the complex interplay between HERVs and diseases. The HervD Atlas also features a user-friendly web interface that allows efficient browsing, searching, and downloading of all association information, research metadata, and annotation information. Overall, the HervD Atlas is an essential resource for comprehensive, up-to-date knowledge on HERV-disease research, potentially facilitating the development of novel HERV-associated diagnostic and therapeutic strategies.


Subjects
Endogenous Retroviruses; Knowledge Bases; Virus Diseases; Humans; Virus Diseases/genetics; Virus Diseases/virology; Atlases as Topic; Internet Use
7.
Nucleic Acids Res ; 52(D1): D1651-D1660, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843152

ABSTRACT

Tropical crops are vital for tropical agriculture, with resource scarcity, functional diversity and extensive market demand, providing considerable economic benefits for the world's tropical agriculture-producing countries. The rapid development of sequencing technology has marked a milestone in tropical crop research, generating a massive amount of data that urgently needs an effective platform for integration and sharing. However, existing databases cannot fully satisfy researchers' requirements because of their relatively limited integration and infrequent updates. Here, we present the Tropical Crop Omics Database (TCOD, https://ngdc.cncb.ac.cn/tcod), a comprehensive multi-omics data platform for tropical crops. TCOD integrates diverse omics data from 15 species, encompassing 34 chromosome-level de novo assemblies, 1 255 004 genes with functional annotations, 282 436 992 unique variants from 2048 WGS samples, 88 transcriptomic profiles from 1997 RNA-Seq samples and 13 381 germplasm items. Additionally, TCOD not only employs genes as a bridge to interconnect multi-omics data, enabling cross-species comparisons based on homology relationships, but also offers user-friendly online tools for efficient data mining and visualization. In short, TCOD integrates multi-species, multi-omics data and online tools, which will facilitate research on genomic selective breeding and trait biology of tropical crops.


Subjects
Crops, Agricultural; Databases, Genetic; Crops, Agricultural/genetics; Transcriptome; Genome, Plant
8.
Sci Bull (Beijing) ; 68(22): 2806-2816, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37919157

ABSTRACT

It is difficult to infer causality from high-dimensional metagenomic data due to interference from numerous confounders. By imitating twin studies in genetic research, we develop a straightforward method, virtual twins (VTwins), to eliminate confounder effects by transforming the original cohort into a paired cohort of "Twin" samples with distinct phenotypes but matched taxonomic profiles. The results show that VTwins outperforms the conventional approach in sensitivity for identifying causative features and requires a 10-fold smaller sample size to recall disease-associated microbes or pathways, as tested on simulated and empirical data. A benchmark against 16 other software tools further validates the power and applicability of VTwins for handling high-dimensional compositional datasets and mining causalities in metagenomic research. In conclusion, VTwins is straightforward and effective in handling high-diversity, high-dimensional compositional data, with promising applications in mining causalities for metagenomic and potentially other omics data. VTwins is open access and available at https://github.com/mengqingren/VTwins.


Subjects
Algorithms; Metagenome; Humans; Metagenome/genetics; Software; Metagenomics/methods
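
The idea of turning a cohort into pairs of samples with distinct phenotypes but matched taxonomic profiles can be sketched as follows. This is only an illustration and not the VTwins implementation: the greedy nearest-neighbour matching, the use of Bray-Curtis dissimilarity, the `max_distance` threshold, the function names and the toy abundance vectors are all assumptions introduced here.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def pair_virtual_twins(cases, controls, max_distance=0.2):
    """Greedily pair each case with its closest unused control.

    cases / controls: dict mapping sample ID -> abundance vector.
    Returns a list of (case_id, control_id, distance) tuples; cases whose
    closest remaining control is farther than max_distance stay unpaired.
    """
    pairs = []
    unused = dict(controls)  # shallow copy so the caller's dict is untouched
    for case_id, case_profile in cases.items():
        if not unused:
            break
        best_id = min(unused, key=lambda cid: bray_curtis(case_profile, unused[cid]))
        dist = bray_curtis(case_profile, unused[best_id])
        if dist <= max_distance:
            pairs.append((case_id, best_id, dist))
            del unused[best_id]
    return pairs


# Toy relative-abundance profiles over three hypothetical taxa.
cases = {"case1": [0.5, 0.3, 0.2], "case2": [0.1, 0.1, 0.8]}
controls = {
    "ctrl1": [0.1, 0.2, 0.7],
    "ctrl2": [0.5, 0.2, 0.3],
    "ctrl3": [0.9, 0.1, 0.0],
}
twins = pair_virtual_twins(cases, controls)
for case_id, control_id, dist in twins:
    print(case_id, control_id, round(dist, 3))
```

Once such pairs exist, a paired comparison within each pair would be the natural next step, since samples in a pair share a similar background profile and differ mainly in phenotype.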
9.
Comput Struct Biotechnol J ; 21: 4675-4682, 2023.
Article in English | MEDLINE | ID: mdl-37841327

ABSTRACT

Cancer cell lines are essential in cancer research, yet accurate authentication of these cell lines can be challenging, particularly for consanguineous cell lines with close genetic similarities. We introduce a new Cancer Cell Line Hunter (CCLHunter) method to tackle this challenge. This approach utilizes the information of single nucleotide polymorphisms, expression profiles, and kindred topology to authenticate 1389 human cancer cell lines accurately. CCLHunter can precisely and efficiently authenticate cell lines from consanguineous lineages and those derived from other tissues of the same individual. Our evaluation results indicate that CCLHunter has a complete accuracy rate of 93.27%, with an accuracy of 89.28% even for consanguineous cell lines, outperforming existing methods. Additionally, we provide convenient access to CCLHunter through standalone software and a web server at https://ngdc.cncb.ac.cn/cclhunter.

10.
Commun Biol ; 6(1): 899, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37658226

ABSTRACT

Genome-wide association studies have identified numerous variants impacting heritable traits. Nevertheless, identifying the critical genes underlying those significant variants remains a major challenge. Transcriptome-wide association study (TWAS) is an instrumental post-analysis that detects significant gene-trait associations by modeling transcription-level regulation, and it has made considerable progress in recent years. Leveraging expression quantitative trait loci (eQTL) regulation information, TWAS has advantages in detecting functional genes regulated by disease-associated variants, thus providing insight into the mechanisms of diseases and other phenotypes. Considering its vast potential, this review article comprehensively summarizes TWAS, including the methodology, applications and available resources.


Subjects
Genome-Wide Association Study; Transcriptome; Databases, Factual; Fruit; Phenotype
11.
Article in English | MEDLINE | ID: mdl-37742994
12.
Nat Aging ; 3(6): 705-721, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37118553

ABSTRACT

How N6-methyladenosine (m6A), the most abundant mRNA modification, contributes to primate tissue homeostasis and physiological aging remains elusive. Here, we characterize the m6A epitranscriptome across the liver, heart and skeletal muscle in young and old nonhuman primates. Our data reveal a positive correlation between m6A modifications and gene expression homeostasis across tissues as well as tissue-type-specific aging-associated m6A dynamics. Among these tissues, skeletal muscle is the most susceptible to m6A loss in aging and shows a reduction in the m6A methyltransferase METTL3. We further show that METTL3 deficiency in human pluripotent stem cell-derived myotubes leads to senescence and apoptosis, and identify NPNT as a key element downstream of METTL3 involved in myotube homeostasis, whose expression and m6A levels are both decreased in senescent myotubes. Our study provides a resource for elucidating m6A-mediated mechanisms of tissue aging and reveals a METTL3-m6A-NPNT axis counteracting aging-associated skeletal muscle degeneration.


Subjects
Liver; Primates; Animals; Humans; Primates/genetics; Aging/genetics; Homeostasis/genetics; Methyltransferases/genetics
13.
Nucleic Acids Res ; 51(D1): D853-D860, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36161321

ABSTRACT

Single-cell studies have delineated cellular diversity and uncovered increasing numbers of previously uncharacterized cell types in complex tissues. Thus, synthesizing growing knowledge of cellular characteristics is critical for dissecting cellular heterogeneity, developmental processes and tumorigenesis at single-cell resolution. Here, we present Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Combined with literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ∼3.5 million cells, Cell Taxonomy features multifaceted characterization for cell types and cell markers, involving quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and cellular similarity based on markers. Taken together, Cell Taxonomy represents a fundamentally useful reference to systematically and accurately characterize cell types and thus lays an important foundation for deeply understanding and exploring cellular biology in diverse species.

14.
Nucleic Acids Res ; 51(D1): D186-D191, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36330950

ABSTRACT

LncBook, a comprehensive resource of human long non-coding RNAs (lncRNAs), has been used in a wide range of lncRNA studies across various biological contexts. Here, we present LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook), with significant updates and enhancements as follows: (i) incorporation of 119 722 new transcripts, 9632 new genes, and gene structure update of 21 305 lncRNAs; (ii) characterization of conservation features of human lncRNA genes across 40 vertebrates; (iii) integration of lncRNA-encoded small proteins; (iv) enrichment of expression and DNA methylation profiles with more biological contexts and (v) identification of lncRNA-protein interactions and improved prediction of lncRNA-miRNA interactions. Collectively, LncBook 2.0 accommodates a high-quality collection of 95 243 lncRNA genes and 323 950 transcripts and incorporates their abundant annotations at different omics levels, thereby enabling users to decipher functional significance of lncRNAs in different biological contexts.


Subjects
Molecular Sequence Annotation; Multiomics; RNA, Long Noncoding; Animals; Humans; MicroRNAs/genetics; RNA, Long Noncoding/metabolism
15.
Nucleic Acids Res ; 51(D1): D994-D1002, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36318261

ABSTRACT

Homology is fundamental to infer genes' evolutionary processes and relationships with shared ancestry. Existing homolog gene resources vary in terms of inferring methods, homologous relationship and identifiers, posing inevitable difficulties for choosing and mapping homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homologs resource integrating multi-species, multi-resources and multi-omics, as a complement to existing resources providing public and one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. Meanwhile, HGD integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.


Subjects
Databases, Genetic; Animals; Molecular Sequence Annotation
16.
Nucleic Acids Res ; 51(D1): D767-D776, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36169225

ABSTRACT

Compared with conventional comparative genomics, the recent studies in pan-genomics have provided further insights into species genomic dynamics, taxonomy and identification, pathogenicity and environmental adaptation. To better understand genome characteristics of species of interest and to fully excavate key metabolic and resistant genes and their conservations and variations, here we present ProPan (https://ngdc.cncb.ac.cn/propan), a public database covering 23 archaeal species and 1,481 bacterial species (in a total of 51,882 strains) for comprehensively profiling prokaryotic pan-genome dynamics. By analyzing and integrating these massive datasets, ProPan offers three major aspects for the pan-genome dynamics of the species of interest: 1) the evaluations of various species' characteristics and composition in pan-genome dynamics; 2) the visualization of map association, the functional annotation and presence/absence variation for all contained species' gene clusters; 3) the typical characteristics of the environmental adaptation, including resistance genes prediction of 126 substances (biocide, antimicrobial drug and metal) and evaluation of 31 metabolic cycle processes. Besides, ProPan develops a very user-friendly interface, flexible retrieval and multi-level real-time statistical visualization. Taken together, ProPan will serve as a weighty resource for the studies of prokaryotic pan-genome dynamics, taxonomy and identification as well as environmental adaptation.


Subjects
Databases, Genetic; Genome; Prokaryotic Cells; Archaea/genetics; Bacteria/genetics; Genome, Bacterial; Genomics
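
Presence/absence variation of gene clusters, which the ProPan abstract lists among its features, can be illustrated with a short Python sketch. It is not ProPan's pipeline or API: the core/accessory/unique labels follow common pan-genome usage rather than anything stated in the abstract, and the function name, cluster IDs and strain IDs are hypothetical.

```python
def classify_gene_clusters(presence):
    """Classify gene clusters by how many strains carry them.

    presence: dict mapping cluster ID -> set of strain IDs carrying it.
    Returns a dict mapping cluster ID -> "core", "accessory" or "unique".
    Assumption: the strain universe is the union of all strains seen.
    """
    all_strains = set().union(*presence.values()) if presence else set()
    labels = {}
    for cluster, strains in presence.items():
        if strains == all_strains:
            labels[cluster] = "core"       # present in every strain
        elif len(strains) == 1:
            labels[cluster] = "unique"     # present in exactly one strain
        else:
            labels[cluster] = "accessory"  # present in some, not all
    return labels


# Toy presence/absence data for three hypothetical strains.
presence = {
    "clusterA": {"s1", "s2", "s3"},
    "clusterB": {"s1", "s3"},
    "clusterC": {"s2"},
}
labels = classify_gene_clusters(presence)
print(labels)
```

A set per cluster keeps the sketch short; for tens of thousands of strains a sparse binary matrix would be a more practical representation.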
17.
Nucleic Acids Res ; 51(D1): D1179-D1187, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36243959

ABSTRACT

Transcriptome-wide association studies (TWASs), as a practical and prevalent approach for detecting the associations between genetically regulated genes and traits, are now leading to a better understanding of the complex mechanisms of genetic variants in regulating various diseases and traits. Despite the ever-increasing TWAS outputs, there is still a lack of databases curating massive public TWAS information and knowledge. To fill this gap, here we present TWAS Atlas (https://ngdc.cncb.ac.cn/twas/), an integrated knowledgebase of TWAS findings manually curated from extensive literature. In the current implementation, TWAS Atlas collects 401,266 high-quality human gene-trait associations from 200 publications, covering 22,247 genes and 257 traits across 135 tissue types. In particular, an interactive knowledge graph of the collected gene-trait associations is constructed together with single nucleotide polymorphism (SNP)-gene associations to build up comprehensive regulatory networks at multi-omics levels. In addition, TWAS Atlas, as a user-friendly web interface, efficiently enables users to browse, search and download all association information, relevant research metadata and annotation information of interest. Taken together, TWAS Atlas is of great value for promoting the utility and availability of TWAS results in explaining the complex genetic basis as well as providing new insights for human health and disease research.


Subjects
Quantitative Trait Loci; Transcriptome; Humans; Transcriptome/genetics; Genome-Wide Association Study/methods; Phenotype; Knowledge Bases; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease
18.
Article in English | MEDLINE | ID: mdl-36572336

ABSTRACT

Biological databases serve as a global fundamental infrastructure for the worldwide scientific community, which dramatically aid the transformation of big data into knowledge discovery and drive significant innovations in a wide range of research fields. Given the rapid data production, biological databases continue to increase in size and importance. To build a catalog of worldwide biological databases, therefore, we curate a total of 5825 biological databases from 8931 publications, which are geographically distributed in 72 countries/regions and developed by 1975 institutions (as of September 20, 2022). We further devise a z-index, a novel index to characterize the scientific impact of a database, and rank all these biological databases as well as their hosting institutions and countries in terms of citation and z-index. Consequently, we present a series of statistics and trends of worldwide biological databases, yielding a global perspective to better understand their status and impact for life and health sciences. An up-to-date catalog of worldwide biological databases as well as their curated meta-information and derived statistics is publicly available at Database Commons (https://ngdc.cncb.ac.cn/databasecommons/).

19.
Biology (Basel) ; 11(7)2022 Jul 05.
Article in English | MEDLINE | ID: mdl-36101391

ABSTRACT

Erysipelothrix rhusiopathiae is a causative agent of erysipelas in animals and erysipeloid in humans. However, current information regarding E. rhusiopathiae pathogenesis remains limited. Previously, we identified two E. rhusiopathiae strains, SE38 and G4T10, which were virulent and avirulent in pigs, respectively. Here, to further study the pathogenic mechanism of E. rhusiopathiae, we sequenced and assembled the genomes of strains SE38 and G4T10, and performed a comparative genomic analysis to identify differences or mutations in virulence-associated genes. Next, we comparatively analyzed 25 E. rhusiopathiae virulence-associated genes in SE38 and G4T10. Compared with that of SE38, the spaA gene of the G4T10 strain lacked 120 bp, encoding repeat units at the C-terminal of SpaA. To examine whether these deletions or splits influence E. rhusiopathiae virulence, these 120 bp were successfully deleted from the spaA gene in strain SE38 by homologous recombination. The mutant strain ΔspaA displayed attenuated virulence in mice and decreased adhesion to porcine iliac artery endothelial cells, which was also observed using the corresponding mutant protein SpaA'. Our results demonstrate that SpaA-mediated adhesion between E. rhusiopathiae and host cells is dependent on its C-terminal repeat units.

20.
Brief Bioinform ; 23(5)2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36088550

ABSTRACT

Somatic variants act as critical players during cancer occurrence and development. Thus, an accurate and robust method to identify them is the foundation of cutting-edge cancer genome research. However, due to the low accessibility and high individual-/sample-specificity of somatic variants in tumor samples, their detection remains challenging, particularly when paired normal samples are not available as controls. To address this issue, we developed a tumor-only somatic and germline variant identification method (TSomVar) using the random forest algorithm established on sample-specific variant datasets derived from genotype imputation, reads-mapping level annotation and functional annotation. We trained TSomVar by using genomic variant datasets of three major cancer types: colorectal cancer, hepatocellular carcinoma and skin cutaneous melanoma. Compared with existing tumor-only somatic variant identification tools, TSomVar shows excellent performance in somatic variant detection, with higher accuracy and better recall on test datasets from colorectal cancer and skin cutaneous melanoma. In addition, TSomVar can accurately identify germline variants in tumor samples. Taken together, TSomVar will facilitate somatic variant exploration in cancer research.


Subjects
Colorectal Neoplasms; Melanoma; Neoplasms; Skin Neoplasms; High-Throughput Nucleotide Sequencing/methods; Humans; Melanoma/genetics; Neoplasms/genetics; Skin Neoplasms/genetics; Melanoma, Cutaneous Malignant