Results 1 - 20 of 117
1.
Sci Data ; 11(1): 805, 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39033182

ABSTRACT

Circulating cell-free DNA (cfDNA) in the peripheral blood is a promising biomarker for cancer diagnosis and prognosis. Somatic mutations identified in cancers have been used to detect therapeutic targets for clinical translation and to individualize drug selection, while germline variants can predict a patient's risk of developing cancer and drug sensitivity. However, no platform has been developed to analyze, calculate, integrate, and visualize these pan-cancer cfDNA mutations in depth and in a user-friendly way. In this work, we performed panel sequencing encompassing 1,115 cancer-related genes across 16,659 cancer patients, spanning 27 cancer types. We detected 496 germline variants in leukocytes and 11,232 somatic mutations in the cfDNA of all patients. CPGV (Cancer Peripheral blood Gene Variations), a database constructed from this dataset, is the first pan-cancer cfDNA database that encompasses somatic mutations, germline variants, and further comparative analyses of mutations across different cancer types. It holds great promise as a valuable resource for cancer research.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Neoplasms/blood , Mutation , Germ-Line Mutation , Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/genetics , Tumor Biomarkers/blood , Tumor Biomarkers/genetics , Genetic Variation , Genetic Databases
2.
Article in English | MEDLINE | ID: mdl-38913867

ABSTRACT

The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and on time. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC), of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Out of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase's web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.
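The release percentages quoted in this abstract can be checked directly from the stated counts. The snippet below is only an arithmetic sanity check; the dictionary and function names are illustrative and are not part of GenBase or its API.

```python
# Counts quoted in the GenBase abstract (as of April 23, 2024).
received = {"nucleotide": 68_251, "protein": 689_574}
released = {"nucleotide": 63_614, "protein": 620_640}


def released_share(kind: str) -> float:
    """Return the released fraction of received sequences, as a percentage."""
    return 100.0 * released[kind] / received[kind]


for kind in received:
    # One decimal place is enough to compare with the rounded figures (93%, 90%).
    print(f"{kind}: {released_share(kind):.1f}% released")
```

Both values round to the figures reported in the abstract.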

3.
J Mol Biol ; : 168655, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38878855

ABSTRACT

Nucleosome dynamics plays important roles in many biological processes, such as DNA replication and gene expression. NucMap (https://ngdc.cncb.ac.cn/nucmap) is the first database of genome-wide nucleosome positioning maps across species. Here, we present an updated version, NucMap 2.0, by incorporating more species and MNase-seq samples. In addition, we integrate other related omics data for each MNase-seq sample to provide a comprehensive view of nucleosome positioning, such as gene expression, transcription factor binding sites, histone modifications and DNA methylation. In particular, NucMap 2.0 integrates and pre-analyzes RNA-seq data and ChIP-seq data of human-related samples, which facilitates the interpretation of nucleosome positioning in humans. All processed data are integrated into an in-built genome browser, and users can make comprehensive side-by-side analyses. In addition, more online analytical functions are developed, which allows researchers to identify differential nucleosome regions and explore potential gene regulatory regions. All resources are open access with a user-friendly web interface.

4.
Front Plant Sci ; 15: 1371222, 2024.
Article in English | MEDLINE | ID: mdl-38567138

ABSTRACT

Pan-genome studies are important for understanding plant evolution and guiding crop breeding because a pan-genome captures the full genomic diversity of a species. Three short-read-based strategies for plant pan-genome construction are iterative individual, iterative pooling, and map-to-pan. Their performance differs markedly under various conditions, but comprehensive evaluations have yet to be conducted. Here, we evaluate the performance of these three pan-genome construction strategies for plants under different sequencing depths and sample sizes. We also assess the influence of the length and repeat content percentage of novel sequences on the three strategies, and we compare their computational resource consumption. Our findings indicate that map-to-pan has the greatest recall but the lowest precision. In contrast, both iterative strategies have superior precision but lower recall. Sample number, novel sequence length, and the percentage of repeat content in novel sequences adversely affect the performance of all three strategies. Increased sequencing depth improves map-to-pan's performance but does not affect the two iterative strategies. Map-to-pan demands considerably more computational resources than the two iterative strategies. Overall, the iterative strategy, especially the iterative pooling strategy, is optimal when the sequencing depth is less than 20X. Map-to-pan is preferable when the sequencing depth exceeds 20X despite its higher computational resource consumption.
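The evaluation criteria and the closing recommendation of this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' benchmark: it assumes novel sequences can be compared as sets of identifiers (a real evaluation would work at the alignment or base level), the function names are hypothetical, and the behaviour at exactly 20X is an assumption because the abstract only covers depths below and above that value.

```python
def precision_recall(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    """Precision and recall of predicted novel sequences against a truth set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def suggest_strategy(depth_x: float) -> str:
    """Apply the abstract's rule of thumb on sequencing depth.

    Depth below 20X -> iterative pooling; otherwise map-to-pan.
    Treating exactly 20X as map-to-pan is an assumption of this sketch.
    """
    return "iterative pooling" if depth_x < 20 else "map-to-pan"


# Toy example: four predicted novel sequences, three true ones, two shared.
p, r = precision_recall({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s5"})
print(f"precision={p:.2f} recall={r:.2f}")
print(suggest_strategy(10), "|", suggest_strategy(30))
```

A strategy that reports many candidate sequences, as map-to-pan does according to the abstract, tends to raise recall at the cost of precision, which is the trade-off these two metrics make visible.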

5.
aBIOTECH ; 5(1): 94-106, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38576435

ABSTRACT

Genomic data serve as an invaluable resource for unraveling the intricacies of the higher plant systems, including the constituent elements within and among species. Through various efforts in genomic data archiving, integrative analysis and value-added curation, the National Genomics Data Center (NGDC), which is a part of the China National Center for Bioinformation (CNCB), has successfully established and currently maintains a vast amount of database resources. This dedicated initiative of the NGDC facilitates a data-rich ecosystem that greatly strengthens and supports genomic research efforts. Here, we present a comprehensive overview of central repositories dedicated to archiving, presenting, and sharing plant omics data, introduce knowledgebases focused on variants or gene-based functional insights, highlight species-specific multiple omics database resources, and briefly review the online application tools. We intend that this review can be used as a guide map for plant researchers wishing to select effective data resources from the NGDC for their specific areas of study. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00134-4.

6.
Curr Microbiol ; 81(5): 122, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38530471

ABSTRACT

The chromosome structure of different bacteria has its unique organization pattern, which plays an important role in maintaining the spatial location relationship between genes and regulating gene expression. Conversely, transcription also plays a global role in regulating the three-dimensional structure of bacterial chromosomes. Therefore, we combine RNA-Seq and Hi-C technology to explore the relationship between chromosome structure changes and transcriptional regulation in E. coli at different growth stages. Transcriptome analysis indicates that E. coli synthesizes many ribosomes and peptidoglycan in the exponential phase. In contrast, E. coli undergoes more transcriptional regulation and catabolism during the stationary phase, reflecting its adaptability to changes in environmental conditions during growth. Analyzing the Hi-C data shows that E. coli has a higher frequency of global chromosomal interaction in the exponential phase and more defined chromosomal interaction domains (CIDs). Still, the long-distance interactions at the replication termination region are lower than in the stationary phase. Combining transcriptome and Hi-C data analysis, we conclude that highly expressed genes are more likely to be distributed in CID boundary regions during the exponential phase. At the same time, most high-expression genes distributed in the CID boundary regions are ribosomal gene clusters, forming clearer CID boundaries during the exponential phase. The three-dimensional structure of chromosome and expression pattern is altered during the growth of E. coli from the exponential phase to the stationary phase, clarifying the synergy between the two regulatory aspects.


Subject(s)
Escherichia coli Proteins , Escherichia coli , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Transcriptome , Bacterial Chromosomes/metabolism , Chromosome Structures/metabolism , Bacterial Gene Expression Regulation
7.
Nucleic Acids Res ; 52(D1): D1315-D1326, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37870452

ABSTRACT

Human endogenous retroviruses (HERVs), remnants of ancient exogenous retroviruses that infected and integrated into germ cells, comprise ∼8% of the human genome. These HERVs have been implicated in numerous diseases, and extensive research has been conducted to uncover their specific roles. Despite these efforts, a comprehensive source of HERV-disease associations is still lacking. To address this gap, we introduce the HervD Atlas (https://ngdc.cncb.ac.cn/hervd/), an integrated knowledgebase of HERV-disease associations manually curated from all related published literature. In the current version, HervD Atlas collects 60 726 HERV-disease associations from 254 publications (out of 4692 screened), covering 21 790 HERVs (21 049 HERV-Terms and 741 HERV-Elements) belonging to six types, 149 diseases and 610 related/affected genes. Notably, an interactive knowledge graph that systematically integrates all the HERV-disease associations and corresponding affected genes into a comprehensive network provides a powerful tool to uncover and deduce the complex interplay between HERVs and diseases. The HervD Atlas also features a user-friendly web interface that allows efficient browsing, searching, and downloading of all association information, research metadata, and annotation information. Overall, the HervD Atlas is an essential resource for comprehensive, up-to-date knowledge on HERV-disease research, potentially facilitating the development of novel HERV-associated diagnostic and therapeutic strategies.


Subject(s)
Endogenous Retroviruses , Knowledge Bases , Virus Diseases , Humans , Virus Diseases/genetics , Virus Diseases/virology , Atlases as Topic , Internet Use
8.
Nucleic Acids Res ; 52(D1): D1651-D1660, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843152

ABSTRACT

Tropical crops are vital for tropical agriculture, with resource scarcity, functional diversity and extensive market demand, providing considerable economic benefits for the world's tropical agriculture-producing countries. The rapid development of sequencing technology has promoted a milestone in tropical crop research, resulting in the generation of massive amount of data, which urgently needs an effective platform for data integration and sharing. However, the existing databases cannot fully satisfy researchers' requirements due to the relatively limited integration level and untimely update. Here, we present the Tropical Crop Omics Database (TCOD, https://ngdc.cncb.ac.cn/tcod), a comprehensive multi-omics data platform for tropical crops. TCOD integrates diverse omics data from 15 species, encompassing 34 chromosome-level de novo assemblies, 1 255 004 genes with functional annotations, 282 436 992 unique variants from 2048 WGS samples, 88 transcriptomic profiles from 1997 RNA-Seq samples and 13 381 germplasm items. Additionally, TCOD not only employs genes as a bridge to interconnect multi-omics data, enabling cross-species comparisons based on homology relationships, but also offers user-friendly online tools for efficient data mining and visualization. In short, TCOD integrates multi-species, multi-omics data and online tools, which will facilitate the research on genomic selective breeding and trait biology of tropical crops.


Subject(s)
Agricultural Crops , Genetic Databases , Agricultural Crops/genetics , Transcriptome , Plant Genome
9.
Sci Bull (Beijing) ; 68(22): 2806-2816, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37919157

ABSTRACT

It is difficult to infer causality from high-dimension metagenomic data due to interference from numerous confounders. By imitating the twin studies in genetic research, we develop a straightforward method-virtual twins (VTwins)-to eliminate the confounder effects by transforming the original cohort into a paired cohort of "Twin" samples with distinct phenotypes but matched taxonomic profiles. The results show that VTwins outperforms the conventional approach in the sensitivity of identifying causative features and only requires a 10-fold reduced sample size for recalling disease-associated microbes or pathways, as tested by simulated and empirical data. Benchmark test with other 16 kinds of software further validates the power and applicability of VTwins for handling high-dimension compositional datasets and mining causalities in metagenomic research. In conclusion, VTwins is straightforward and effective in handling high-diversity, high-dimension compositional data, promising applications in mining causalities for metagenomic and potentially other omics data. VTwins is open access and available at https://github.com/mengqingren/VTwins.
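The pairing idea described in this abstract (samples with distinct phenotypes but matched taxonomic profiles) can be illustrated with a toy matcher. This is not the VTwins implementation, whose actual algorithm is in the linked repository. The Bray-Curtis dissimilarity, the greedy nearest-neighbour matching, the `max_dist` threshold, and all function and sample names are assumptions made for illustration only.

```python
def bray_curtis(a: list[float], b: list[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def pair_twins(cases, controls, max_dist=0.3):
    """Greedily pair each case with its closest unused control.

    Pairs whose dissimilarity exceeds max_dist (a hypothetical cutoff)
    are rejected, so some samples may stay unpaired.
    """
    pairs = []
    used = set()
    for case_id, case_profile in cases.items():
        best = None
        for ctrl_id, ctrl_profile in controls.items():
            if ctrl_id in used:
                continue
            dist = bray_curtis(case_profile, ctrl_profile)
            if dist <= max_dist and (best is None or dist < best[1]):
                best = (ctrl_id, dist)
        if best is not None:
            used.add(best[0])
            pairs.append((case_id, best[0], best[1]))
    return pairs


# Toy relative-abundance profiles over three taxa.
cases = {"case1": [0.5, 0.3, 0.2], "case2": [0.1, 0.1, 0.8]}
controls = {"ctrl1": [0.5, 0.25, 0.25], "ctrl2": [0.9, 0.05, 0.05]}
pairs = pair_twins(cases, controls)
print(pairs)
```

Once such pairs exist, within-pair differences can be tested with a paired statistic, which is the sense in which a matched design reduces the influence of confounders shared by both members of a pair.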


Subject(s)
Algorithms , Metagenome , Humans , Metagenome/genetics , Software , Metagenomics/methods
10.
Comput Struct Biotechnol J ; 21: 4675-4682, 2023.
Article in English | MEDLINE | ID: mdl-37841327

ABSTRACT

Cancer cell lines are essential in cancer research, yet accurate authentication of these cell lines can be challenging, particularly for consanguineous cell lines with close genetic similarities. We introduce a new Cancer Cell Line Hunter (CCLHunter) method to tackle this challenge. This approach utilizes the information of single nucleotide polymorphisms, expression profiles, and kindred topology to authenticate 1389 human cancer cell lines accurately. CCLHunter can precisely and efficiently authenticate cell lines from consanguineous lineages and those derived from other tissues of the same individual. Our evaluation results indicate that CCLHunter has a complete accuracy rate of 93.27%, with an accuracy of 89.28% even for consanguineous cell lines, outperforming existing methods. Additionally, we provide convenient access to CCLHunter through standalone software and a web server at https://ngdc.cncb.ac.cn/cclhunter.

11.
Commun Biol ; 6(1): 899, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37658226

ABSTRACT

Genome-wide association studies have identified numerous variants impacting heritable traits. Nevertheless, identifying the critical genes underlying those significant variants remains a major challenge. Transcriptome-wide association study (TWAS) is an instrumental post-analysis for detecting significant gene-trait associations by modeling transcription-level regulation, and it has made considerable progress in recent years. Leveraging expression quantitative trait loci (eQTL) regulation information, TWAS has advantages in detecting functional genes regulated by disease-associated variants, thus providing insight into the mechanisms of diseases and other phenotypes. Considering its vast potential, this review article comprehensively summarizes TWAS, including the methodology, applications and available resources.


Subject(s)
Genome-Wide Association Study , Transcriptome , Factual Databases , Fruit , Phenotype
12.
Article in English | MEDLINE | ID: mdl-37742994
13.
Nat Aging ; 3(6): 705-721, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37118553

ABSTRACT

How N6-methyladenosine (m6A), the most abundant mRNA modification, contributes to primate tissue homeostasis and physiological aging remains elusive. Here, we characterize the m6A epitranscriptome across the liver, heart and skeletal muscle in young and old nonhuman primates. Our data reveal a positive correlation between m6A modifications and gene expression homeostasis across tissues as well as tissue-type-specific aging-associated m6A dynamics. Among these tissues, skeletal muscle is the most susceptible to m6A loss in aging and shows a reduction in the m6A methyltransferase METTL3. We further show that METTL3 deficiency in human pluripotent stem cell-derived myotubes leads to senescence and apoptosis, and identify NPNT as a key element downstream of METTL3 involved in myotube homeostasis, whose expression and m6A levels are both decreased in senescent myotubes. Our study provides a resource for elucidating m6A-mediated mechanisms of tissue aging and reveals a METTL3-m6A-NPNT axis counteracting aging-associated skeletal muscle degeneration.


Subject(s)
Liver , Primates , Animals , Humans , Primates/genetics , Aging/genetics , Homeostasis/genetics , Methyltransferases/genetics
14.
Nucleic Acids Res ; 51(D1): D853-D860, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36161321

ABSTRACT

Single-cell studies have delineated cellular diversity and uncovered increasing numbers of previously uncharacterized cell types in complex tissues. Thus, synthesizing growing knowledge of cellular characteristics is critical for dissecting cellular heterogeneity, developmental processes and tumorigenesis at single-cell resolution. Here, we present Cell Taxonomy (https://ngdc.cncb.ac.cn/celltaxonomy), a comprehensive and curated repository of cell types and associated cell markers encompassing a wide range of species, tissues and conditions. Combined with literature curation and data integration, the current version of Cell Taxonomy establishes a well-structured taxonomy for 3,143 cell types and houses a comprehensive collection of 26,613 associated cell markers in 257 conditions and 387 tissues across 34 species. Based on 4,299 publications and single-cell transcriptomic profiles of ∼3.5 million cells, Cell Taxonomy features multifaceted characterization for cell types and cell markers, involving quality assessment of cell markers and cell clusters, cross-species comparison, cell composition of tissues and cellular similarity based on markers. Taken together, Cell Taxonomy represents a fundamentally useful reference to systematically and accurately characterize cell types and thus lays an important foundation for deeply understanding and exploring cellular biology in diverse species.

15.
Nucleic Acids Res ; 51(D1): D186-D191, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36330950

ABSTRACT

LncBook, a comprehensive resource of human long non-coding RNAs (lncRNAs), has been used in a wide range of lncRNA studies across various biological contexts. Here, we present LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook), with significant updates and enhancements as follows: (i) incorporation of 119 722 new transcripts, 9632 new genes, and gene structure update of 21 305 lncRNAs; (ii) characterization of conservation features of human lncRNA genes across 40 vertebrates; (iii) integration of lncRNA-encoded small proteins; (iv) enrichment of expression and DNA methylation profiles with more biological contexts and (v) identification of lncRNA-protein interactions and improved prediction of lncRNA-miRNA interactions. Collectively, LncBook 2.0 accommodates a high-quality collection of 95 243 lncRNA genes and 323 950 transcripts and incorporates their abundant annotations at different omics levels, thereby enabling users to decipher functional significance of lncRNAs in different biological contexts.


Subject(s)
Molecular Sequence Annotation , Multiomics , Long Noncoding RNA , Animals , Humans , MicroRNAs/genetics , Long Noncoding RNA/metabolism
16.
Nucleic Acids Res ; 51(D1): D994-D1002, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36318261

ABSTRACT

Homology is fundamental to infer genes' evolutionary processes and relationships with shared ancestry. Existing homolog gene resources vary in terms of inferring methods, homologous relationship and identifiers, posing inevitable difficulties for choosing and mapping homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homologs resource integrating multi-species, multi-resources and multi-omics, as a complement to existing resources providing public and one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. Meanwhile, HGD integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.


Subject(s)
Genetic Databases , Animals , Molecular Sequence Annotation
17.
Nucleic Acids Res ; 51(D1): D767-D776, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36169225

ABSTRACT

Compared with conventional comparative genomics, the recent studies in pan-genomics have provided further insights into species genomic dynamics, taxonomy and identification, pathogenicity and environmental adaptation. To better understand genome characteristics of species of interest and to fully excavate key metabolic and resistant genes and their conservations and variations, here we present ProPan (https://ngdc.cncb.ac.cn/propan), a public database covering 23 archaeal species and 1,481 bacterial species (in a total of 51,882 strains) for comprehensively profiling prokaryotic pan-genome dynamics. By analyzing and integrating these massive datasets, ProPan offers three major aspects for the pan-genome dynamics of the species of interest: 1) the evaluations of various species' characteristics and composition in pan-genome dynamics; 2) the visualization of map association, the functional annotation and presence/absence variation for all contained species' gene clusters; 3) the typical characteristics of the environmental adaptation, including resistance genes prediction of 126 substances (biocide, antimicrobial drug and metal) and evaluation of 31 metabolic cycle processes. Besides, ProPan develops a very user-friendly interface, flexible retrieval and multi-level real-time statistical visualization. Taken together, ProPan will serve as a weighty resource for the studies of prokaryotic pan-genome dynamics, taxonomy and identification as well as environmental adaptation.


Subject(s)
Genetic Databases , Genome , Prokaryotic Cells , Archaea/genetics , Bacteria/genetics , Bacterial Genome , Genomics
18.
Nucleic Acids Res ; 51(D1): D1179-D1187, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36243959

ABSTRACT

Transcriptome-wide association studies (TWASs), as a practical and prevalent approach for detecting the associations between genetically regulated genes and traits, are now leading to a better understanding of the complex mechanisms of genetic variants in regulating various diseases and traits. Despite the ever-increasing TWAS outputs, there is still a lack of databases curating massive public TWAS information and knowledge. To fill this gap, here we present TWAS Atlas (https://ngdc.cncb.ac.cn/twas/), an integrated knowledgebase of TWAS findings manually curated from extensive literature. In the current implementation, TWAS Atlas collects 401,266 high-quality human gene-trait associations from 200 publications, covering 22,247 genes and 257 traits across 135 tissue types. In particular, an interactive knowledge graph of the collected gene-trait associations is constructed together with single nucleotide polymorphism (SNP)-gene associations to build up comprehensive regulatory networks at multi-omics levels. In addition, TWAS Atlas, as a user-friendly web interface, efficiently enables users to browse, search and download all association information, relevant research metadata and annotation information of interest. Taken together, TWAS Atlas is of great value for promoting the utility and availability of TWAS results in explaining the complex genetic basis as well as providing new insights for human health and disease research.


Subject(s)
Quantitative Trait Loci , Transcriptome , Humans , Transcriptome/genetics , Genome-Wide Association Study/methods , Phenotype , Knowledge Bases , Single Nucleotide Polymorphism , Genetic Predisposition to Disease
19.
Article in English | MEDLINE | ID: mdl-36572336

ABSTRACT

Biological databases serve as a global fundamental infrastructure for the worldwide scientific community, which dramatically aid the transformation of big data into knowledge discovery and drive significant innovations in a wide range of research fields. Given the rapid data production, biological databases continue to increase in size and importance. To build a catalog of worldwide biological databases, therefore, we curate a total of 5825 biological databases from 8931 publications, which are geographically distributed in 72 countries/regions and developed by 1975 institutions (as of September 20, 2022). We further devise a z-index, a novel index to characterize the scientific impact of a database, and rank all these biological databases as well as their hosting institutions and countries in terms of citation and z-index. Consequently, we present a series of statistics and trends of worldwide biological databases, yielding a global perspective to better understand their status and impact for life and health sciences. An up-to-date catalog of worldwide biological databases as well as their curated meta-information and derived statistics is publicly available at Database Commons (https://ngdc.cncb.ac.cn/databasecommons/).

20.
Biology (Basel) ; 11(7), 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-36101391

ABSTRACT

Erysipelothrix rhusiopathiae is a causative agent of erysipelas in animals and erysipeloid in humans. However, current information regarding E. rhusiopathiae pathogenesis remains limited. Previously, we identified two E. rhusiopathiae strains, SE38 and G4T10, which were virulent and avirulent in pigs, respectively. Here, to further study the pathogenic mechanism of E. rhusiopathiae, we sequenced and assembled the genomes of strains SE38 and G4T10, and performed a comparative genomic analysis to identify differences or mutations in virulence-associated genes. Next, we comparatively analyzed 25 E. rhusiopathiae virulence-associated genes in SE38 and G4T10. Compared with that of SE38, the spaA gene of the G4T10 strain lacked 120 bp, encoding repeat units at the C-terminal of SpaA. To examine whether these deletions or splits influence E. rhusiopathiae virulence, these 120 bp were successfully deleted from the spaA gene in strain SE38 by homologous recombination. The mutant strain ΔspaA displayed attenuated virulence in mice and decreased adhesion to porcine iliac artery endothelial cells, which was also observed using the corresponding mutant protein SpaA'. Our results demonstrate that SpaA-mediated adhesion between E. rhusiopathiae and host cells is dependent on its C-terminal repeat units.
