1.
Nucleic Acids Res ; 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39180401

ABSTRACT

The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitions. We require methods to assess this loss of precision and build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how these methods can be used to evaluate fit of universes and build optimal universes. We describe scenarios where the common approach of merging regions to create consensus leads to undesirable outcomes and provide principled alternatives that provide interoperability of interval data while minimizing loss of resolution.
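As a rough illustration of the coverage cutoff idea named in this abstract, the sketch below keeps every stretch of a chromosome that is covered by at least a chosen number of region sets. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the half-open (start, end) tuples on a single chromosome, and the integer `cutoff` parameter are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # assumed half-open interval [start, end) on one chromosome


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping or touching regions so each set counts once per base."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_cutoff_universe(region_sets: List[List[Region]], cutoff: int) -> List[Region]:
    """Return maximal intervals covered by at least `cutoff` region sets."""
    # Net change in coverage at each breakpoint (+1 at a start, -1 at an end).
    delta: Dict[int, int] = defaultdict(int)
    for regions in region_sets:
        for start, end in merge_regions(regions):
            delta[start] += 1
            delta[end] -= 1

    universe: List[Region] = []
    coverage = 0
    open_start = None  # start of the universe region currently being built
    for pos in sorted(delta):
        coverage += delta[pos]
        if coverage >= cutoff and open_start is None:
            open_start = pos
        elif coverage < cutoff and open_start is not None:
            universe.append((open_start, pos))
            open_start = None
    return universe


# Hypothetical toy input: three region sets on the same chromosome.
example_sets = [
    [(0, 10), (20, 30)],
    [(5, 15), (22, 28)],
    [(8, 12), (40, 50)],
]
universe_any = coverage_cutoff_universe(example_sets, cutoff=1)
universe_two = coverage_cutoff_universe(example_sets, cutoff=2)
```

With a cutoff of 1 the result is the plain merge of all regions, which is the common approach the abstract cautions about; raising the cutoff trades coverage for tighter region boundaries.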

2.
NAR Genom Bioinform ; 6(3): lqae086, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39131817

ABSTRACT

Representation learning models have become a mainstay of modern genomics. These models are trained to yield vector representations, or embeddings, of various biological entities, such as cells, genes, individuals, or genomic regions. Recent applications of unsupervised embedding approaches have been shown to learn relationships among genomic regions that define functional elements in a genome. Unsupervised representation learning of genomic regions is free of the supervision from curated metadata and can condense rich biological knowledge from publicly available data to region embeddings. However, there exists no method for evaluating the quality of these embeddings in the absence of metadata, making it difficult to assess the reliability of analyses based on the embeddings, and to tune model training to yield optimal results. To bridge this gap, we propose four evaluation metrics: the cluster tendency score (CTS), the reconstruction score (RCS), the genome distance scaling score (GDSS), and the neighborhood preserving score (NPS). The CTS and RCS statistically quantify how well region embeddings can be clustered and how well the embeddings preserve information in training data. The GDSS and NPS exploit the biological tendency of regions close in genomic space to have similar biological functions; they measure how much such information is captured by individual region embeddings in a set. We demonstrate the utility of these statistical and biological scores for evaluating unsupervised genomic region embeddings and provide guidelines for learning reliable embeddings.
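The biological scores described in this abstract rest on the idea that regions close in genomic space should also be close in embedding space. The sketch below illustrates that idea only; it is not the paper's definition of the GDSS or the NPS. It assumes regions on a single chromosome summarized by their midpoints, and the function name `neighborhood_overlap` and the parameter `k` are hypothetical.

```python
import math
from typing import Sequence


def neighborhood_overlap(
    midpoints: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Mean fraction of each region's k nearest genomic neighbors that are
    also among its k nearest neighbors in embedding space (Euclidean)."""
    n = len(midpoints)
    if len(embeddings) != n:
        raise ValueError("midpoints and embeddings must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of regions")

    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # k nearest regions by distance along the chromosome.
        genome_nn = sorted(others, key=lambda j: abs(midpoints[j] - midpoints[i]))[:k]
        # k nearest regions by Euclidean distance between embedding vectors.
        embed_nn = sorted(others, key=lambda j: math.dist(embeddings[j], embeddings[i]))[:k]
        total += len(set(genome_nn) & set(embed_nn)) / k
    return total / n


# Hypothetical toy data: two clusters of regions on one chromosome.
toy_midpoints = [0, 1, 2, 100, 101, 102]
# Embeddings that mirror genomic position preserve every neighborhood.
faithful = [(float(m), 0.0) for m in toy_midpoints]
# Embeddings that interleave the two clusters preserve far fewer.
scrambled = [(0.0,), (100.0,), (1.0,), (101.0,), (2.0,), (102.0,)]

score_faithful = neighborhood_overlap(toy_midpoints, faithful, k=2)
score_scrambled = neighborhood_overlap(toy_midpoints, scrambled, k=2)
```

A score near 1 means genomic neighborhoods are largely preserved in the embedding; a low score suggests the embedding has not captured positional structure, although a real evaluation would also need a baseline such as randomly shuffled embeddings.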

3.
NAR Genom Bioinform ; 6(3): lqae073, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38974799

ABSTRACT

Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) are now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach by building embedding models pre-trained on reference data. We argue that this provides a more flexible analysis workflow that also has computational performance advantages through transfer learning. We implemented our approach in scEmbed, an unsupervised machine-learning framework that learns low-dimensional embeddings of genomic regulatory regions to represent and analyze scATAC-seq data. scEmbed performs well in terms of clustering ability and has the key advantage of learning patterns of region co-occurrence that can be transferred to other, unseen datasets. Moreover, models pre-trained on reference data can be exploited to build fast and accurate cell-type annotation systems without the need for other data modalities. scEmbed is implemented in Python, is open source, and is available on GitHub at https://github.com/databio/geniml. Pre-trained models from this work are available on Hugging Face at https://huggingface.co/databio.

4.
Neuroendocrinology ; 114(1): 51-63, 2024.
Article in English | MEDLINE | ID: mdl-37699356

ABSTRACT

INTRODUCTION: Growth hormone secretion by sporadic somatotroph neuroendocrine pituitary tumors (PitNETs) is a major cause of acromegaly. These tumors are relatively heterogeneous in terms of histopathological and molecular features. Our previous transcriptomic profiling of somatotroph tumors revealed three distinct molecular subtypes. This study aimed to investigate the differences in DNA methylation patterns between subtypes of somatotroph PitNETs and their role in distinctive gene expression. METHODS: Genome-wide DNA methylation was investigated in 48 somatotroph PitNETs with EPIC microarrays. Gene expression was assessed with RNAseq. Bisulfite pyrosequencing and qRT-PCR were used for verifying the results of DNA methylation and gene expression. RESULTS: Clustering tumor samples based on methylation data reflected the transcriptome-related classification. Subtype 1 tumors are densely granulated without GNAS mutation, characterized by high expression of NR5A1 (SF-1) and GIPR. The expression of both genes is correlated with specific methylation of the gene body and promoter. This subtype has a lower methylation level of 5' gene regions and CpG islands than the remaining tumors. Subtype 2 PitNETs are densely granulated and frequently GNAS-mutated, while those in subtype 3 are mainly sparsely granulated. Methylation/expression analysis indicates that ∼50% of genes located in differentially methylated regions are differentially expressed between tumor subtypes. Correlation analysis revealed DNA methylation-controlled genes, including CDKN1B, CCND2, EBF3, CDH4, CDH12, MGMT, STAT5A, PLXND1, PTPRE, and MMP16, and genes encoding ion channels and semaphorins. CONCLUSION: DNA methylation profiling confirmed the existence of three molecular subtypes of somatotroph PitNETs. High expression of NR5A1 and GIPR in subtype 1 tumors is correlated with specific methylation of both genes.


Subject(s)
Adenoma, Growth Hormone-Secreting Pituitary Adenoma, Neuroendocrine Tumors, Pituitary Neoplasms, Somatotrophs, Humans, DNA Methylation, Neuroendocrine Tumors/genetics, Neuroendocrine Tumors/pathology, Somatotrophs/metabolism, Growth Hormone-Secreting Pituitary Adenoma/genetics, Growth Hormone-Secreting Pituitary Adenoma/pathology, Pituitary Neoplasms/genetics, Pituitary Neoplasms/pathology, Adenoma/metabolism, Transcription Factors/genetics

5.
Cells ; 11(23), 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36497102

ABSTRACT

Acromegaly results from growth hormone hypersecretion, predominantly caused by a somatotroph pituitary neuroendocrine tumor (PitNET). Acromegaly-causing tumors are histologically diverse. Our aim was to determine transcriptomic profiles of various somatotroph PitNETs and to evaluate the clinical implications of differential gene expression. A total of 48 tumors were subjected to RNA sequencing, while expression of selected genes was assessed in 134 tumors with qRT-PCR. Whole-transcriptome analysis revealed three transcriptomic groups of somatotroph PitNETs. They differ in expression of numerous genes including those involved in growth hormone secretion and known prognostic genes. Transcriptomic subgroups can be distinguished by determining the expression of marker genes. Analysis of the entire cohort of patients confirmed differences between molecular subtypes of tumors. Transcriptomic group 1 includes ~20% of acromegaly patients, with GNAS mutation-negative, mainly densely granulated tumors that co-express GIPR and NR5A1 (SF-1). SF-1 expression was verified with immunohistochemistry. Transcriptomic group 2 tumors are the most common (46%) and include mainly GNAS-mutated, densely granulated somatotroph and mixed PitNETs. They are smaller and express favorable prognosis-related genes. Transcriptomic group 3 includes predominantly sparsely granulated somatotroph PitNETs with a low GNAS mutation frequency, which cause ~35% of acromegaly cases. Ghrelin signaling is implicated in their pathogenesis. They have an unfavorable gene expression profile and a higher invasive growth rate.


Subject(s)
Acromegaly, Adenoma, Neuroendocrine Tumors, Pituitary Neoplasms, Humans, Neuroendocrine Tumors/complications, Neuroendocrine Tumors/genetics, Neuroendocrine Tumors/pathology, Transcriptome/genetics, Adenoma/genetics, Pituitary Neoplasms/genetics, Acromegaly/genetics, Acromegaly/pathology, Growth Hormone/metabolism, Gene Expression Profiling