Results 1 - 20 of 22
1.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38810116

ABSTRACT

MOTIVATION: Gene regulatory networks (GRNs) encode gene regulation in living organisms, and have become a critical tool to understand complex biological processes. However, due to the dynamic and complex nature of gene regulation, inferring GRNs from scRNA-seq data is still a challenging task. Existing computational methods usually focus on the close connections between genes, and ignore the global structure and distal regulatory relationships. RESULTS: In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embeddings for genes. Then, the k most influential nodes in the whole graph are filtered through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacking CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: Our method IGEGRNS is implemented in Python using the PyTorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.


Subjects
Gene Regulatory Networks; Single-Cell Analysis; Single-Cell Analysis/methods; Computational Biology/methods; Transcriptome/genetics; Gene Expression Profiling/methods; Humans; Deep Learning; Algorithms
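
The IGEGRNS abstract above names two graph operations: GraphSAGE-style neighborhood aggregation and Top-k pooling. The following is a minimal sketch of both ideas, not the authors' implementation. It uses plain Python lists, a mean aggregator with no learned weights or non-linearity, and a fixed projection vector for scoring; the names `mean_aggregate`, `top_k_nodes`, and the toy gene labels are hypothetical.

```python
import math


def mean_aggregate(features, neighbors):
    """One simplified GraphSAGE-style step (assumption: mean aggregator,
    no learned weights): concatenate each node's own features with the
    mean of its neighbours' features."""
    out = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(own))]
        else:
            # Isolated node: use a zero vector as the neighbourhood summary.
            mean = [0.0] * len(own)
        out[node] = own + mean
    return out


def top_k_nodes(embeddings, projection, k):
    """Simplified Top-k pooling: score each node by projecting its
    embedding onto a vector, then keep the k highest-scoring nodes."""
    norm = math.sqrt(sum(p * p for p in projection))
    scores = {
        node: sum(e * p for e, p in zip(emb, projection)) / norm
        for node, emb in embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy graph with three genes (hypothetical data).
features = {"g1": [1.0, 0.0], "g2": [0.0, 1.0], "g3": [1.0, 1.0]}
neighbors = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": []}
embeddings = mean_aggregate(features, neighbors)
selected = top_k_nodes(embeddings, [1.0, 1.0, 1.0, 1.0], k=1)
```

In a trained model, the projection vector and the aggregation weights would be learned. Here they are fixed so the mechanics stay visible.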
2.
Brief Bioinform ; 24(6), 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37930025

ABSTRACT

Drug combination therapy has gradually become a promising treatment strategy for complex or co-existing diseases. As drug-drug interactions (DDIs) may cause unexpected adverse drug reactions, DDI prediction is an important task in pharmacology and clinical applications. Recently, researchers have proposed several deep learning methods to predict DDIs. However, these methods mainly exploit the chemical or biological features of drugs, which is insufficient and limits the performance of DDI prediction. Here, we propose a new deep multimodal feature fusion framework for DDI prediction, DMFDDI, which fuses the drug molecular graph, the DDI network and the biochemical similarity features of drugs to predict DDIs. To fully extract drug molecular structure, we introduce an attention-gated graph neural network for capturing the global features of the molecular graph and the local features of each atom. A sparse graph convolution network is introduced to learn the topological structure information of the DDI network. In the multimodal feature fusion module, an attention mechanism is used to efficiently fuse different features. To validate the performance of DMFDDI, we compare it with 10 state-of-the-art methods. The comparison results demonstrate that DMFDDI achieves better performance in DDI prediction. Our method DMFDDI is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DHUDEBLab/DMFDDI.git.


Subjects
Machine Learning; Neural Networks, Computer; Drug Interactions; Molecular Structure; Gene Library
3.
Bioinformatics ; 39(10), 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37812255

ABSTRACT

MOTIVATION: Drug combination therapy has exhibited remarkable therapeutic efficacy and has gradually become a promising clinical treatment strategy of complex diseases such as cancers. As the related databases keep expanding, computational methods based on deep learning model have become powerful tools to predict synergistic drug combinations. However, predicting effective synergistic drug combinations is still a challenge due to the high complexity of drug combinations, the lack of biological interpretability, and the large discrepancy in the response of drug combinations in vivo and in vitro biological systems. RESULTS: Here, we propose DGSSynADR, a new deep learning method based on global structured features of drugs and targets for predicting synergistic anticancer drug combinations. DGSSynADR constructs a heterogeneous graph by integrating the drug-drug, drug-target, protein-protein interactions and multi-omics data, utilizes a low-rank global attention (LRGA) model to perform global weighted aggregation of graph nodes and learn the global structured features of drugs and targets, and then feeds the embedded features into a bilinear predictor to predict the synergy scores of drug combinations in different cancer cell lines. Specifically, LRGA network brings better model generalization ability, and effectively reduces the complexity of graph computation. The bilinear predictor facilitates the dimension transformation of the features and fuses the feature representation of the two drugs to improve the prediction performance. The loss function Smooth L1 effectively avoids gradient explosion, contributing to better model convergence. To validate the performance of DGSSynADR, we compare it with seven competitive methods. The comparison results demonstrate that DGSSynADR achieves better performance. Meanwhile, the prediction of DGSSynADR is validated by previous findings in case studies. 
Furthermore, detailed ablation studies indicate that the one-hot encoded drug feature, the LRGA model and the bilinear predictor play key roles in improving the prediction performance. AVAILABILITY AND IMPLEMENTATION: DGSSynADR is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DHUDBlab/DGSSynADR.


Subjects
Antineoplastic Combined Chemotherapy Protocols; Neoplasms; Humans; Computational Biology/methods; Drug Combinations; Neoplasms/drug therapy; Machine Learning
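
The DGSSynADR abstract above credits the Smooth L1 loss with avoiding gradient explosion. A short sketch of the standard Smooth L1 definition shows why: the loss is quadratic for small errors and linear for large ones, so its slope never exceeds 1 in magnitude. The threshold `beta` and the function name are illustrative assumptions; the abstract does not state which threshold the authors used.

```python
def smooth_l1(pred, target, beta=1.0):
    """Standard Smooth L1 loss for one prediction.

    Quadratic when |pred - target| < beta, linear otherwise, so the
    gradient magnitude is bounded by 1 even for very large errors.
    """
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


small_error = smooth_l1(0.5, 0.0)   # quadratic regime
large_error = smooth_l1(3.0, 0.0)   # linear regime
```

A squared-error loss on the same large error would have a gradient proportional to the error itself, which is the behavior the linear regime is meant to avoid.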
4.
Brief Bioinform ; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37313714

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) measures transcriptome-wide gene expression at single-cell resolution. Clustering analysis of scRNA-seq data enables researchers to characterize cell types and states, shedding new light on cell-to-cell heterogeneity in complex tissues. Recently, self-supervised contrastive learning has become a prominent technique for underlying feature representation learning. However, for the noisy, high-dimensional and sparse scRNA-seq data, existing methods still encounter difficulties in capturing the intrinsic patterns and structures of cells, and seldom utilize prior knowledge, resulting in clusters that do not match the real situation. To this end, we propose scDECL, a novel deep enhanced constraint clustering algorithm for scRNA-seq data analysis based on contrastive learning and pairwise constraints. Specifically, based on interpolated contrastive learning, a pre-training model is trained to learn the feature embedding, and clustering is then performed according to the constructed enhanced pairwise constraints. In the pre-training stage, a mixup data augmentation strategy and an interpolation loss are introduced to improve the diversity of the dataset and the robustness of the model. In the clustering stage, the prior information is converted into enhanced pairwise constraints to guide the clustering. To validate the performance of scDECL, we compare it with six state-of-the-art algorithms on six real scRNA-seq datasets. The experimental results demonstrate that the proposed algorithm outperforms the six competing methods. In addition, the ablation studies on each module of the algorithm indicate that these modules are complementary to each other and effective in improving the performance of the proposed algorithm. Our method scDECL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DBLABDHU/scDECL.


Subjects
Gene Expression Profiling; Single-Cell Gene Expression Analysis; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Algorithms; Cluster Analysis
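
The scDECL abstract above mentions a mixup data augmentation strategy in pre-training. As a rough illustration of generic mixup (not the authors' code), the sketch below blends two expression profiles with a coefficient drawn from a Beta distribution. The function name, the `alpha` default, and the toy profiles are assumptions.

```python
import random


def mixup(x_a, x_b, alpha=1.0, rng=random):
    """Generic mixup: return a convex combination of two samples.

    lam is drawn from Beta(alpha, alpha); the mixed sample is
    lam * x_a + (1 - lam) * x_b, computed feature by feature.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


# Two toy expression profiles (hypothetical values).
cell_a = [0.0, 0.0, 0.0]
cell_b = [1.0, 1.0, 1.0]
mixed, lam = mixup(cell_a, cell_b, rng=random.Random(0))
```

The returned `lam` is kept because interpolation-based losses typically need to know how much each original sample contributed to the mixed one.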
5.
Front Oncol ; 12: 899825, 2022.
Article in English | MEDLINE | ID: mdl-35692809

ABSTRACT

Accurate inference of gene regulatory rules is critical to understanding cellular processes. Existing computational methods usually decompose the inference of gene regulatory networks (GRNs) into multiple subproblems, rather than detecting potential causal relationships simultaneously, which limits the application to data with a small number of genes. Here, we propose BiRGRN, a novel computational algorithm for inferring GRNs from time-series single-cell RNA-seq (scRNA-seq) data. BiRGRN utilizes a bidirectional recurrent neural network to infer GRNs. The recurrent neural network is a complex deep neural network that can capture complex, non-linear, and dynamic relationships among variables. It maps neurons to genes, and maps the connections between neural network layers to the regulatory relationships between genes, providing an intuitive solution to model GRNs with biological closeness and mathematical flexibility. Based on the deep network, we transform the inference of GRNs into a regression problem, using the gene expression data at previous time points to predict the gene expression data at the later time point. Furthermore, we adopt two strategies to improve the accuracy and stability of the algorithm. Specifically, we utilize a bidirectional structure to integrate the forward and reverse inference results and exploit an incomplete set of prior knowledge to filter out some candidate inferences of low confidence. BiRGRN is applied to four simulated datasets and three real scRNA-seq datasets to verify the proposed method. We perform comprehensive comparisons between our proposed method and other state-of-the-art techniques. These experimental results indicate that BiRGRN is capable of inferring GRNs simultaneously from time-series scRNA-seq data. Our method BiRGRN is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://gitee.com/DHUDBLab/bi-rgrn.

6.
Brief Bioinform ; 23(4), 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35696651

ABSTRACT

The development of single-cell RNA-seq (scRNA-seq) technology allows researchers to characterize the cell types, states and transitions during dynamic biological processes at single-cell resolution. One of the critical tasks is to infer pseudo-time trajectory. However, the existence of transition cells in the intermediate state of complex biological processes poses a challenge for the trajectory inference. Here, we propose a new single-cell trajectory inference method based on transition entropy, named scTite, to identify transitional states and reconstruct cell trajectory from scRNA-seq data. Taking into account the continuity of cellular processes, we introduce a new metric called transition entropy to measure the uncertainty of a cell belonging to different cell clusters, and then identify cell states and transition cells. Specifically, we adopt different strategies to infer the trajectory for the identified cell states and transition cells, and combine them to obtain a detailed cell trajectory. For the identified cell clusters, we utilize the Wasserstein distance based on the probability distribution to calculate distance between clusters, and construct the minimum spanning tree. Meanwhile, we adopt the signaling entropy and partial correlation coefficient to determine transition paths, which contain a group of transition cells with the largest similarity. Then the transitional paths and the MST are combined to infer a refined cell trajectory. We apply scTite to four real scRNA-seq datasets and an integrated dataset, and conduct extensive performance comparison with nine existing trajectory inference methods. The experimental results demonstrate that the proposed method can reconstruct the cell trajectory more accurately than the compared algorithms. The scTite software package is available at https://github.com/dblab2022/scTite.


Subjects
Single-Cell Analysis; Transcriptome; Entropy; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software
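
The scTite abstract above describes transition entropy as a measure of the uncertainty of a cell belonging to different clusters. The abstract does not give the exact formula, so the sketch below substitutes plain Shannon entropy over a cell's soft cluster-membership probabilities. It illustrates the idea only and is not the scTite definition; the function name and the example probabilities are hypothetical.

```python
import math


def membership_entropy(probs):
    """Shannon entropy (in nats) of a cell's cluster-membership
    probabilities. Low values suggest a cell firmly inside one cluster;
    high values suggest a cell spread across several clusters."""
    return -sum(p * math.log(p) for p in probs if p > 0)


stable_cell = membership_entropy([1.0, 0.0, 0.0])      # one clear cluster
transition_cell = membership_entropy([0.5, 0.5, 0.0])  # between two clusters
```

Under this simplified reading, cells whose entropy exceeds some chosen threshold would be candidates for transition cells. The threshold itself would need to be tuned per dataset.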
7.
Brief Bioinform ; 23(2), 2022 Mar 10.
Article in English | MEDLINE | ID: mdl-35172334

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is of central importance for the analysis of the scRNA-seq data, as it can be used to identify putative cell types. However, due to noise impacts, high dimensionality and pervasive dropout events, clustering analysis of scRNA-seq data remains a computational challenge. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates the structural information into deep clustering of single cells. The proposed scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervised module. To learn the data representation from the sparse and zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture the structural information among cells. By joining the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual supervised strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experimental results on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. Our method scDSC is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DHUDBlab/scDSC.


Subjects
Neural Networks, Computer; Single-Cell Analysis; Cluster Analysis; Gene Expression Profiling; RNA-Seq; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods
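
The scDSC abstract above relies on a Zero-Inflated Negative Binomial (ZINB) model for sparse count data. The sketch below writes out the textbook ZINB negative log-likelihood for a single count, using the mean/dispersion parameterization. The parameter names `mu`, `theta`, and `pi` are conventional choices rather than names taken from the scDSC code, and the example values are made up.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a ZINB model.

    pi is the probability of an extra (dropout) zero; with probability
    1 - pi the count comes from the negative binomial component.
    """
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return -math.log(pi + (1.0 - pi) * nb)
    return -math.log((1.0 - pi) * nb)


# Example: an observed zero is cheaper to explain with zero inflation.
nll_zero_with_dropout = zinb_nll(0, mu=2.0, theta=1.5, pi=0.3)
nll_zero_plain_nb = -nb_log_pmf(0, mu=2.0, theta=1.5)
```

In an autoencoder setting, the decoder would output `mu`, `theta`, and `pi` per gene and the summed negative log-likelihood would serve as the reconstruction loss.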
8.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2512-2522, 2022.
Article in English | MEDLINE | ID: mdl-33630737

ABSTRACT

Cellular programs often exhibit strong heterogeneity and asynchrony in the timing of program execution. Single-cell RNA-seq technology has provided an unprecedented opportunity for characterizing these cellular processes by simultaneously quantifying many parameters at single-cell resolution. Robust trajectory inference is a critical step in the analysis of dynamic temporal gene expression, which can shed light on the mechanisms of normal development and diseases. Here, we present TiC2D, a novel algorithm for cell trajectory inference from single-cell RNA-seq data, which adopts a consensus clustering strategy to precisely cluster cells. To evaluate the power of TiC2D, we compare it with three state-of-the-art methods on four independent single-cell RNA-seq datasets. The results show that TiC2D can accurately infer developmental trajectories from single-cell transcriptome. Furthermore, the reconstructed trajectories enable us to identify key genes involved in cell fate determination and to obtain new insights about their roles at different developmental stages.


Subjects
Algorithms; Single-Cell Analysis; Cluster Analysis; Consensus; Gene Expression Profiling/methods; RNA-Seq; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods
9.
Comput Biol Chem ; 93: 107512, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34044202

ABSTRACT

A gene regulatory network models the interactions between transcription factors and target genes. Reconstructing gene regulatory networks is critically important for understanding gene function in a particular cellular context, providing key insights into complex biological systems. We develop a new computational method, named iMPRN, which integrates multiple prior networks to infer regulatory networks. Based on the network component analysis model, iMPRN adopts linear regression, graph embedding, and elastic networks to optimize each prior network in line with the specific biological context. For each rewired prior network, iMPRN evaluates the confidence of the regulatory edges based on B scores and finally integrates these optimized networks. We validate the effectiveness of iMPRN by comparing it with four widely used gene regulatory network reconstruction algorithms on a simulation data set. The results show that iMPRN can infer the gene regulatory network more accurately. Further, on a real scRNA-seq dataset, iMPRN is applied to reconstruct gene regulatory networks for malignant and nonmalignant head and neck tumor cells, respectively, demonstrating distinctive differences in their corresponding regulatory networks.


Subjects
Gene Regulatory Networks; Single-Cell Analysis; Humans; Transcriptome
10.
Front Cell Dev Biol ; 8: 588041, 2020.
Article in English | MEDLINE | ID: mdl-33195248

ABSTRACT

A complex tissue contains a variety of cells with distinct molecular signatures. Single-cell RNA sequencing has characterized the transcriptomes of different cell types and enables researchers to discover the underlying mechanisms of cellular heterogeneity. A critical task in single-cell transcriptome studies is to uncover transcriptional differences among specific cell types. However, the intercellular transcriptional variation is usually confounded with a high level of technical noise, which masks the important biological signals. Here, we propose a new computational method, DiffGE, for differential analysis, adopting network entropy to measure the expression dynamics of gene groups among different cell types and to identify the highly differential gene groups. To evaluate the effectiveness of our proposed method, DiffGE is applied to three independent single-cell RNA-seq datasets to identify the highly dynamic gene groups that exhibit distinctive expression patterns in different cell types. We compare the results of our method with those of three widely applied algorithms. Further, the gene function analysis indicates that these detected differential gene groups are significantly related to cellular regulation processes. The results demonstrate the power of our method in evaluating the transcriptional dynamics and identifying highly differential gene groups among different cell types.

11.
BMC Bioinformatics ; 20(Suppl 15): 598, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874597

ABSTRACT

BACKGROUND: Super-enhancers (SEs) are clusters of transcriptionally active enhancers, which dictate the expression of genes defining cell identity and play an important role in the development and progression of tumors and other diseases. Many key cancer oncogenes are driven by super-enhancers, and the mutations associated with common diseases such as Alzheimer's disease are significantly enriched with super-enhancers. Super-enhancers have shown great potential for the identification of key oncogenes and the discovery of disease-associated mutational sites. RESULTS: In this paper, we propose a new computational method called DEEPSEN for predicting super-enhancers based on a convolutional neural network. The proposed method integrates 36 kinds of features. Compared with existing approaches, our method performs better and can be used for genome-wide prediction of super-enhancers. Besides, we screen important features for predicting super-enhancers. CONCLUSION: The convolutional neural network is effective in boosting the performance of super-enhancer prediction.


Subjects
Neural Networks, Computer; Humans; Neoplasms/genetics; Oncogenes
12.
BMC Genomics ; 20(Suppl 2): 221, 2019 Apr 04.
Article in English | MEDLINE | ID: mdl-30967107

ABSTRACT

BACKGROUND: The epigenome is highly dynamic during the early stages of embryonic development. Epigenetic modifications provide the necessary regulation for lineage specification and enable the maintenance of cellular identity. Given the rapid accumulation of genome-wide epigenomic modification maps across cellular differentiation processes, there is an urgent need to characterize epigenetic dynamics and reveal their impacts on differential gene regulation. METHODS: We proposed DiffEM, a computational method for differential analysis of epigenetic modifications, and identified highly dynamic modification sites along the cellular differentiation process. We applied this approach to investigate 6 epigenetic marks across 20 kinds of human early developmental stages and tissues, including hESCs, 4 hESC-derived lineages and 15 human primary tissues. RESULTS: We identified highly dynamic modification sites where different cell types exhibit distinctive modification patterns, and found that these highly dynamic sites are enriched in genes related to cellular development and differentiation. Further, to evaluate the effectiveness of our method, we correlated the dynamics scores of epigenetic modifications with the variance of gene expression, and compared the results of our method with those of the existing algorithms. The comparison results demonstrate the power of our method in evaluating the epigenetic dynamics and identifying highly dynamic regions along the cell differentiation process.


Subjects
Cell Lineage; Embryonic Stem Cells/cytology; Embryonic Stem Cells/metabolism; Epigenomics; Gene Expression Regulation, Developmental; Genome, Human; Cell Differentiation; Histones/genetics; Histones/metabolism; Humans; Organ Specificity
13.
Front Genet ; 10: 1298, 2019.
Article in English | MEDLINE | ID: mdl-32010182

ABSTRACT

Epigenetic alteration is a fundamental characteristic of nearly all human cancers. Tumor cells not only harbor genetic alterations, but also are regulated by diverse epigenetic modifications. Identification of epigenetic similarities across different cancer types is beneficial for the discovery of treatments that can be extended to different cancers. Nowadays, abundant epigenetic modification profiles have provided a great opportunity to achieve this goal. Here, we proposed a new approach TriPCE, introducing tri-clustering strategy to integrative pan-cancer epigenomic analysis. The method is able to identify coherent patterns of various epigenetic modifications across different cancer types. To validate its capability, we applied the proposed TriPCE to analyze six important epigenetic marks among seven cancer types, and identified significant cross-cancer epigenetic similarities. These results suggest that specific epigenetic patterns indeed exist among these investigated cancers. Furthermore, the gene functional analysis performed on the associated gene sets demonstrates strong relevance with cancer development and reveals consistent risk tendency among these investigated cancer types.

14.
BMC Med Genomics ; 11(Suppl 6): 117, 2018 Dec 31.
Article in English | MEDLINE | ID: mdl-30598115

ABSTRACT

BACKGROUND: Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advancements of single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. Subsequently, a challenging computational problem is to cluster high dimensional noisy datasets with substantially fewer cells than the number of genes. METHODS: In this paper, we introduced a consensus clustering framework conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions to consensus clusters. RESULTS: Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes. CONCLUSIONS: Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.


Subjects
Neoplasms/classification; RNA, Neoplasm; Cluster Analysis; Datasets as Topic; Humans; Molecular Sequence Data; Neoplasms/genetics
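
The conCluster abstract above says that multiple basic partitions are fused into consensus clusters, without detailing the fusion step. One common way to combine partitions, shown here purely as an assumed illustration and not as the conCluster procedure, is a co-association matrix: for every pair of cells, record the fraction of partitions that place them in the same cluster. The function name and the toy partitions are hypothetical.

```python
def co_association(partitions):
    """Build a co-association matrix from several clusterings.

    partitions is a list of label lists, one per base clustering, all
    over the same cells in the same order. Entry [i][j] is the fraction
    of clusterings in which cells i and j share a label.
    """
    n_cells = len(partitions[0])
    n_parts = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / n_parts
         for j in range(n_cells)]
        for i in range(n_cells)
    ]


# Three toy clusterings of four cells; label names differ between runs,
# which does not matter because only co-membership is counted.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
matrix = co_association(partitions)
```

The resulting matrix can be treated as a similarity matrix and passed to any standard clustering routine to obtain the final consensus labels.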
15.
BMC Bioinformatics ; 18(Suppl 12): 418, 2017 Oct 16.
Article in English | MEDLINE | ID: mdl-29072144

ABSTRACT

BACKGROUND: Studies have shown that enhancers are significant regulatory elements that play crucial roles in gene expression regulation. Since enhancers act independently of their orientation and distance relative to their target genes, accurately predicting distal enhancers is a challenging task. In the past years, with the development of high-throughput ChIP-seq technologies, several computational techniques have emerged to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell lines and the unsatisfactory prediction performance call for further research in this area. RESULTS: Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. CONCLUSION: Deep learning is effective in boosting the performance of enhancer prediction.


Subjects
Algorithms; Computational Biology/methods; Enhancer Elements, Genetic; Databases, Genetic; Humans; ROC Curve
16.
BMC Bioinformatics ; 18(1): 103, 2017 Feb 11.
Article in English | MEDLINE | ID: mdl-28187703

ABSTRACT

BACKGROUND: Differences in chromatin states are critical to the multiplicity of cell states. Recently genome-wide histone modification maps of diverse human developmental stages and tissues have been charted. DESCRIPTION: To facilitate the investigation of epigenetic dynamics and regulatory mechanisms in cellular differentiation processes, we developed iHMS, an integrated human histone modification database that incorporates massive histone modification maps spanning different developmental stages, lineages and tissues ( http://www.tongjidmb.com/human/index.html ). It also includes genome-wide expression data of different conditions, reference gene annotations, GC content and CpG island information. By providing an intuitive and user-friendly query interface, iHMS enables comprehensive query and comparative analysis based on gene names, genomic region locations, histone modification marks and cell types. Moreover, it offers an efficient browser that allows users to visualize and compare multiple genome-wide histone modification maps and related expression profiles across different developmental stages and tissues. CONCLUSION: iHMS is of great help in understanding how global histone modification state transitions impact cellular phenotypes across different developmental stages and tissues in the human genome. This extensive catalog of histone modification states thus presents an important resource for epigenetic and developmental studies.


Subjects
Databases, Genetic; Histones/metabolism; User-Computer Interface; Chromatin/metabolism; CpG Islands; Humans; Internet; Protein Processing, Post-Translational
17.
BMC Bioinformatics ; 17(Suppl 17): 537, 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28155634

ABSTRACT

BACKGROUND: Differentiation of human embryonic stem cells requires precise control of gene expression that depends on specific spatial and temporal epigenetic regulation. Recently available temporal epigenomic data derived from cellular differentiation processes provide an unprecedented opportunity for characterizing fundamental properties of epigenomic dynamics and revealing regulatory roles of epigenetic modifications. RESULTS: This paper presents a spatial temporal clustering approach, named STCluster, which exploits the temporal variation information of epigenomes to characterize dynamic epigenetic modes during cellular differentiation. This approach identifies significant spatial temporal patterns of epigenetic modifications along human embryonic stem cell differentiation and clusters regulatory sequences by their spatial temporal epigenetic patterns. CONCLUSIONS: The results show that this approach is effective in capturing epigenetic modification patterns associated with specific cell types. In addition, STCluster allows straightforward identification of coherent epigenetic modes in multiple cell types, indicating its ability to establish the most conserved epigenetic signatures during the cellular differentiation process.


Subjects
Cell Differentiation/genetics; Cluster Analysis; Embryonic Stem Cells/physiology; Epigenesis, Genetic; Gene Expression Regulation, Developmental; DNA Methylation; Embryonic Stem Cells/metabolism; Histones/metabolism; Humans
18.
Article in English | MEDLINE | ID: mdl-26355509

ABSTRACT

Accurate identification of cis-regulatory elements and their correlated modules is essential for analysis of transcriptional regulation, which is a challenging problem in computational biology. Unsupervised learning has the advantage of compensating for missing annotated data, and is thus promising to be effective to identify cis-regulatory elements and modules. We introduced a Conditional Random Fields model, referred to as CRFEM, to integrate sequence features and long-range dependency of genomic sequences such as epigenetic features to identify cis-regulatory elements and modules at the same time. The proposed method is able to automatically learn model parameters with no labeled data and explicitly optimize the predictive probability of cis-regulatory elements and modules. In comparison with existing methods, our method is more accurate and can be used for genome-wide studies of gene regulation.


Subjects
Binding Sites/genetics; Genomics/methods; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/genetics; Animals; Drosophila/genetics; Humans; Markov Chains; Saccharomyces cerevisiae/genetics
19.
Article in English | MEDLINE | ID: mdl-26356334

ABSTRACT

Nucleosomes are basic elements of chromatin structure. The positioning of nucleosomes along a genome is very important in dictating eukaryotic DNA compaction and access. Current computational methods have focused on the analysis of nucleosome occupancy and the positioning of well-positioned nucleosomes. However, fuzzy nucleosomes require more complex configurations, and their positions are more difficult to predict. We analyzed the positioning of well-positioned and fuzzy nucleosomes from a novel structural perspective, and proposed WaveNuc, a computational approach for inferring their positions based on continuous wavelet transformation. The comparative analysis demonstrates that these two kinds of nucleosomes exhibit different propeller twist structural characteristics. Well-positioned nucleosomes tend to locate at sharp peaks of the propeller twist profile, whereas fuzzy nucleosomes correspond to broader peaks. The sharpness of these peaks shows that the propeller twist profile may contain nucleosome positioning information. Exploiting this knowledge, we applied WaveNuc to detect the two different kinds of peaks of the propeller twist profile along the genome. We compared the performance of our method with existing methods on real data sets. The results show that the proposed method can accurately resolve complex configurations of fuzzy nucleosomes, which leads to better performance of nucleosome positioning prediction on the whole genome.


Subjects
Computational Biology/methods; DNA/ultrastructure; Nucleosomes/ultrastructure; Wavelet Analysis; Models, Genetic; Models, Statistical
20.
BMC Bioinformatics ; 13: 49, 2012 Mar 26.
Article in English | MEDLINE | ID: mdl-22449207

ABSTRACT

BACKGROUND: Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored the nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. RESULTS: We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features, such as DNA denaturation, DNA-bending stiffness, stacking energy, Z-DNA, propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict nucleosome occupancy in vivo more accurately than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. CONCLUSIONS: Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online.


Subjects
Gene Expression Regulation, Fungal; Nucleosomes/metabolism; Saccharomyces cerevisiae/genetics; Centromere; Chromatin/metabolism; DNA, Z-Form/metabolism; Genome, Fungal; Genome-Wide Association Study; Markov Chains; Promoter Regions, Genetic; Saccharomyces cerevisiae/metabolism