Results 1 - 13 of 13
1.
SN Comput Sci ; 4(2): 114, 2023.
Article in English | MEDLINE | ID: mdl-36573207

ABSTRACT

This paper presents a consensus-based approach that incorporates three microarray and three RNA-Seq methods for unbiased and integrative identification of differentially expressed genes (DEGs) as potential biomarkers for critical disease(s). The proposed method performs satisfactorily on two microarray datasets (GSE20347 and GSE23400) and one RNA-Seq dataset (GSE130078) for esophageal squamous cell carcinoma (ESCC). Based on the input dataset, our framework employs specific DE methods to detect DEGs independently. A consensus-based function has been introduced that first considers DEGs common to all three methods for further downstream analysis. The consensus function employs other parameters to overcome information loss. Differential co-expression (DCE) and preservation analysis of DEGs facilitate the study of behavioral changes in interactions among DEGs under normal and diseased circumstances. Considering hub genes in biologically relevant modules and the most GO- and pathway-enriched DEGs as candidates for potential biomarkers of ESCC, we perform further validation through biological analysis as well as literature evidence. We have identified 25 DEGs that have strong biological relevance to their respective datasets and have previous literature establishing them as potential biomarkers for ESCC. We have further identified 8 additional DEGs as probable potential biomarkers for ESCC, but recommend further in-depth analysis.
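
As an illustration of the consensus step in this abstract, the following is a minimal sketch that keeps the genes reported by all DE methods and, optionally, relaxes the requirement to a vote threshold. The function name `consensus_degs`, the `min_votes` parameter, the method labels, and the gene identifiers are all hypothetical; the abstract does not specify which additional parameters the authors' consensus function uses, so the relaxed vote is only an assumed stand-in.

```python
from collections import Counter


def consensus_degs(method_results, min_votes=None):
    """Return genes reported by at least `min_votes` methods.

    `method_results` maps a method label to its list of DEGs.
    By default a gene must be reported by every method (strict intersection).
    The vote threshold is an assumption, not the published consensus function.
    """
    gene_sets = [set(genes) for genes in method_results.values()]
    if min_votes is None:
        min_votes = len(gene_sets)
    # Count how many methods reported each gene.
    votes = Counter(gene for genes in gene_sets for gene in genes)
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Hypothetical DEG calls from three DE methods (placeholder gene identifiers).
calls = {
    "method_a": ["G1", "G2", "G3", "G4"],
    "method_b": ["G2", "G3", "G4", "G5"],
    "method_c": ["G3", "G2", "G6"],
}

strict = consensus_degs(calls)                # genes common to all three methods
relaxed = consensus_degs(calls, min_votes=2)  # genes reported by at least two
print(strict, relaxed)
```

The strict intersection is the conservative starting point; a relaxed threshold trades some specificity for fewer discarded candidates.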

2.
BMC Bioinformatics ; 23(1): 17, 2022 Jan 06.
Article in English | MEDLINE | ID: mdl-34991439

ABSTRACT

BACKGROUND: A limitation of traditional differential expression analysis on small datasets involves the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL) based models, we wanted to expand the state of the art in disease biomarker prediction from RNA-seq data using DL. However, application of DL to RNA-seq data is challenging due to the absence of appropriate labels and the small sample size compared to the number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by having a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets. RESULTS: We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to leverage the knowledge of trained feature maps to untrained cancer datasets. DEGnext's results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in terms of transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext were mapped to significant Gene Ontology terms and pathways related to cancer. CONCLUSIONS: DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets.
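
To make the role of logarithmic fold change in labeling UR and DR genes concrete, here is a minimal sketch of threshold-based labeling. It does not reproduce DEGnext, which is a CNN; the function name `label_by_log_fold_change`, the threshold of 1.0, and the pseudocount of 1.0 are assumptions chosen only for illustration.

```python
import math


def label_by_log_fold_change(tumor_mean, normal_mean, threshold=1.0, pseudocount=1.0):
    """Label a gene as upregulated (UR), downregulated (DR), or neutral.

    The log2 fold change compares mean tumor expression to mean normal
    expression. The pseudocount avoids division by zero; both it and the
    threshold are illustrative assumptions.
    """
    lfc = math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))
    if lfc >= threshold:
        return "UR", lfc
    if lfc <= -threshold:
        return "DR", lfc
    return "neutral", lfc


# Hypothetical mean expression values (tumor, normal) for three genes.
up = label_by_log_fold_change(7.0, 1.0)       # ratio 8/2 = 4
down = label_by_log_fold_change(1.0, 7.0)     # ratio 2/8 = 0.25
flat = label_by_log_fold_change(3.0, 3.0)     # ratio 1
print(up, down, flat)
```

Labels produced this way could serve as training targets for a classifier, which is the part a model such as the one described above would learn to generalize.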


Subject(s)
Neoplasms , Neural Networks, Computer , Humans , Machine Learning , RNA-Seq , Support Vector Machine
3.
IEEE Trans Neural Netw Learn Syst ; 32(2): 604-624, 2021 02.
Article in English | MEDLINE | ID: mdl-32324570

ABSTRACT

Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This article provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to many applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.


Subject(s)
Deep Learning , Natural Language Processing , Computer Systems , Humans , Linguistics , Neural Networks, Computer , Surveys and Questionnaires
4.
Comput Biol Med ; 126: 104023, 2020 11.
Article in English | MEDLINE | ID: mdl-33049478

ABSTRACT

Many complex diseases occur due to genetic factors. A perturbation in the pathway of gene interactions leads to such disorders. Even though a group of genes is responsible, a few significant genes act as a biomarker for disease, perturbing the healthy network. Identifying such marker genes or a set of genes that play a pivotal role in diseases helps drug prioritization. We propose a scheme for finding potential bio-markers using a multi-layer consensus-driven approach. We reconstruct a functional module guided disease sub-network, followed by a multi-step consensus of network inference methods and shared ontological terms. We perform centrality analysis on the sub-networks under consideration and report hub genes as potentially key players in the target disease. To establish our scheme's effectiveness, we use Alzheimer's Disease (AD) and Breast Cancer as candidate diseases for experimentation. We evaluate the significance of prioritized genes based on reported evidence. We observe that BRCA1, BRCA2, and PTEN are the essential genes for Breast Cancer, whereas MAPK1, APP, and CASP7 are the essential genes playing an important role during AD.
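
The centrality analysis mentioned in this abstract can be sketched with degree centrality, one common choice; the abstract does not say which centrality measures the authors used. The function names `degree_centrality` and `top_hubs` and the toy edge list are hypothetical.

```python
def degree_centrality(edges):
    """Compute normalized degree centrality from an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Degree divided by the maximum possible degree (n - 1).
    return {node: len(nb) / (n - 1) for node, nb in neighbors.items()}


def top_hubs(edges, k=1):
    """Return the k nodes with the highest degree centrality (ties by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda gene: (-scores[gene], gene))[:k]


# Hypothetical disease sub-network with placeholder gene identifiers.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g4", "g5")]
scores = degree_centrality(edges)
hubs = top_hubs(edges, k=2)
print(scores, hubs)
```

In practice several centrality measures (betweenness, closeness, eigenvector) are often compared, since a gene that is a hub under one measure need not be under another.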


Subject(s)
Alzheimer Disease , Gene Regulatory Networks , Alzheimer Disease/genetics , Biomarkers , Consensus , Gene Expression Profiling , Gene Regulatory Networks/genetics , Humans
5.
J Biosci ; 45, 2020.
Article in English | MEDLINE | ID: mdl-32098912

ABSTRACT

A gene co-expression network (CEN) is of biological interest, since co-expressed genes share common functions and biological processes or pathways. Finding relationships among modules can reveal inter-modular preservation, and similarity in transcriptome, functional, and biological behaviors among modules of the same or two different datasets. There is no method that explores the one-to-one and one-to-many relationships among modules extracted from control and disease samples based on both topological and semantic similarity using both microarray and RNA-seq data. In this work, we propose a novel fusion measure to detect mappings between modules from two sets of co-expressed modules extracted from control and disease stages of Alzheimer's disease (AD) and Parkinson's disease (PD) datasets. Our measure considers both topological and biological information of a module and is estimated from four parameters, namely, semantic similarity, eigengene correlation, degree difference, and the number of common genes. We analyze the consensus modules shared between both control and disease stages in terms of their association with diseases. We also validate the close associations between human and chimpanzee modules and compare with the state-of-the-art method. Additionally, we report two novel observations on the relationships between modules for further analysis.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks/physiology , Transcriptome , Algorithms , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Animals , Databases, Genetic , Humans , Pan troglodytes , Parkinson Disease/genetics , Parkinson Disease/metabolism
6.
Comput Biol Med ; 113: 103380, 2019 10.
Article in English | MEDLINE | ID: mdl-31415946

ABSTRACT

In the recent past, a number of methods have been developed for analysis of biological data. Among these methods, gene co-expression networks can mine functionally related genes with similar co-expression patterns, and for this reason such networks have been the most widely used. However, gene co-expression networks cannot identify genes that undergo condition-specific changes in their relationships with other genes. In contrast, differential co-expression analysis enables finding co-expressed genes exhibiting significant changes across disease conditions. In this paper, we present some significant outcomes of a comparative study of four co-expression network module detection techniques, namely, THD-Module Extractor, DiffCoEx, MODA, and WGCNA, which can perform differential co-expression analysis on both gene and miRNA expression data (microarray and RNA-seq), and discuss the applications to Alzheimer's disease and Parkinson's disease research. Our observations reveal that, compared to other methods, THD-Module Extractor is the most effective in finding modules with higher functional relevance and biological significance.
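
The core idea of differential co-expression, a gene pair whose relationship changes between conditions, can be illustrated by comparing Pearson correlations computed separately on control and disease samples. This is a minimal sketch of the concept only; it is not the procedure of THD-Module Extractor, DiffCoEx, MODA, or WGCNA. The function names and expression values are hypothetical, and profiles are assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_change(gene_a, gene_b, control, disease):
    """Difference in co-expression (disease minus control) for one gene pair."""
    r_control = pearson(control[gene_a], control[gene_b])
    r_disease = pearson(disease[gene_a], disease[gene_b])
    return r_disease - r_control


# Hypothetical expression profiles: the pair is co-expressed in control
# samples and anti-correlated in disease samples.
control = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0]}
disease = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [8.0, 6.0, 4.0, 2.0]}

delta = correlation_change("g1", "g2", control, disease)
print(round(delta, 6))
```

A large absolute change flags a pair as differentially co-expressed even when neither gene is differentially expressed on its own.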


Subject(s)
Alzheimer Disease , Databases, Genetic , Gene Expression Profiling , Gene Regulatory Networks , Parkinson Disease , Transcriptome , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Biomarkers/metabolism , Humans , Parkinson Disease/genetics , Parkinson Disease/metabolism
7.
Comput Biol Chem ; 77: 373-389, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30466046

ABSTRACT

Genes interact with each other and may cause perturbation in the molecular pathways leading to complex diseases. Often, instead of any single gene, a subset of genes interact, forming a network, to share common biological functions. Such a subnetwork is called a functional module or motif. Identifying such modules and the central key genes in them that may be responsible for a disease may help design patient-specific drugs. In this study, we consider the neurodegenerative Alzheimer's Disease (AD) and identify potentially responsible genes from functional motif analysis. We start from the hypothesis that central genes in genetic modules are more relevant to a disease that is under investigation and identify hub genes from the modules as potential marker genes. Motifs or modules are often non-exclusive or overlapping in nature. Moreover, they sometimes show intrinsic or hierarchical distributions with overlapping functional roles. To the best of our knowledge, no prior work handles both situations in an integrated way. We propose a non-exclusive clustering approach, CluViaN (Clustering Via Network), that can detect intrinsic as well as overlapping modules from gene co-expression networks constructed using microarray expression profiles. We compare our method with existing methods to evaluate the quality of the modules extracted. CluViaN reports the presence of intrinsic and overlapping motifs in different species not reported by any other research. We further apply our method to extract significant AD-specific modules and rank them based on the number of genes from a module involved in the disease pathways. Finally, top central genes are identified by topological analysis of the modules. We use two different AD phenotype datasets for experimentation. We observe that central genes, namely PSEN1, APP, NDUFB2, NDUFA1, UQCR10, PPP3R1 and a few more, play significant roles in AD. Interestingly, our experiments also find a hub gene, PML, which has recently been reported to play a role in plasticity, circadian rhythms and the response to proteins which can cause neurodegenerative disorders. MUC4, another hub gene that we find experimentally, is yet to be investigated for its potential role in AD. A software implementation of CluViaN in Java is available for download at https://sites.google.com/site/swarupnehu/publications/resources/CluViaN Software.rar.


Subject(s)
Alzheimer Disease/genetics , Gene Regulatory Networks , Algorithms , Cluster Analysis , Gene Expression Regulation , Genomics/methods , Humans , Phenotype , Transcriptome
8.
Comput Biol Chem ; 75: 154-167, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29787933

ABSTRACT

Developing a cost-effective and robust triclustering algorithm that can identify triclusters of high biological significance in the gene-sample-time (GST) domain is a challenging task. Most existing triclustering algorithms can detect shifting and scaling patterns in isolation, but they are not able to handle co-occurring shifting-and-scaling patterns. This paper makes an attempt to address this issue. It introduces a robust triclustering algorithm called THD-Tricluster to identify triclusters over the GST domain. In addition to being validated on several benchmark datasets, the proposed THD-Tricluster algorithm was applied to HIV-1 progression data to identify disease-specific genes. THD-Tricluster could identify the 38 genes most responsible for the deadly disease, including GATA3, EGR1, JUN, ELF1, AGFG1, AGFG2, CX3CR1, CXCL12, CCR5, CCR2, and many others. The results are validated using GeneCard and other established results.
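
A co-occurring shifting-and-scaling pattern means that one expression profile is related to another by both a multiplicative factor and an additive offset, that is, y = scale * x + shift. The sketch below, which is not THD-Tricluster itself, fits those two values by least squares and reports the largest residual, so a near-zero residual indicates such a pattern. The function name `fit_shift_and_scale` and the example profiles are hypothetical.

```python
def fit_shift_and_scale(x, y):
    """Fit y ~ scale * x + shift by least squares.

    Returns (scale, shift, max_residual). A pure shifting pattern has
    scale == 1, a pure scaling pattern has shift == 0, and a co-occurring
    pattern has both a non-unit scale and a non-zero shift.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("reference profile is constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    scale = sxy / sxx
    shift = mean_y - scale * mean_x
    max_residual = max(abs(b - (scale * a + shift)) for a, b in zip(x, y))
    return scale, shift, max_residual


# Hypothetical profiles over four conditions: second is 2 * first + 3.
first = [1.0, 2.0, 4.0, 3.0]
second = [5.0, 7.0, 11.0, 9.0]

scale, shift, max_residual = fit_shift_and_scale(first, second)
print(scale, shift, max_residual)
```

An algorithm that tests only for shifting (constant difference) or only for scaling (constant ratio) would miss this pair, because neither the differences nor the ratios between the two profiles are constant.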


Subject(s)
Algorithms , HIV-1/genetics , Cluster Analysis , HIV-1/isolation & purification , Humans , Oligonucleotide Array Sequence Analysis
9.
J Biosci ; 42(3): 383-396, 2017 Sep.
Article in English | MEDLINE | ID: mdl-29358552

ABSTRACT

Protein complexes are known to play a major role in controlling cellular activity in a living being. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast and a few other model organisms. Such protein complex identification methods, when applied to large human PPIs, often give poor performance. We introduce a novel method called ComFiR to detect such protein complexes and further rank diseased complexes based on a query disease. We have shown that it has better performance in identifying protein complexes from human PPI data. This method is evaluated in terms of positive predictive value, sensitivity and accuracy. We have introduced a ranking approach and shown its application to Alzheimer's disease.


Subject(s)
Algorithms , Alzheimer Disease/metabolism , Computational Biology/methods , Protein Interaction Mapping/statistics & numerical data , Alzheimer Disease/diagnosis , Alzheimer Disease/pathology , Databases, Protein , Humans , Protein Binding
10.
Methods Mol Biol ; 1375: 91-103, 2016.
Article in English | MEDLINE | ID: mdl-26350227

ABSTRACT

Mining microarray data to unearth interesting expression profile patterns for discovery of in silico biological knowledge is an emerging area of research in computational biology. A group of functionally related genes may have similar expression patterns under a set of conditions or at some time points. Biclustering is an important data mining tool that has been successfully used to analyze gene expression data for biologically significant cluster discovery. The purpose of this chapter is to introduce interesting patterns that may be observed in expression data and discuss the role of biclustering techniques in detecting interesting functional gene groups with similar expression patterns.


Subject(s)
Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Animals , Data Mining/methods , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis/methods , Reproducibility of Results
11.
Comput Biol Chem ; 59 Pt B: 32-41, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26362299

ABSTRACT

A number of methods have been proposed in the literature on protein-protein interaction (PPI) network analysis for detection of clusters in the network. Clusters are identified by these methods using various graph theoretic criteria. Most of these methods have been found to be time-consuming due to the involvement of preprocessing and post-processing tasks. In addition, they do not achieve high precision and recall consistently and simultaneously. Moreover, the existing methods do not effectively employ the idea of the core-periphery structural pattern of protein complexes to extract clusters. In this paper, we introduce a clustering method named CPCA based on a recent observation by researchers that a protein complex in a PPI network is arranged as a relatively dense core region and additional proteins weakly connected to the core. CPCA uses two connectivity criterion functions to identify core and peripheral regions of the cluster. To locate the initial node of a cluster, we introduce a measure called the DNQ (Degree based Neighborhood Qualification) index that evaluates the tendency of the node to be part of a cluster. CPCA performs well when compared with well-known counterparts. Along with protein complex gold standards, a co-localization dataset has also been used for validation of the results.


Subject(s)
Protein Interaction Maps , Proteins/chemistry , Cluster Analysis , Databases, Protein , Protein Binding , Protein Interaction Mapping , Reproducibility of Results
12.
BMC Bioinformatics ; 15 Suppl 7: S10, 2014.
Article in English | MEDLINE | ID: mdl-25079873

ABSTRACT

BACKGROUND: Biological networks connect genes and gene products to one another. A network of co-regulated genes may form gene clusters that can encode proteins and take part in common biological processes. A gene co-expression network describes inter-relationships among genes. Existing techniques generally depend on proximity measures based on global similarity to draw the relationship between genes. It has been observed that expression profiles share local similarity rather than global similarity. We propose an expression pattern based method called GeCON to extract Gene CO-expression Network from microarray data. Pair-wise supports are computed for each pair of genes based on changing tendencies and regulation patterns of the gene expression. Gene pairs showing negative or positive co-regulation under a given number of conditions are used to construct such a gene co-expression network. We construct a co-expression network with signed edges to reflect up- and down-regulation between pairs of genes. Most existing techniques do not emphasize computational efficiency. We exploit a fast correlogram matrix based technique for capturing the support of each gene pair to construct the network. RESULTS: We apply GeCON to both real and synthetic gene expression data. We compare our results using the DREAM (Dialogue for Reverse Engineering Assessments and Methods) Challenge data with three well known algorithms, viz., ARACNE, CLR and MRNET. Our method outperforms other algorithms based on in silico regulatory network reconstruction. Experimental results show that GeCON can extract functionally enriched network modules from real expression data. CONCLUSIONS: In view of the results over several in silico and real expression datasets, the proposed GeCON shows satisfactory performance in predicting co-expression networks in a computationally inexpensive way. We further establish that simple expression pattern matching is helpful in finding biologically relevant gene networks. In the future, we aim to introduce an enhanced GeCON to identify Protein-Protein interaction network complexes by incorporating a variable density concept.
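
The pair-wise support idea in this abstract can be sketched as follows: encode each profile as a sequence of up/down tendencies between consecutive conditions, count the positions where two genes move in the same or in opposite directions, and add a signed edge when either count reaches a threshold. This is an assumed simplification for illustration; it does not reproduce GeCON's regulation patterns or its correlogram matrix technique. The function names, the `min_support` parameter, and the expression values are hypothetical.

```python
def tendencies(profile):
    """Encode changes between consecutive conditions as +1 (up), -1 (down), 0 (flat)."""
    return [(b > a) - (b < a) for a, b in zip(profile, profile[1:])]


def signed_support(profile_p, profile_q):
    """Count positions where two genes change in the same or opposite direction."""
    tp, tq = tendencies(profile_p), tendencies(profile_q)
    same = sum(1 for a, b in zip(tp, tq) if a != 0 and a == b)
    opposite = sum(1 for a, b in zip(tp, tq) if a != 0 and a == -b)
    return same, opposite


def build_signed_network(expression, min_support):
    """Return {(gene, gene): +1 or -1} for pairs whose support meets the threshold."""
    genes = sorted(expression)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            same, opposite = signed_support(expression[g], expression[h])
            if same >= min_support:
                edges[(g, h)] = +1   # positive co-regulation
            elif opposite >= min_support:
                edges[(g, h)] = -1   # negative co-regulation
    return edges


# Hypothetical expression profiles over five conditions.
expression = {
    "g1": [1, 3, 2, 5, 4],
    "g2": [10, 30, 20, 50, 40],   # same tendencies as g1
    "g3": [9, 4, 6, 1, 3],        # opposite tendencies to g1
    "g4": [5, 5, 5, 6, 5],        # mostly flat
}

network = build_signed_network(expression, min_support=3)
print(network)
```

Because only the direction of change is compared, g1 and g2 are linked despite a tenfold difference in magnitude, which is the kind of local pattern similarity a global distance measure can miss.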


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis , Algorithms , Computer Simulation , Down-Regulation , Gene Expression , Gene Expression Profiling/methods , Humans , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods
13.
BMC Bioinformatics ; 13 Suppl 13: S4, 2012.
Article in English | MEDLINE | ID: mdl-23320896

ABSTRACT

BACKGROUND: The development of high-throughput microarray technologies has provided various opportunities to systematically characterize diverse types of computational biological networks. Co-expression networks have become popular in the analysis of microarray data, such as for detecting functional gene modules. RESULTS: This paper presents a method to build a co-expression network (CEN) and to detect network modules from the built network. We use an effective gene expression similarity measure called NMRS (Normalized mean residue similarity) to construct the CEN. We have tested our method on five publicly available benchmark microarray datasets. The network modules extracted by our algorithm have been biologically validated in terms of Q value and p value. CONCLUSIONS: Our results show that the technique is capable of detecting biologically significant network modules from the co-expression network. Biologists can use this technique to find groups of genes with similar functionality based on their expression information.


Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Databases, Genetic/statistics & numerical data , Gene Expression