Methodology to identify a gene expression signature by merging microarray datasets.

Fajarda, Olga; Almeida, João Rafael; Duarte-Pereira, Sara; Silva, Raquel M; Oliveira, José Luís.

Comput Biol Med ; 159: 106867, 2023 06.

Article in English | MEDLINE | ID: mdl-37060770

ABSTRACT

A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in the R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
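The selection step described in this abstract (scoring several candidate gene sets with a supervised classifier and keeping the best one) can be illustrated with a minimal sketch. It is written in Python and is not derived from the authors' R implementation. The abstract does not name the classifiers used, so a nearest-centroid classifier with leave-one-out cross-validation is swapped in as a simple stand-in. The function names `loo_accuracy` and `select_signature`, the gene labels and all expression values are hypothetical.

```python
from math import dist


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for held_out in range(len(samples)):
        # Build one centroid per class from every sample except the held-out one.
        centroids = {}
        for cls in sorted(set(labels)):
            rows = [[s[g] for g in genes]
                    for i, (s, y) in enumerate(zip(samples, labels))
                    if i != held_out and y == cls]
            if rows:
                centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        point = [samples[held_out][g] for g in genes]
        predicted = min(centroids, key=lambda c: dist(point, centroids[c]))
        correct += predicted == labels[held_out]
    return correct / len(samples)


def select_signature(samples, labels, candidate_sets):
    """Return (name, accuracy) of the candidate gene set with the highest accuracy."""
    scores = {name: loo_accuracy(samples, labels, sorted(genes))
              for name, genes in candidate_sets.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy expression data, invented for illustration: G1 separates the classes, G2 does not.
samples = [
    {"G1": 5.0, "G2": 1.0},
    {"G1": 5.2, "G2": 3.0},
    {"G1": 4.8, "G2": 2.0},
    {"G1": 1.0, "G2": 3.1},
    {"G1": 1.2, "G2": 0.7},
    {"G1": 0.8, "G2": 2.1},
]
labels = ["case", "case", "case", "control", "control", "control"]
candidate_sets = {"strict": {"G1"}, "noise_only": {"G2"}}

best_name, best_accuracy = select_signature(samples, labels, candidate_sets)
```

Scoring each candidate set by how well it classifies held-out samples replaces the choice of a single arbitrary statistical threshold with a data-driven comparison, which is the motivation the abstract gives for combining the two kinds of method.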

Subject(s)

Autism Spectrum Disorder, Gene Expression Profiling, Humans, Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Transcriptome, Algorithms

NAPRT Expression Regulation Mechanisms: Novel Functions Predicted by a Bioinformatics Approach.

Duarte-Pereira, Sara; Fajarda, Olga; Matos, Sérgio; Luís Oliveira, José; Silva, Raquel Monteiro.

Genes (Basel) ; 12(12)2021 12 20.

Article in English | MEDLINE | ID: mdl-34946971

ABSTRACT

The nicotinate phosphoribosyltransferase (NAPRT) gene has gained relevance in the research of cancer therapeutic strategies due to its main role as a NAD biosynthetic enzyme. NAD metabolism is an attractive target for the development of anti-cancer therapies, given the high energy requirements of proliferating cancer cells and NAD-dependent signaling. A few studies have shown that NAPRT expression varies in different cancer types, making it imperative to assess NAPRT expression and functionality status prior to the application of therapeutic strategies targeting NAD. In addition, the recent finding of an extracellular form of NAPRT (eNAPRT) suggested the involvement of NAPRT in inflammation and signaling. However, the mechanisms regulating NAPRT gene expression have never been thoroughly addressed. In this study, we searched for NAPRT gene expression regulatory mechanisms in transcription factor (TF), RNA binding protein (RBP) and microRNA (miRNA) databases. We identified several potential regulators of NAPRT transcription activation, downregulation and alternative splicing and performed GO and expression analyses. The results of the functional analysis of TFs, RBPs and miRNAs suggest new, unexpected functions for the NAPRT gene in cell differentiation, development and neuronal biology.

Subject(s)

Computational Biology/methods, Pentosyltransferases/genetics, Pentosyltransferases/metabolism, Alternative Splicing, Cell Differentiation, Cell Line, Tumor, Databases, Genetic, Humans, Transcriptional Activation

Merging microarray studies to identify a common gene expression signature to several structural heart diseases.

Fajarda, Olga; Duarte-Pereira, Sara; Silva, Raquel M; Oliveira, José Luís.

BioData Min ; 13: 8, 2020.

Article in English | MEDLINE | ID: mdl-32670412

ABSTRACT

BACKGROUND: Heart disease is the leading cause of death worldwide. Knowing a gene expression signature in heart disease can lead to the development of more efficient diagnosis and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most of the microarray datasets are composed of a reduced number of samples, and to obtain more reliable results, several datasets have to be merged, which is a challenging task. The identification of differentially expressed genes is commonly done using statistical methods. Nonetheless, these methods are based on the definition of an arbitrary threshold to select the differentially expressed genes and there is no consensus on the values that should be used. RESULTS: Nine publicly available microarray datasets from studies of different heart diseases were merged to form a dataset composed of 689 samples and 8354 features. Subsequently, the adjusted p-value and fold change were determined, and by combining a set of adjusted p-value cutoffs with a list of different fold change thresholds, 12 sets of differentially expressed genes were obtained. To select the set of differentially expressed genes that has the best accuracy in classifying samples from patients with heart diseases and samples from patients with no heart condition, the random forest algorithm was used. A set of 62 differentially expressed genes having a classification accuracy of approximately 95% was identified. CONCLUSIONS: We identified a gene expression signature common to different cardiac diseases and supported our findings by showing their involvement in the pathophysiology of the heart. The approach used in this study is suitable for the identification of gene expression signatures, and can be extended to different diseases.
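As an illustration of how combining adjusted p-value cutoffs with fold change thresholds yields several candidate sets of differentially expressed genes, here is a minimal Python sketch. It is not the authors' code. Several details are assumptions, because the abstract does not state them: the Benjamini-Hochberg procedure as the p-value adjustment, the specific cutoff and threshold values, and the 3 x 4 grid used to arrive at 12 sets. The function names, gene labels and toy numbers are hypothetical.

```python
from itertools import product


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def candidate_gene_sets(genes, p_values, log2_fold_changes, p_cutoffs, fc_thresholds):
    """Build one gene set per (adjusted p-value cutoff, |log2 fold change| threshold) pair."""
    adjusted = benjamini_hochberg(p_values)
    sets = {}
    for p_cut, fc_min in product(p_cutoffs, fc_thresholds):
        sets[(p_cut, fc_min)] = {
            gene
            for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
            if q < p_cut and abs(lfc) >= fc_min
        }
    return sets


# Toy inputs, invented for illustration only.
genes = ["G1", "G2", "G3", "G4", "G5"]
p_values = [0.001, 0.002, 0.02, 0.2, 0.8]
log2_fc = [1.6, -0.9, 0.4, 2.1, -0.1]

# Assumed grid: 3 cutoffs x 4 thresholds = 12 candidate sets.
sets = candidate_gene_sets(
    genes, p_values, log2_fc,
    p_cutoffs=[0.05, 0.01, 0.001],
    fc_thresholds=[0.0, 0.5, 1.0, 1.5],
)
```

Each key of `sets` is one (cutoff, threshold) pair, so stricter pairs give smaller gene sets. A classifier such as the random forest mentioned in the abstract could then be trained on each set to compare their classification accuracy.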
