Search | BVS Regional Portal

Identification of cancer risk groups through multi-omics integration using autoencoder and tensor analysis.

Braytee, Ali; He, Sam; Tang, Shuxian; Sun, Yuxuan; Jiang, Xiaoying; Yu, Xuanding; Khatri, Inder; Chaturvedi, Kunal; Prasad, Mukesh; Anaissi, Ali.

Sci Rep ; 14(1): 11263, 2024 05 17.

Article in English | MEDLINE | ID: mdl-38760420

ABSTRACT

Identifying cancer risk groups by multi-omics has attracted researchers in their quest to find biomarkers from diverse risk-related omics. Stratifying the patients into cancer risk groups using genomics is essential for clinicians for pre-prevention treatment to improve the survival time for patients and identify the appropriate therapy strategies. This study proposes a multi-omics framework that can extract the features from various omics simultaneously. The framework employs autoencoders to learn the non-linear representation of the data and applies tensor analysis for feature learning. Further, the clustering method is used to stratify the patients into multiple cancer risk groups. Several omics were included in the experiments, namely methylation, somatic copy-number variation (SCNV), micro RNA (miRNA) and RNA sequencing (RNAseq) from two cancer types, including Glioma and Breast Invasive Carcinoma from the TCGA dataset. The results of this study are promising, as evidenced by the survival analysis and classification models, which outperformed the state-of-the-art. The patients can be significantly (p-value<0.05) divided into risk groups using extracted latent variables from the fused multi-omics data. The pipeline is open source to help researchers and clinicians identify the patients' risk groups using genomics.

Subjects

DNA Copy Number Variations; Genomics; Humans; Genomics/methods; DNA Methylation; Neoplasms/genetics; MicroRNAs/genetics; Female; Biomarkers, Tumor/genetics; Glioma/genetics; Glioma/pathology; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Multiomics

Stochastic Dual Simplex Algorithm: A Novel Heuristic Optimization Algorithm.

Zandavi, Seid Miad; Chung, Vera Yuk Ying; Anaissi, Ali.

IEEE Trans Cybern ; 51(5): 2725-2734, 2021 May.

Article in English | MEDLINE | ID: mdl-31425133

ABSTRACT

A new heuristic optimization algorithm is presented to solve the nonlinear optimization problems. The proposed algorithm utilizes a stochastic method to achieve the optimal point based on simplex techniques. A dual simplex is distributed stochastically in the search space to find the best optimal point. Simplexes share the best and worst vertices of one another to move better through search space. The proposed algorithm is applied to 25 well-known benchmarks, and its performance is compared with grey wolf optimizer (GWO), particle swarm optimization (PSO), Nelder-Mead simplex algorithm, hybrid GWO combined with pattern search (hGWO-PS), and hybrid GWO algorithm combined with random exploratory search algorithm (hGWO-RES). The numerical results show that the proposed algorithm, called stochastic dual simplex algorithm (SDSA), has a competitive performance in terms of accuracy and complexity.
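The abstract gives only the outline of the method: two simplexes placed stochastically in the search space that exchange vertices. The sketch below is a loose illustration of that outline, not the published SDSA. The simplified Nelder-Mead move, the sharing rule in `share_best`, the sphere objective, and all parameter values are assumptions made for the example.

```python
import random

def sphere(x):
    """Benchmark objective (an assumption here): minimum 0 at the origin."""
    return sum(v * v for v in x)

def random_simplex(dim, low, high, rng):
    """Place dim + 1 vertices uniformly at random inside the search box."""
    return [[rng.uniform(low, high) for _ in range(dim)] for _ in range(dim + 1)]

def nelder_mead_step(simplex, f):
    """One simplified Nelder-Mead move: reflect, else contract, else shrink."""
    simplex.sort(key=f)
    best, worst = simplex[0], simplex[-1]
    dim = len(best)
    # Centroid of every vertex except the worst one.
    centroid = [sum(v[i] for v in simplex[:-1]) / dim for i in range(dim)]
    reflected = [2 * c - w for c, w in zip(centroid, worst)]
    if f(reflected) < f(worst):
        simplex[-1] = reflected
        return
    contracted = [(c + w) / 2 for c, w in zip(centroid, worst)]
    if f(contracted) < f(worst):
        simplex[-1] = contracted
        return
    # Shrink every vertex halfway toward the best one.
    for j in range(1, len(simplex)):
        simplex[j] = [(b + v) / 2 for b, v in zip(best, simplex[j])]

def share_best(a, b, f):
    """Assumed sharing rule: a simplex's worst vertex is replaced by the
    other simplex's best vertex when that is an improvement."""
    a.sort(key=f)
    b.sort(key=f)
    best_a, best_b = list(a[0]), list(b[0])
    if best_b not in a and f(best_b) < f(a[-1]):
        a[-1] = best_b
    if best_a not in b and f(best_a) < f(b[-1]):
        b[-1] = best_a

def dual_simplex_search(f, dim=2, low=-5.0, high=5.0, iters=200, seed=0):
    """Run two randomly placed simplexes side by side with vertex sharing."""
    rng = random.Random(seed)
    a = random_simplex(dim, low, high, rng)
    b = random_simplex(dim, low, high, rng)
    start = min(f(v) for v in a + b)  # best value before any move
    for _ in range(iters):
        nelder_mead_step(a, f)
        nelder_mead_step(b, f)
        share_best(a, b, f)
    best = min(a + b, key=f)
    return best, f(best), start

best_point, best_value, start_value = dual_simplex_search(sphere)
print(best_point, best_value, start_value)
```

Every move replaces a vertex only with a better point or shrinks toward the current best, so the best value found can never get worse than the random starting value.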

A Tensor-Based Structural Damage Identification and Severity Assessment.

Anaissi, Ali; Makki Alamdari, Mehrisadat; Rakotoarivelo, Thierry; Khoa, Nguyen Lu Dang.

Sensors (Basel) ; 18(1)2018 Jan 02.

Article in English | MEDLINE | ID: mdl-29301314

ABSTRACT

Early damage detection is critical for a large set of global ageing infrastructure. Structural Health Monitoring systems provide a sensor-based quantitative and objective approach to continuously monitor these structures, as opposed to traditional engineering visual inspection. Analysing these sensed data is one of the major Structural Health Monitoring (SHM) challenges. This paper presents a novel algorithm to detect and assess damage in structures such as bridges. This method applies tensor analysis for data fusion and feature extraction, and further uses one-class support vector machine on this feature to detect anomalies, i.e., structural damage. To evaluate this approach, we collected acceleration data from a sensor-based SHM system, which we deployed on a real bridge and on a laboratory specimen. The results show that our tensor method outperforms a state-of-the-art approach using the wavelet energy spectrum of the measured data. In the specimen case, our approach succeeded in detecting 92.5% of induced damage cases, as opposed to 61.1% for the wavelet-based approach. While our method was applied to bridges, its algorithm and computation can be used on other structures or sensor-data analysis problems, which involve large series of correlated data from multiple sensors.

Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J.

PLoS One ; 11(6): e0157330, 2016.

Article in English | MEDLINE | ID: mdl-27304923

ABSTRACT

The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data.
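The ensemble idea described above (rank features on several balanced bootstrap samples, aggregate the rankings, then eliminate the weakest feature) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' ESVM-RFE: to stay dependency-free, the base ranker `score_features` is a stand-in that uses the absolute difference of class means instead of linear SVM weights, and the function names, toy data, and parameters are all hypothetical.

```python
import random
from statistics import mean

def balanced_bootstrap(X, y, rng):
    """Draw, with replacement, the same number of samples from each class."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_per_class = min(len(idx) for idx in by_class.values())
    chosen = [rng.choice(idx) for idx in by_class.values() for _ in range(n_per_class)]
    return [X[i] for i in chosen], [y[i] for i in chosen]

def score_features(X, y, features):
    """Stand-in for SVM weights (assumption): |mean(class 0) - mean(class 1)|.
    Assumes exactly two classes."""
    c0, c1 = sorted(set(y))
    scores = {}
    for f in features:
        m0 = mean(row[f] for row, lab in zip(X, y) if lab == c0)
        m1 = mean(row[f] for row, lab in zip(X, y) if lab == c1)
        scores[f] = abs(m0 - m1)
    return scores

def ensemble_rfe(X, y, n_keep, n_models=25, seed=0):
    """Backward elimination driven by rankings aggregated over bootstrap models."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        rank_sum = {f: 0 for f in features}
        for _ in range(n_models):
            Xb, yb = balanced_bootstrap(X, y, rng)
            scores = score_features(Xb, yb, features)
            # Rank 0 is the least informative feature in this model.
            for rank, f in enumerate(sorted(features, key=scores.get)):
                rank_sum[f] += rank
        # Eliminate the feature with the lowest aggregated rank.
        features.remove(min(features, key=rank_sum.get))
    return features

def noise(rng):
    return rng.uniform(-0.5, 0.5)

# Toy imbalanced data (hypothetical): 6 samples of class 0, 3 of class 1.
# Features 0 and 2 separate the classes; features 1 and 3 are pure noise.
data_rng = random.Random(1)
X, y = [], []
for label, n in ((0, 6), (1, 3)):
    for _ in range(n):
        X.append([5 * label + noise(data_rng), data_rng.random(),
                  3 * (1 - label) + noise(data_rng), data_rng.random()])
        y.append(label)

selected = ensemble_rfe(X, y, n_keep=2)
print(selected)
```

Because the elimination decision uses the summed ranks of all bootstrap models, a single unlucky sample cannot remove an informative feature on its own, which is the motivation the abstract gives for the ensemble.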

Subjects

Algorithms; Gene Expression Profiling/statistics & numerical data; Genomics/statistics & numerical data; Support Vector Machine; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Child; Colonic Neoplasms/genetics; Colonic Neoplasms/pathology; Computational Biology/methods; Data Mining/methods; Female; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Genomics/methods; Humans; Information Dissemination/methods; Leukemia/genetics; Leukemia/pathology; Reproducibility of Results

Case-based retrieval framework for gene expression data.

Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J.

Cancer Inform ; 14: 21-31, 2015.

Article in English | MEDLINE | ID: mdl-25861214

ABSTRACT

BACKGROUND: The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process. METHODS: This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. RESULTS: The herein-proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. Results obtained by the proposed framework in retrieving patients of the data sets who are similar to new patients are as follows: 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. CONCLUSION: The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps.
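The retrieval step described in METHODS, a k-nearest-neighbor search with a weighted-feature similarity, can be illustrated as follows. This is a minimal sketch, not the framework's implementation: the weighted Euclidean distance, the feature weights, the patient identifiers, and the labels are all assumptions chosen for the example.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature (gene) is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def retrieve_similar(query, cases, weights, k=3):
    """Return the k stored cases closest to `query`, nearest first.

    `cases` is a list of (case_id, expression_vector, label) tuples.
    """
    ranked = sorted(cases, key=lambda c: weighted_distance(query, c[1], weights))
    return ranked[:k]

def predict_label(neighbours):
    """Majority vote over the labels of the retrieved cases."""
    return Counter(label for _, _, label in neighbours).most_common(1)[0][0]

# Toy case base (hypothetical): three "genes", the third treated as uninformative.
cases = [
    ("p1", [0.9, 0.1, 0.5], "A"),
    ("p2", [0.8, 0.2, 0.1], "A"),
    ("p3", [0.1, 0.9, 0.4], "B"),
    ("p4", [0.2, 0.8, 0.9], "B"),
]
weights = [1.0, 1.0, 0.0]  # assumed weights, e.g. from a feature-selection step
new_patient = [0.88, 0.12, 0.95]

neighbours = retrieve_similar(new_patient, cases, weights, k=3)
retrieved_ids = [case_id for case_id, _, _ in neighbours]
predicted = predict_label(neighbours)
print(retrieved_ids, predicted)
```

With the third weight set to zero, the new patient's high value on that gene does not pull the search toward `p4`; the two nearest retrieved cases come from class A.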

A balanced iterative random forest for gene selection from microarray data.

Anaissi, Ali; Kennedy, Paul J; Goyal, Madhu; Catchpoole, Daniel R.

BMC Bioinformatics ; 14: 261, 2013 Aug 27.

Article in English | MEDLINE | ID: mdl-23981907

ABSTRACT

BACKGROUND: The wealth of gene expression values being generated by high throughput microarray technologies leads to complex high dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes where the number of patients belonging to each class is not the same. With this kind of dataset, biologists need to identify a small number of informative genes that can be used as biomarkers for a disease. RESULTS: This paper introduces a Balanced Iterative Random Forest (BIRF) algorithm to select the most relevant genes for a disease from imbalanced high-throughput gene expression microarray data. Balanced iterative random forest is applied on four cancer microarray datasets: a childhood leukaemia dataset, which represents the main target of this paper, collected from The Children's Hospital at Westmead, NCI 60, a Colon dataset and a Lung cancer dataset. The results obtained by BIRF are compared to those of Support Vector Machine-Recursive Feature Elimination (SVM-RFE), Multi-class SVM-RFE (MSVM-RFE), Random Forest (RF) and Naive Bayes (NB) classifiers. The results of the BIRF approach outperform these state-of-the-art methods, especially in the case of imbalanced datasets. Experiments on the childhood leukaemia dataset show that a 7% ∼ 12% better accuracy is achieved by BIRF over MSVM-RFE with the ability to predict patients in the minor class. The informative biomarkers selected by the BIRF algorithm were validated by repeating training experiments three times to see whether they are globally informative, or just selected by chance. The results show that 64% of the top genes consistently appear in the three lists, and the top 20 genes remain near the top in the other three lists. CONCLUSION: The designed BIRF algorithm is an appropriate choice to select genes from imbalanced high-throughput gene expression microarray data. BIRF outperforms the state-of-the-art methods, especially the ability to handle the class-imbalanced data. Moreover, the analysis of the selected genes also provides a way to distinguish between the predictive genes and those that only appear to be predictive.
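The validation step reported above, checking how many top genes reappear across repeated training runs, amounts to an overlap computation between ranked lists. The sketch below shows one way to compute it. The abstract does not say exactly how the 64% figure was obtained, so the definition used here (genes present in the top k of every run, divided by k) is an assumption, and the gene names are hypothetical.

```python
def top_k_overlap(rankings, k):
    """Fraction of the top-k genes that appear in the top k of every ranking,
    together with the sorted list of those shared genes."""
    top_sets = [set(ranking[:k]) for ranking in rankings]
    common = set.intersection(*top_sets)
    return len(common) / k, sorted(common)

# Three ranked gene lists from three hypothetical training runs.
runs = [
    ["G1", "G2", "G3", "G4", "G5", "G6"],
    ["G2", "G1", "G4", "G3", "G7", "G5"],
    ["G1", "G3", "G2", "G8", "G4", "G6"],
]

fraction, shared = top_k_overlap(runs, k=5)
print(fraction, shared)
```

Genes that appear in only one run's top list (`G7` and `G8` here) are the ones most likely to have been selected by chance.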

Subjects

Computational Biology/methods; Gene Expression Profiling/methods; Genetic Markers/genetics; Oligonucleotide Array Sequence Analysis/methods; Bayes Theorem; Child; Female; Humans; Models, Genetic; Neoplasms/genetics; Neoplasms/metabolism; Reproducibility of Results; Support Vector Machine
