Pesquisa | Portal Regional da BVS (teste)

The Weight-Based Feature Selection (WBFS) Algorithm Classifies Lung Cancer Subtypes Using Proteomic Data.

Wang, Yangyang; Gao, Xiaoguang; Ru, Xinxin; Sun, Pengzhan; Wang, Jihan.

Entropy (Basel) ; 25(7)2023 Jun 29.

Artigo em Inglês | MEDLINE | ID: mdl-37509950

RESUMO

Feature selection plays an important role in improving the performance of classification or reducing the dimensionality of high-dimensional datasets, such as high-throughput genomics/proteomics data in bioinformatics. As a popular approach with computational efficiency and scalability, information theory has been widely incorporated into feature selection. In this study, we propose a unique weight-based feature selection (WBFS) algorithm that assesses selected features and candidate features to identify the key protein biomarkers for classifying lung cancer subtypes from The Cancer Proteome Atlas (TCPA) database and we further explored the survival analysis between selected biomarkers and subtypes of lung cancer. Results show good performance of the combination of our WBFS method and Bayesian network for mining potential biomarkers. These candidate signatures have valuable biological significance in tumor classification and patient survival analysis. Taken together, this study proposes the WBFS method that helps to explore candidate biomarkers from biomedical datasets and provides useful information for tumor diagnosis or therapy strategies.

Using feature selection and Bayesian network identify cancer subtypes based on proteomic data.

Wang, Yangyang; Gao, Xiaoguang; Ru, Xinxin; Sun, Pengzhan; Wang, Jihan.

J Proteomics ; 280: 104895, 2023 05 30.

Artigo em Inglês | MEDLINE | ID: mdl-37024076

RESUMO

The Cancer Proteome Atlas (TCPA) project collects reverse-phase protein arrays (RPPA)-based proteome datasets from nearly 8000 samples across 32 cancer types. This study aims to investigate the pan-cancer proteome signature and identify cancer subtypes of glioma, kidney cancer, and lung cancer based on TCPA data. We first visualized the tumor clustering models using t-distributed stochastic neighbour embedding (t-SNE) and bi-clustering heatmap. Then, three feature selection methods (pyHSICLasso, XGBoost, and Random Forest) were performed to select protein features for classifying cancer subtypes in training dataset, and the LibSVM algorithm was empolyed to test classification accuracy in the validation dataset. Clustering analysis revealed that different kinds of tumors have relatively distinct proteomic profiling based on tissue or origin. We identified 20, 10, and 20 protein features with the highest accuracies in classifying subtypes of glioma, kidney cancer, and lung cancer, respectively. The predictive abilities of the selected proteins were confirmed by receiving operating characteristic (ROC) analysis. Finally, the Bayesian network was utilized to explore the protein biomarkers that have direct causal relationships with cancer subtypes. Overall, we highlight the theoretical and technical applications of machine learning based feature selection approaches in the analysis of high-throughput biological data, particularly for cancer biomarker research. SIGNIFICANCE: Functional proteomics is a powerful approach for characterizing cell signaling pathways and understanding their phenotypic effects on cancer development. The TCPA database provides a platform to explore and analyze TCGA pan-cancer RPPA-based protein expression. With the advent of the RPPA technology, the availability of high-throughput data in TCPA platform has made it possible to use machine learning methods to identify protein biomarkers and further differentiate subtypes of cancer based on proteomic data. In this study, we highlight the role of feature selection and Bayesian network in discovery protein biomarker for classifying cancer subtypes based on functional proteomic data. The application of machine learning methods in the analysis of high-throughput biological data, particularly for cancer biomarker researches, which have potential clinical values in developing individualized treatment strategies.

Assuntos

Carcinoma de Células Renais , Glioma , Neoplasias Renais , Neoplasias Pulmonares , Humanos , Proteômica/métodos , Proteoma/metabolismo , Teorema de Bayes , Biomarcadores Tumorais/metabolismo

Identification of gene signatures for COAD using feature selection and Bayesian network approaches.

Wang, Yangyang; Gao, Xiaoguang; Ru, Xinxin; Sun, Pengzhan; Wang, Jihan.

Sci Rep ; 12(1): 8761, 2022 05 24.

Artigo em Inglês | MEDLINE | ID: mdl-35610288

RESUMO

The combination of TCGA and GTEx databases will provide more comprehensive information for characterizing the human genome in health and disease, especially for underlying the cancer genetic alterations. Here we analyzed the gene expression profile of COAD in both tumor samples from TCGA and normal colon tissues from GTEx. Using the SNR-PPFS feature selection algorithms, we discovered a 38 gene signatures that performed well in distinguishing COAD tumors from normal samples. Bayesian network of the 38 genes revealed that DEGs with similar expression patterns or functions interacted more closely. We identified 14 up-DEGs that were significantly correlated with tumor stages. Cox regression analysis demonstrated that tumor stage, STMN4 and FAM135B dysregulation were independent prognostic factors for COAD survival outcomes. Overall, this study indicates that using feature selection approaches to select key gene signatures from high-dimensional datasets can be an effective way for studying cancer genomic characteristics.

Assuntos

Adenocarcinoma , Neoplasias do Colo , Adenocarcinoma/patologia , Teorema de Bayes , Neoplasias do Colo/genética , Neoplasias do Colo/patologia , Regulação Neoplásica da Expressão Gênica , Humanos , Prognóstico

A hybrid feature selection algorithm and its application in bioinformatics.

Wang, Yangyang; Gao, Xiaoguang; Ru, Xinxin; Sun, Pengzhan; Wang, Jihan.

PeerJ Comput Sci ; 8: e933, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35494789

RESUMO

Feature selection is an independent technology for high-dimensional datasets that has been widely applied in a variety of fields. With the vast expansion of information, such as bioinformatics data, there has been an urgent need to investigate more effective and accurate methods involving feature selection in recent decades. Here, we proposed the hybrid MMPSO method, by combining the feature ranking method and the heuristic search method, to obtain an optimal subset that can be used for higher classification accuracy. In this study, ten datasets obtained from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method. The MMPSO algorithm outperformed other algorithms in terms of classification accuracy while utilizing the same number of features. Then we applied the method to a biological dataset containing gene expression information about liver hepatocellular carcinoma (LIHC) samples obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). On the basis of the MMPSO algorithm, we identified a 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under curves (AUC) of the combination seven genes (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in classifying tumours with normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can be used to effectively extract features from a high-dimensional dataset, which will provide new clues for identifying biomarkers or therapeutic targets from biological data and more perspectives in tumor research.

Improved Local Search with Momentum for Bayesian Networks Structure Learning.

Liu, Xiaohan; Gao, Xiaoguang; Wang, Zidong; Ru, Xinxin.

Entropy (Basel) ; 23(6)2021 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-34203696

RESUMO

Bayesian Networks structure learning (BNSL) is a troublesome problem that aims to search for an optimal structure. An exact search tends to sacrifice a significant amount of time and memory to promote accuracy, while the local search can tackle complex networks with thousands of variables but commonly gets stuck in a local optimum. In this paper, two novel and practical operators and a derived operator are proposed to perturb structures and maintain the acyclicity. Then, we design a framework, incorporating an influential perturbation factor integrated by three proposed operators, to escape current local optimal and improve the dilemma that outcomes trap in local optimal. The experimental results illustrate that our algorithm can output competitive results compared with the state-of-the-art constraint-based method in most cases. Meanwhile, our algorithm reaches an equivalent or better solution found by the state-of-the-art exact search and hybrid methods.

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA