1.
PLoS One ; 14(2): e0210786, 2019.
Article in English | MEDLINE | ID: mdl-30763332

ABSTRACT

Identifying highly correlated genes in high-dimensional data is an important problem in the study of cancer and genetic diseases. Selecting relevant biomarkers is challenging because gene expression data contain important correlation structures, and some genes can be divided into groups that share a biological function, chromosomal location or regulation. In this paper, we propose CHR-DE, a penalized accelerated failure time model that combines a non-convex regularization (local search) with differential evolution (global search) in a wrapper-embedded memetic framework. The complex harmonic regularization (CHR) can approximate the combination of [Formula: see text] and ℓq (1 ≤ q < 2) for selecting biomarkers in groups. Differential evolution (DE) is used to globally optimize the CHR hyperparameters, which gives CHR-DE a strong capability for selecting groups of genes in high-dimensional biological data. We also developed an efficient path-seeking algorithm to optimize this penalized model. The proposed method is evaluated on synthetic data and three gene expression datasets: breast cancer, hepatocellular carcinoma and colorectal cancer. The experimental results demonstrate that CHR-DE is a more effective tool for feature selection and prediction.
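As a rough illustration of the global-search step described in the abstract above, the following is a minimal sketch of differential evolution (the common DE/rand/1/bin variant) searching over two hyperparameters. It is not the authors' CHR-DE implementation: the objective `toy_validation_loss`, the parameter names `lam` and `q`, the bounds and all default settings are assumptions made for illustration, standing in for whatever validation criterion the penalized model would actually supply.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimise `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Forcing one coordinate guarantees the trial differs from pop[i].
            j_rand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_loss(params):
    # Hypothetical stand-in for a cross-validated loss of a penalized model;
    # its minimum is placed at lam = 0.3, q = 1.5 purely for demonstration.
    lam, q = params
    return (lam - 0.3) ** 2 + (q - 1.5) ** 2


best_params, best_loss = differential_evolution(
    toy_validation_loss, bounds=[(0.0, 1.0), (1.0, 2.0)])
```

In a wrapper-embedded setting, each call to the objective would fit the penalized model for the candidate hyperparameters and return a validation score, so the cost of the search is dominated by those inner fits.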


Subject(s)
Algorithms , Biomarkers, Tumor , Carcinoma, Hepatocellular , Colorectal Neoplasms , Liver Neoplasms , Models, Biological , Biomarkers, Tumor/biosynthesis , Biomarkers, Tumor/genetics , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/metabolism , Carcinoma, Hepatocellular/pathology , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , Databases, Nucleic Acid , Gene Expression Regulation, Neoplastic , Humans , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Liver Neoplasms/pathology
2.
Sci Rep ; 8(1): 13009, 2018 08 29.
Article in English | MEDLINE | ID: mdl-30158596

ABSTRACT

Traditional supervised learning classifiers need many labeled samples to achieve good performance; however, many biological datasets contain only a small number of labeled samples, and the remaining samples are unlabeled. Labeling these samples manually is difficult or expensive. Techniques such as active learning and semi-supervised learning have been proposed to use the unlabeled samples to improve model performance. However, in active learning the model suffers from being short-sighted or biased, and some manual workload is still needed. Semi-supervised learning methods are easily affected by noisy samples. In this paper, we propose a novel logistic regression model based on the complementarity of active learning and semi-supervised learning, which uses the unlabeled samples at the least cost to improve disease classification accuracy. In addition, a mechanism for updating pseudo-labeled samples is designed to reduce the number of false pseudo-labels. The experimental results show that the new model achieves better performance than widely used semi-supervised learning and active learning methods in disease classification and gene selection.
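The interplay the abstract above describes can be sketched, under heavy simplification, as a loop that alternates uncertainty-based querying (active learning) with confidence-thresholded pseudo-labeling (semi-supervised learning) around a logistic regression. This is not the paper's algorithm: the function names, the 0.95 confidence threshold, the synthetic two-cluster data, and the use of the true labels as a stand-in for a human annotator are all assumptions made for illustration. Pseudo-labels are rebuilt from scratch each round, which is one simple way to let earlier mistaken pseudo-labels drop out.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per feature followed by the bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logreg(xs, ys, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    dim, n = len(xs[0]), len(xs)
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y
            for j in range(dim):
                grad[j] += err * x[j]
            grad[dim] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def fit_on(X, labels):
    idx = list(labels)
    return fit_logreg([X[i] for i in idx], [labels[i] for i in idx])


def active_self_training(X, oracle, initial, rounds=5, conf=0.95):
    labeled = {i: oracle[i] for i in initial}  # labels from the annotator
    pseudo = {}                                # labels guessed by the model
    for _ in range(rounds):
        # Annotator labels override pseudo-labels when both exist.
        w = fit_on(X, {**pseudo, **labeled})
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        # Semi-supervised step: rebuild pseudo-labels from confident predictions.
        pseudo = {}
        for i in unlabeled:
            p = predict_proba(w, X[i])
            if p >= conf or p <= 1.0 - conf:
                pseudo[i] = 1 if p >= 0.5 else 0
        # Active step: ask the annotator about the least certain sample.
        if unlabeled:
            q = min(unlabeled, key=lambda i: abs(predict_proba(w, X[i]) - 0.5))
            labeled[q] = oracle[q]
            pseudo.pop(q, None)
    return fit_on(X, {**pseudo, **labeled}), labeled, pseudo


# Synthetic data: two Gaussian clusters (class 0 first, then class 1).
rng = random.Random(0)
X = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(100)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
y_true = [0] * 100 + [1] * 100

w, labeled, pseudo = active_self_training(X, y_true, initial=[0, 1, 100, 101])
accuracy = sum((predict_proba(w, x) >= 0.5) == bool(y)
               for x, y in zip(X, y_true)) / len(X)
```

Only the four initial samples and one queried sample per round receive annotator labels here; every other training label comes from the model's own confident predictions.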


Subject(s)
Disease/classification , Disease/genetics , Logistic Models , Machine Learning , Supervised Machine Learning , Humans