Results 1 - 16 of 16
1.
Int J Data Min Bioinform ; 7(1): 22-37, 2013.
Article in English | MEDLINE | ID: mdl-23437513

ABSTRACT

Feature ranking, which ranks features by their individual importance, is one of the most frequently used feature selection techniques. When applied to high-dimensional, small-sample gene expression data, traditional feature ranking criteria are apt to produce inconsistent ranking results even under light perturbations of the training samples, which causes problems for downstream studies such as biomarker identification. A widely used strategy for resolving these inconsistencies is multicriterion combination, in which score normalisation is crucial. In this paper, three problems in existing methods are first analyzed, and a new feature importance transformation algorithm based on resampling and permutation is then proposed for score normalisation. Experimental studies on four popular gene expression data sets show that multicriterion combination based on the proposed score normalisation produces gene rankings with improved robustness.
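The permutation side of such a score normalisation can be sketched as follows. This is a minimal illustration, not the paper's algorithm (which also involves resampling); the function names and the `mean_diff` criterion are hypothetical stand-ins. Each raw score is mapped onto a common [0, 1] scale as the fraction of label-permuted scores it exceeds, making scores from different criteria comparable.

```python
import random

def mean_diff(col, y):
    """Raw ranking criterion (illustrative): absolute difference of class means."""
    a = [v for v, c in zip(col, y) if c == 0]
    b = [v for v, c in zip(col, y) if c == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def perm_normalised_scores(X, y, score_fn, n_perm=200, seed=0):
    """Map raw criterion scores onto a common [0, 1] scale by comparing each
    observed score with its permutation null distribution."""
    rng = random.Random(seed)
    norm = []
    for col in zip(*X):                      # iterate over features
        observed = score_fn(col, y)
        null = []
        for _ in range(n_perm):
            yp = list(y)
            rng.shuffle(yp)                  # break the feature-label link
            null.append(score_fn(col, yp))
        norm.append(sum(s < observed for s in null) / n_perm)
    return norm
```

On a toy data set where only the first feature separates the classes, the informative feature receives a normalised score near 1 while the noise feature scores near 0.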


Subject(s)
Algorithms , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated
2.
Article in English | MEDLINE | ID: mdl-21566255

ABSTRACT

Feature selection often aims to select a compact feature subset with which to build a pattern classifier of reduced complexity, so as to achieve improved classification performance. From the perspective of pattern analysis, producing a stable or robust solution is also a desirable property of a feature selection algorithm; however, the issue of robustness is often overlooked. In this study, we analyze the robustness problem in feature selection for high-dimensional, small-sample gene-expression data, and propose to improve the robustness of feature selection by using multiple feature selection evaluation criteria. Based on this idea, a multicriterion fusion-based recursive feature elimination (MCF-RFE) algorithm is developed with the goal of improving both the classification performance and the stability of feature selection results. Experimental studies on five gene-expression data sets show that the MCF-RFE algorithm outperforms the commonly used benchmark feature selection algorithm SVM-RFE.
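The fusion idea can be sketched as a recursive elimination loop in which each round ranks the surviving features under every criterion, sums the ranks, and drops the worst feature. This is a simplified sketch of the multicriterion-fusion concept, not the published MCF-RFE procedure; the two example criteria (`mean_diff`, `snr`) are hypothetical stand-ins.

```python
def mean_diff(col, y):
    """Criterion 1 (illustrative): absolute difference of class means."""
    a = [v for v, c in zip(col, y) if c == 0]
    b = [v for v, c in zip(col, y) if c == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def snr(col, y):
    """Criterion 2 (illustrative): mean gap over the summed class spreads."""
    a = [v for v, c in zip(col, y) if c == 0]
    b = [v for v, c in zip(col, y) if c == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sa = (sum((v - ma) ** 2 for v in a) / len(a)) ** 0.5
    sb = (sum((v - mb) ** 2 for v in b) / len(b)) ** 0.5
    return abs(ma - mb) / (sa + sb + 1e-12)

def mcf_rfe(X, y, criteria, n_keep=2):
    """Recursive elimination: each round fuses the rankings produced by all
    criteria (rank 0 = best) and removes the feature with the worst fused rank."""
    feats = list(range(len(X[0])))
    while len(feats) > n_keep:
        fused = {f: 0 for f in feats}
        for crit in criteria:
            scores = {f: crit([row[f] for row in X], y) for f in feats}
            for rank, f in enumerate(sorted(feats, key=lambda f: -scores[f])):
                fused[f] += rank
        feats.remove(max(feats, key=lambda f: fused[f]))
    return feats
```

With two informative features and one pure-noise feature, both criteria agree that the noise feature ranks last, so it is eliminated first.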


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated/methods , Artificial Intelligence , Databases, Genetic , Humans , Neoplasms/genetics , Neoplasms/metabolism
3.
Article in English | MEDLINE | ID: mdl-20479500

ABSTRACT

The Mahalanobis class separability measure provides an effective evaluation of the discriminative power of a feature subset and is widely used in feature selection. However, this measure is computationally intensive, or even prohibitive, when applied to gene expression data. In this study, a recursive approach to Mahalanobis measure evaluation is proposed, with the goal of reducing computational overhead: instead of evaluating the Mahalanobis measure directly in the high-dimensional space, the recursive approach evaluates it through successive evaluations in 2D space. Because of its recursive nature, the approach is extremely efficient when combined with a forward search procedure. In addition, gene subsets selected by the Mahalanobis measure tend to overfit the training data and generalize unsatisfactorily to unseen test data, due to the small sample size of gene expression problems. To alleviate this overfitting, a regularized recursive Mahalanobis measure is proposed in this study, and guidelines on the determination of the regularization parameters are provided. Experimental studies on five gene expression problems show that the regularized recursive Mahalanobis measure substantially outperforms both the nonregularized Mahalanobis measures and the benchmark recursive feature elimination (RFE) algorithm on all five problems.
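For reference, the quantity being accelerated can be written directly. The sketch below computes the plain (non-recursive) two-class Mahalanobis separability J = (m0 − m1)ᵀ Sw⁻¹ (m0 − m1), with a small ridge term added to the pooled within-class scatter as a nod to the regularisation discussed above; the recursive 2D evaluation itself is not reproduced here, and the function name is an assumption.

```python
import numpy as np

def mahalanobis_separability(X, y, reg=1e-3):
    """Direct Mahalanobis class separability of a feature subset (two classes):
    J = (m0 - m1)^T Sw^{-1} (m0 - m1), with ridge regularisation on Sw."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class scatter, regularised for small-sample stability
    Sw = np.atleast_2d(np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
    Sw = Sw + reg * np.eye(Sw.shape[0])
    d = m0 - m1
    return float(d @ np.linalg.solve(Sw, np.atleast_1d(d)))
```

A subset of genuinely discriminative features yields a much larger J than a same-sized subset of noise features, which is what makes the measure useful inside a forward search.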


Subject(s)
Algorithms , Computational Biology/methods , Data Mining/methods , Gene Expression Profiling/methods , Data Interpretation, Statistical , Databases, Genetic , Humans , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Regression Analysis
4.
Int J Neural Syst ; 16(5): 341-52, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17117495

ABSTRACT

Microarray data contain a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This poses problems for discriminant analysis of microarray data. One way to alleviate the problem is to reduce the dimensionality of the data by selecting genes important to the discriminant problem. Gene selection can be cast as a feature selection problem in the context of pattern classification. Feature selection approaches are broadly grouped into filter methods and wrapper methods; wrapper methods outperform filter methods, but at the cost of more intensive computation. In the present study, we propose a wrapper-like gene selection algorithm based on the Regularization Network. Compared with the classical wrapper method, the computational cost of our gene selection algorithm is significantly reduced, because the evaluation criterion we propose does not demand repeated training within the leave-one-out procedure.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Genomics/methods , Molecular Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Animals , DNA, Complementary/analysis , DNA, Complementary/genetics , Gene Expression Regulation, Neoplastic/genetics , Humans , Neoplasms/genetics
5.
Bioinformatics ; 22(20): 2507-15, 2006 Oct 15.
Article in English | MEDLINE | ID: mdl-16908500

ABSTRACT

MOTIVATION: Feature selection approaches, such as filters and wrappers, have been applied to the gene selection problem in the microarray data analysis literature. In wrapper methods, the classification error is usually used as the evaluation criterion for feature subsets. Due to the high dimensionality and small sample size of microarray data, however, counting-based error estimation may not be an ideal criterion for gene selection. RESULTS: Our study reveals that evaluating genes with counting-based error estimators such as the resubstitution error, leave-one-out error, cross-validation error and bootstrap error may encounter a severe ties problem, i.e. two or more gene subsets score equally, which in turn results in uncertainty in gene selection. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and can be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria, such as a generalised |w|(2) measure for support vector machines and a modified Relief measure for k-nearest neighbors, produce improved gene selection compared with counting-based error estimators. AVAILABILITY: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/ The website contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of the experiments.
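The ties problem is easy to reproduce. In the sketch below, two gene subsets both achieve zero leave-one-out 1-NN errors (a counting-based tie), while a continuous margin-like criterion still tells them apart. Both functions are illustrative stand-ins: `min_gap` is a simple closest-opposite-pair gap, not the paper's |w|-based SVM measure.

```python
def loo_1nn_errors(X, y):
    """Counting-based criterion: leave-one-out 1-NN misclassification count."""
    errs = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
        errs += y[nearest] != y[i]
    return errs

def min_gap(X, y):
    """Continuous criterion (illustrative): distance between the closest
    opposite-class pair -- a margin-like stand-in, not the paper's formula."""
    return min(sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5
               for i in range(len(X)) for j in range(len(X)) if y[i] != y[j])
```

A widely separated subset and a barely separated one both score a perfect (and therefore tied) error count, but the continuous criterion prefers the wider separation.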


Subject(s)
Algorithms , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Computer Simulation , Databases, Genetic , Information Storage and Retrieval/methods , Models, Genetic , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity
6.
IEEE Trans Biomed Eng ; 53(6): 1153-63, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16761842

ABSTRACT

In this paper, we present two new algorithms for cell image segmentation. First, we demonstrate that pixel classification-based color image segmentation in color space is equivalent to performing segmentation on a grayscale image through thresholding. Based on this result, we develop a supervised learning-based two-step procedure for color cell image segmentation: the color image is first mapped to grayscale via a transform learned through supervised learning, and thresholding is then performed on the grayscale image to segment objects out of the background. Experimental results show that the supervised learning-based two-step procedure achieved a boundary disagreement (mean absolute distance) of 0.85, while the disagreement produced by the pixel classification-based color image segmentation method was 3.59. Second, we develop a new marker detection algorithm for watershed-based separation of overlapping or touching cells. The merit of the new algorithm is that it employs both photometric and shape information, combining the two naturally in the framework of pattern classification to provide more reliable markers. Extensive experiments show that the new marker detection algorithm achieved 0.4% over-segmentation and 0.2% under-segmentation, while the reconstruction-based method produced 4.4% and 1.1%, respectively.
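The two-step procedure can be illustrated with a minimal sketch: an affine RGB-to-grey map is fitted by least squares so that labelled object pixels land near 1 and background pixels near 0, and the resulting grey values are thresholded. This is an assumption-laden stand-in for the learned transform described above, not the paper's method; the function names are hypothetical.

```python
import numpy as np

def learn_grey_map(pixels, labels):
    """Fit an affine RGB->grey map by least squares: object pixels -> ~1,
    background pixels -> ~0 (a minimal supervised-learning stand-in)."""
    P = np.asarray(pixels, dtype=float)
    A = np.c_[P, np.ones(len(P))]            # [R, G, B, 1] design matrix
    w, *_ = np.linalg.lstsq(A, np.asarray(labels, dtype=float), rcond=None)
    return w

def threshold_segment(pixels, w, thr=0.5):
    """Apply the learned map, then segment the grey values by thresholding."""
    P = np.asarray(pixels, dtype=float)
    grey = np.c_[P, np.ones(len(P))] @ w
    return grey > thr
```

On a toy set of reddish object pixels against a bluish background, the learned map pushes the two classes to opposite sides of the threshold.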


Subject(s)
Biomarkers, Tumor/analysis , Image Interpretation, Computer-Assisted/methods , Immunohistochemistry/methods , Neoplasm Proteins/analysis , Tumor Suppressor Protein p53/analysis , Urinary Bladder Neoplasms/diagnosis , Urinary Bladder Neoplasms/metabolism , Algorithms , Artificial Intelligence , Humans , Image Enhancement/methods , Microscopy/methods , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity , Tumor Cells, Cultured
7.
IEEE Trans Neural Netw ; 16(6): 1531-40, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16342493

ABSTRACT

The central problem in training a radial basis function neural network is the selection of hidden layer neurons. In this paper, we propose to select hidden layer neurons based on a data structure preserving criterion, where data structure denotes the relative location of samples in the high-dimensional space. By preserving the data structure of the samples, including those close to the separation boundaries between classes, the selected neuron subset retains the separation margin underlying the full set of hidden layer neurons. As a direct result, the resulting network tends to generalize well.


Subject(s)
Algorithms , Databases, Factual , Information Storage and Retrieval/methods , Models, Theoretical , Neural Networks, Computer , Pattern Recognition, Automated/methods , Cluster Analysis , Computer Simulation
8.
IEEE Trans Neural Netw ; 16(6): 1651-63, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16342504

ABSTRACT

Support vector machines (SVMs) have been used extensively. However, SVMs are known to face difficulty with large, complex problems due to the intensive computation involved in their training algorithms, which scale at least quadratically with the number of training examples. This paper proposes a new, simple and efficient network architecture consisting of several SVMs, each trained on a small subregion of the whole data sampling space, together with the same number of simple neural quantizer modules that inhibit the outputs of all the remote SVMs and allow only a single local SVM to fire (produce actual output) at any time. In principle, this region-computing-based modular network method can significantly reduce the learning time of SVM algorithms without sacrificing much generalization performance. Experiments on several large, complex real-world benchmark problems demonstrate that our method can be significantly faster than single SVMs without losing much generalization performance.


Subject(s)
Algorithms , Models, Theoretical , Neural Networks, Computer , Pattern Recognition, Automated/methods , Computer Simulation
9.
IEEE Trans Syst Man Cybern B Cybern ; 35(2): 339-44, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15828661

ABSTRACT

Principal component analysis (PCA) is probably the best-known approach to unsupervised dimensionality reduction. However, the axes of the lower-dimensional space, i.e., the principal components (PCs), are a set of new variables carrying no clear physical meaning. Thus, interpreting results obtained in the lower-dimensional PCA space, and acquiring data for test samples, still involve all of the original measurements. To deal with this problem, we develop two algorithms to link the physically meaningless PCs back to a subset of the original measurements. The main idea of the algorithms is to evaluate and select feature subsets based on their capacity to reproduce sample projections on the principal axes. The strength of the new algorithms is that the computational complexity involved is significantly reduced compared with data structural similarity-based feature evaluation.
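The selection idea can be sketched as a greedy search: at each step, add the original feature whose least-squares fit best reproduces the sample projections on the leading principal axes. This is a sketch of the concept, not either of the paper's two algorithms; the function name is an assumption.

```python
import numpy as np

def pc_linked_features(X, n_pcs=1, n_feats=1):
    """Greedily pick original features whose least-squares fit best reproduces
    sample projections on the leading principal axes."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # sample projections on the leading principal axes (via SVD)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_pcs].T
    chosen = []
    while len(chosen) < n_feats:
        best, best_err = None, None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = Xc[:, chosen + [j]]
            B, *_ = np.linalg.lstsq(A, T, rcond=None)
            err = float(np.sum((A @ B - T) ** 2))   # projection reproduction error
            if best_err is None or err < best_err:
                best, best_err = j, err
        chosen.append(best)
    return chosen
```

When one original measurement dominates the leading PC, the greedy search recovers exactly that measurement as the physically meaningful surrogate.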


Subject(s)
Algorithms , Artificial Intelligence , Models, Statistical , Pattern Recognition, Automated/methods , Principal Component Analysis , Cluster Analysis , Computer Simulation
10.
Conf Proc IEEE Eng Med Biol Soc ; 2005: 6484-7, 2005.
Article in English | MEDLINE | ID: mdl-17281754

ABSTRACT

Cell nuclei segmentation is a critical issue in automatic cell analysis for cancer diagnosis and prognosis, and the marker-controlled watershed segmentation algorithm is the most commonly used approach. In this paper, an adaptive successive erosion-based (ASE) marker extraction method for the watershed algorithm is presented, with the goal of extracting a marker labelling each individual nucleus, including overlapping cell nuclei. Based on the new marker detection method, an integrated cell image segmentation algorithm is developed for p53 immunohistochemistry in bladder inverted papilloma. Experiments were performed on a number of images, and the results demonstrate that the algorithm produces more accurate segmentations than other methods.
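The successive-erosion idea behind such marker extraction can be sketched on a single binary blob: erode repeatedly and keep the last non-empty shape as the marker. This toy version handles one blob with plain 4-neighbour erosion; the published ASE method is adaptive, works per connected component, and feeds a watershed stage, none of which is reproduced here.

```python
def erode(mask):
    """One 4-neighbour binary erosion step on a set of (row, col) pixels."""
    return {(r, c) for (r, c) in mask
            if all(n in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))}

def successive_erosion_markers(mask, min_size=1):
    """Erode until just before the blob vanishes (or shrinks below min_size);
    the surviving pixels act as the blob's marker."""
    prev = set(mask)
    cur = erode(prev)
    while len(cur) >= min_size:
        prev, cur = cur, erode(cur)
    return prev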

11.
Bioinformatics ; 21(8): 1559-64, 2005 Apr 15.
Article in English | MEDLINE | ID: mdl-15598834

ABSTRACT

MOTIVATION: One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, many of which are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are therefore of much significance in microarray data analysis. In the present study, a new criterion, called the LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from the leave-one-out procedure of LS-SVMs (least squares support vector machines); as an upper bound on leave-one-out classification results, it reflects to some extent the generalization performance of gene subsets. RESULTS: We applied the LS Bound measure to gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and the Mahalanobis class separability measure, and with other published gene selection algorithms, including the weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method, while its computational complexity is at the level of the filter method. AVAILABILITY: A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). CONTACT: ekzmao@ntu.edu.sg.
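The computational trick underlying least-squares LOO bounds is that leave-one-out residuals have a closed form, so no retraining is needed. The sketch below shows the exact identity for ridge regression, e_i = r_i / (1 − h_ii); this illustrates the mechanism that LS-SVM-based measures such as the LS Bound exploit, not the LS Bound formula itself.

```python
import numpy as np

def ridge_loo_residuals(X, y, lam=0.1):
    """Closed-form leave-one-out residuals for ridge regression:
    e_i = r_i / (1 - h_ii), with H = X (X'X + lam I)^{-1} X'.
    Exact via Sherman-Morrison -- no refitting per left-out sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    r = y - H @ y                      # ordinary training residuals
    return r / (1.0 - np.diag(H))     # inflate by leverage to get LOO residuals
```

Comparing against a brute-force refit-without-sample check confirms the shortcut is exact, which is why such criteria stay at filter-level cost.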


Subject(s)
Gene Expression Profiling/methods , Models, Genetic , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Discriminant Analysis , Gene Expression Regulation, Neoplastic , Humans , Models, Statistical
12.
IEEE Trans Syst Man Cybern B Cybern ; 34(1): 60-7, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15369051

ABSTRACT

In many pattern classification applications, data are represented by high-dimensional feature vectors, which induce high computational cost and reduce classification speed in the context of support vector machines (SVMs). To reduce the dimensionality of the pattern representation, we develop a discriminative function pruning analysis (DFPA) feature subset selection method in the present study. The basic idea of the DFPA method is first to learn the SVM discriminative function from training data using all available input variables, and then to select the feature subset through pruning analysis. In the present study, the pruning is implemented using a forward selection procedure combined with a linear least squares estimation algorithm, taking advantage of the linear-in-the-parameters structure of the SVM discriminative function. The strength of the DFPA method is that it combines the good characteristics of both filter and wrapper methods. First, it retains the simplicity of the filter method, avoiding the training of a large number of SVM classifiers. Second, it inherits the good performance of the wrapper method by taking the SVM classification algorithm into account.

14.
IEEE Trans Neural Netw ; 13(5): 1211-7, 2002.
Article in English | MEDLINE | ID: mdl-18244518

ABSTRACT

For classification applications, the hidden layer neurons of a radial basis function (RBF) neural network can be interpreted as a function that maps input patterns from a nonlinearly separable space to a linearly separable space. In the new space, the responses of the hidden layer neurons form new feature vectors, and the discriminative power is then determined by the RBF centers. In the present study, we propose to choose RBF centers based on the Fisher ratio class separability measure, with the objective of achieving maximum discriminative power. We implement this idea using a multistep procedure that combines the Fisher ratio, an orthogonal transform, and a forward selection search method. Our motivation for employing the orthogonal transform is to decouple the correlations among the responses of the hidden layer neurons, so that the class separability provided by individual RBF neurons can be evaluated independently. The strengths of our method are twofold: first, it selects a parsimonious network architecture; second, it selects centers that provide large class separation.
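The core of this selection criterion can be sketched as follows: compute each candidate center's hidden-neuron responses over the training set and rank centers by the Fisher ratio of those responses. The orthogonal decorrelation and forward selection steps from the paper are deliberately omitted, and the function names are assumptions.

```python
import math

def rbf_response(center, x, sigma=1.0):
    """Gaussian hidden-neuron response of one candidate center to one sample."""
    d2 = sum((a - b) ** 2 for a, b in zip(center, x))
    return math.exp(-d2 / (2 * sigma ** 2))

def fisher_ratio(values, y):
    """Fisher ratio of one neuron's responses: squared between-class mean gap
    over the summed within-class variances (two classes)."""
    a = [v for v, c in zip(values, y) if c == 0]
    b = [v for v, c in zip(values, y) if c == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)

def rank_centers(candidates, X, y, sigma=1.0):
    """Rank candidate RBF centers by the class separability of their responses
    (best first); decorrelation via an orthogonal transform is omitted here."""
    scored = [(fisher_ratio([rbf_response(c, x, sigma) for x in X], y), i)
              for i, c in enumerate(candidates)]
    return [i for _, i in sorted(scored, reverse=True)]
```

A center sitting on one class cluster yields well-separated responses (high Fisher ratio) and outranks a center placed midway between the classes.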

15.
IEEE Trans Neural Netw ; 13(5): 1218-24, 2002.
Article in English | MEDLINE | ID: mdl-18244519

ABSTRACT

Feature selection is an important issue in pattern classification. In the present study, we develop a fast orthogonal forward selection (FOFS) algorithm for feature subset selection. The FOFS algorithm employs an orthogonal transform to decompose correlations among candidate features, but performs the orthogonal decomposition implicitly. Consequently, the fast algorithm demands less computational effort than conventional orthogonal forward selection (OFS).

16.
IEEE Trans Neural Netw ; 11(4): 1009-16, 2000.
Article in English | MEDLINE | ID: mdl-18249828

ABSTRACT

Network structure determination is an important issue in pattern classification based on a probabilistic neural network. In this study, a supervised network structure determination algorithm is proposed. The proposed algorithm consists of two parts and runs in an iterative way. The first part identifies an appropriate smoothing parameter using a genetic algorithm, while the second part determines suitable pattern layer neurons using a forward regression orthogonal algorithm. The proposed algorithm is capable of offering a fairly small network structure with satisfactory classification accuracy.
