Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data.

Sengupta, Debarka; Aich, Indranil; Bandyopadhyay, Sanghamitra.

J Biosci ; 2015 Oct; 40(4): 721-730

Article in English | IMSEAR | ID: sea-181454

ABSTRACT

Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article, an unsupervised feature selection technique is proposed that uses the maximum information compression index as the dissimilarity measure and the well-known density-based clustering technique DBSCAN to identify the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method to reduce the dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with that of some other well-known feature selection methods.
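
The dissimilarity measure named in the abstract above can be illustrated with a short sketch. The maximum information compression index of two features is commonly defined as the smaller eigenvalue of their 2x2 covariance matrix: it is zero when the features are linearly dependent and grows as they become less linearly related. The function names `mici` and `mici_matrix` are hypothetical, the use of population (not sample) variance is an assumption, and the sketch does not reproduce the authors' DBSCAN-based selection step, whose details the abstract does not give.

```python
import math


def mici(x, y):
    """Maximum information compression index of two equal-length features.

    Computed as the smaller eigenvalue of the 2x2 covariance matrix of
    (x, y). Population variance is used here; that choice is an assumption.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("features must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] are
    # (var_x + var_y)/2 +/- sqrt((var_x - var_y)^2 + 4*cov_xy^2)/2.
    disc = (var_x - var_y) ** 2 + 4.0 * cov_xy ** 2
    smaller = 0.5 * (var_x + var_y - math.sqrt(disc))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, smaller)


def mici_matrix(features):
    """Symmetric pairwise dissimilarity matrix for a list of features."""
    k = len(features)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = mici(features[i], features[j])
    return matrix
```

A matrix built this way could, in principle, be handed to any density-based clustering implementation that accepts precomputed distances (for example scikit-learn's `DBSCAN` with `metric="precomputed"`); how the clusters are then turned into a selected feature set is specific to the paper and is not sketched here.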

Preface.

Bandyopadhyay, Sanghamitra; De, Rajat K.

J Biosci ; 2015 Oct; 40(4): 667-669

Article in English | IMSEAR | ID: sea-181444

Gene ordering in partitive clustering using microarray expressions.

Ray, Shubhra Sankar; Bandyopadhyay, Sanghamitra; Pal, Sankar K.

J Biosci ; 2007 Aug; 32(5): 1019-25

Article in English | IMSEAR | ID: sea-110707

ABSTRACT

A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering genes into homogeneous groups using gene expression data has been shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, to the best of the authors' knowledge no work has addressed or evaluated the importance of gene ordering in the partitive clustering framework. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied to the whole data set, and gene ordering approaches remain unexplored in partitive clustering. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are each hybridized with a self-organizing map (SOM) to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the sum of gene expression distances, and maximizing biological gene ordering under MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than optimal leaf ordering in the hierarchical clustering solution.

Subject(s)

Algorithms; Computational Biology/methods; Gene Expression Profiling; Gene Expression Regulation/physiology; Gene Order/genetics; Humans; Models, Genetic; Multigene Family/physiology; Oligonucleotide Array Sequence Analysis; Saccharomyces cerevisiae Proteins/genetics
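
The idea in the abstract above of ordering the genes inside one cluster as a TSP-style path, so that the summed expression distance between neighbouring genes is small, can be sketched as follows. This is not FRAG_GALK or Concorde: a simple greedy nearest-neighbour heuristic is swapped in purely for illustration, and it gives no optimality guarantee. The function names `path_cost` and `nearest_neighbour_order`, the Euclidean distance, and the choice of starting gene are all assumptions.

```python
import math


def path_cost(profiles, order):
    """Sum of Euclidean distances between consecutive genes in `order`."""
    return sum(
        math.dist(profiles[a], profiles[b])
        for a, b in zip(order, order[1:])
    )


def nearest_neighbour_order(profiles, start=0):
    """Greedy ordering of the genes in one cluster.

    Starting from gene `start`, repeatedly append the unvisited gene whose
    expression profile is closest to the last gene placed. This is a
    heuristic stand-in for an exact or genetic-algorithm TSP solver.
    """
    if not profiles:
        return []
    remaining = set(range(len(profiles)))
    remaining.remove(start)
    order = [start]
    while remaining:
        last = profiles[order[-1]]
        # Ties are broken by the lower gene index so the result is stable.
        nxt = min(remaining, key=lambda g: (math.dist(last, profiles[g]), g))
        remaining.remove(nxt)
        order.append(nxt)
    return order
```

In a partitive setting, a function like this would be applied separately to the expression profiles of each cluster (for instance, each self-organizing map node), and `path_cost` would then give the summed neighbour distance for that cluster's ordering.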
