1.
Comput Biol Med ; 170: 107981, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38262204

ABSTRACT

A framework for gene expression analysis is developed for gene selection in cancer. It introduces a fuzzy Jaccard similarity (FJS) and combines it with the Lukasiewicz implication through weights in a hybrid ensemble framework. The method is called weighted combination of Lukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). While the fuzziness in Jaccard similarity is incorporated by using the existing Gödel fuzzy logic, the weights are obtained by maximizing the average F-score of the selected genes in classifying the cancer patients. The patients are first divided into clusters, based on the number of patient groups, using average linkage agglomerative clustering and a new score, called WCLFJ (weighted combination of Lukasiewicz implication and fuzzy Jaccard similarity). The genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created by taking the union of the features selected for all the clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified using Naïve Bayes (NB) and Support Vector Machine (SVM) separately, using the selected genes, and the related F-scores are calculated. The weights in WCLFJ are then updated iteratively to maximize the average F-score obtained from the results of the classifiers. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average values of accuracy, F-score, recall, precision and MCC over all the datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values, and this information is further used to rank them.
The relevance of the selected gene set is biologically validated using the KEGG Pathway, Gene Ontology (GO), and the existing literature. It is seen that the genes selected by WCLFJHEF are candidates for genomic alterations in the various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.
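
As a rough illustration of the two ingredients named in this abstract, the sketch below computes a fuzzy Jaccard similarity with the Gödel operators (minimum as intersection, maximum as union) and the Lukasiewicz implication I(x, y) = min(1, 1 - x + y), then mixes them with a weight. The function names, the use of the implication in both directions, and the single weight `w` are assumptions made for illustration; the abstract does not give the exact WCLFJ formula. Inputs are assumed to be non-empty vectors already scaled to [0, 1].

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity with Goedel operators: sum(min) / sum(max)."""
    inter = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical.
    return inter / union if union > 0 else 1.0


def lukasiewicz_implication(x, y):
    """Lukasiewicz implication I(x, y) = min(1, 1 - x + y) for x, y in [0, 1]."""
    return min(1.0, 1.0 - x + y)


def implication_similarity(a, b):
    """Hypothetical similarity: implication taken in both directions, averaged.

    min(I(x, y), I(y, x)) equals 1 - |x - y| for values in [0, 1].
    """
    vals = [
        min(lukasiewicz_implication(x, y), lukasiewicz_implication(y, x))
        for x, y in zip(a, b)
    ]
    return sum(vals) / len(vals)


def weighted_score(a, b, w):
    """Assumed convex combination of the two similarities with weight w in [0, 1]."""
    return w * implication_similarity(a, b) + (1.0 - w) * fuzzy_jaccard(a, b)
```

In the framework described above the weights are tuned iteratively against the average F-score of the classifiers; that outer loop is not shown here.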


Subject(s)
Gene Expression Profiling , Neoplasms , Humans , Bayes Theorem , Gene Expression Profiling/methods , Algorithms , Neoplasms/metabolism , Software
2.
Article in English | MEDLINE | ID: mdl-31398129

ABSTRACT

MicroRNAs play an important role in controlling drug sensitivity and resistance in cancer. Identifying the miRNAs responsible for drug resistance can enhance the effectiveness of treatment. A new set theoretic entropy measure (SPEM) is defined to determine the relevance and level of confidence of miRNAs in deciding their drug resistant nature. Here, a pattern is represented by a pair of values. One of them implies the degree of its belongingness (fuzzy membership) to a class and the other represents the actual class of origin (crisp membership). A measure, called granular probability, is defined that determines the confidence level of having a particular pair of membership values. The granules used to compute this probability are formed by a histogram based method where each bin of a histogram is considered as one granule. The width and number of the bins are automatically determined by the algorithm. The set thus defined, comprising a pair of membership values and the confidence level for having them, is used to compute SPEM and thereby identify the drug resistant miRNAs. The efficiency of SPEM is demonstrated extensively on six data sets. While the F-score achieved in classifying sensitive and resistant samples ranges between 0.31 and 0.50 using all the miRNAs with an SVM classifier, the same score varies from 0.67 to 0.94 using only the top 1 percent drug resistant miRNAs. Superiority of the proposed method as compared to some existing ones is established in terms of F-score. The significance of the top 1 percent miRNAs in the corresponding cancer is also verified by different articles based on biological investigations. Source code of SPEM is available at http://www.jayanta.droppages.com/SPEM.html.


Subject(s)
Computational Biology/methods , Drug Resistance, Neoplasm/genetics , MicroRNAs/genetics , Neoplasms/genetics , Algorithms , Databases, Genetic , Entropy , Fuzzy Logic , Humans
3.
Comput Biol Med ; 104: 149-162, 2019 01.
Article in English | MEDLINE | ID: mdl-30472497

ABSTRACT

A method, named genetic algorithm for assigning weights to gene expressions using functional annotations (GAAWGEFA), is developed to assign proper weights to the gene expressions at each time point. The weights are estimated using functional annotations of the genes in a genetic algorithm framework. The method captures gene similarity better than other existing methods because it takes advantage of the existing functional annotations of the genes. The weight combination for the expressions at different time points is determined by maximizing the fitness function of GAAWGEFA in terms of the positive predictive value (PPV) for the top 10,000 gene pairs. The performance of the proposed method is primarily compared with biweight midcorrelation (BICOR) and original expression values for six Saccharomyces cerevisiae datasets and one Bacillus subtilis dataset. The utility of GAAWGEFA is shown in predicting the functions of 48 unclassified genes (using p-value cutoff 10^-13) from Saccharomyces cerevisiae microarray data where the expressions are weighted using GAAWGEFA and are clustered using the k-medoids algorithm. The related code along with various parameters is available at http://sampa.droppages.com/GAAWGEFA.html.


Subject(s)
Algorithms , Bacillus subtilis , Data Curation , Databases, Nucleic Acid , Gene Expression Regulation, Bacterial/physiology , Gene Expression Regulation, Fungal/physiology , Models, Genetic , Saccharomyces cerevisiae , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Electronic Data Processing , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
4.
Article in English | MEDLINE | ID: mdl-27831888

ABSTRACT

MicroRNAs (miRNAs) are known as an important indicator of cancers. The presence of cancer can be detected by identifying the responsible miRNAs. A fuzzy-rough entropy measure (FREM) is developed which can rank the miRNAs and thereby identify the relevant ones. FREM is used to determine the relevance of a miRNA in terms of separability between normal and cancer classes. While computing the FREM for a miRNA, fuzziness takes care of the overlap between normal and cancer expressions, whereas rough lower approximation determines their class sizes. MiRNAs are sorted according to the highest relevance (i.e., the capability of class separation) and a percentage among them is selected from the top ranked ones. FREM is also used to determine the redundancy between two miRNAs, and the redundant ones are removed from the selected set as necessary. A histogram based patient selection method is also developed which can help to reduce the number of patients to be handled during the computation of FREM, while compromising very little on the performance of the selected miRNAs for most of the data sets. The superiority of FREM as compared to some existing methods is demonstrated extensively on six data sets in terms of sensitivity, specificity, and score. While for these data sets the score of the miRNAs selected by our method varies from 0.70 to 0.91 using SVM, those results vary from 0.37 to 0.90 for some other methods. Moreover, all the selected miRNAs are corroborated by the findings of biological investigations or pathway analysis tools. The source code of FREM is available at http://www.jayanta.droppages.com/FREM.html.


Subject(s)
Computational Biology/methods , Fuzzy Logic , Gene Expression Profiling/methods , MicroRNAs/genetics , Neoplasms/genetics , Algorithms , Entropy , Humans , MicroRNAs/metabolism , Neoplasms/metabolism , Pattern Recognition, Automated
5.
Comput Biol Med ; 90: 59-67, 2017 11 01.
Article in English | MEDLINE | ID: mdl-28941844

ABSTRACT

Discretizing gene expression values is an important step in data preprocessing as it helps in reducing noise and experimental errors. This in turn provides better results in various tasks such as gene regulatory network analysis and disease prediction. A supervised discretization method for gene expressions using gene annotation is developed. The method is called "Gene Annotation Based Discretization" (GABD), where the discretization width is determined by maximizing the positive predictive value (PPV), computed using gene annotations, for the top 20,000 gene pairs. The method captures gene similarity better than the original expressions do. The performance of GABD is compared with some existing discretization methods like equal width discretization, equal frequency discretization and k-means discretization in terms of PPV. The utility of GABD is also shown by clustering genes using the k-medoids algorithm and thereby predicting the function of 23 unclassified Saccharomyces cerevisiae genes using p-value cutoff 10^-10. The source code for GABD is available at http://www.sampa.droppages.com/GABD.html.
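
For orientation, two of the baseline schemes that GABD is compared against, equal width and equal frequency discretization, can be sketched as follows. This is a minimal illustration with hypothetical function names; it is not GABD itself, which chooses the width by maximizing PPV against gene annotations.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels
```

With an outlier in the data the two schemes diverge: equal width leaves most values in the first bin, while equal frequency splits them by rank (tied values may land in different bins in this simple version).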


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Fungal/physiology , Gene Regulatory Networks/physiology , Genes, Fungal/physiology , Molecular Sequence Annotation , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
6.
Comput Biol Med ; 89: 540-548, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28844466

ABSTRACT

MicroRNAs (miRNAs) are among the important regulators of cell division and are also responsible for cancer development. Among the discovered miRNAs, not all are important for cancer detection. In this regard, a fuzzy mutual information (FMI) based grouping and miRNA selection method (FMIGS) is developed to identify the miRNAs responsible for a particular cancer. First, the miRNAs are ranked and divided into several groups. Then the most important group is selected among the generated groups. Both steps, viz., ranking of miRNAs and selection of the most relevant group of miRNAs, are performed using FMI. Here the number of groups is automatically determined by the grouping method. After the selection process, redundant miRNAs are removed from the selected set of miRNAs as required by the user. In a part of the investigation we propose an FMI based particle swarm optimization (PSO) method for selecting relevant miRNAs, where FMI is used as a fitness function to determine the fitness of the particles. The effectiveness of FMIGS and FMI based PSO is tested on five data sets and their efficiency in selecting relevant miRNAs is demonstrated. The superior performance of FMIGS over some existing methods is established, and the biological significance of the selected miRNAs is supported by the findings of biological investigations and publicly available pathway analysis tools. The source code related to our investigation is available at http://www.jayanta.droppages.com/FMIGS.html.


Subject(s)
Gene Expression Regulation, Neoplastic , MicroRNAs , Models, Biological , Neoplasms , RNA, Neoplasm , Female , Fuzzy Logic , Humans , Male , MicroRNAs/biosynthesis , MicroRNAs/genetics , Neoplasms/genetics , Neoplasms/metabolism , RNA, Neoplasm/biosynthesis , RNA, Neoplasm/genetics
7.
Gene ; 595(2): 150-160, 2016 Dec 31.
Article in English | MEDLINE | ID: mdl-27688070

ABSTRACT

A supervised similarity measure for Saccharomyces cerevisiae gene expressions is developed which can capture the gene similarity when multiple types of experimental conditions, like cell cycle and heat shock, are available for all the genes. The measure is called Weighted Pearson correlation (WPC), where the weights are systematically determined for each type of experiment by maximizing the positive predictive value for gene pairs having Pearson correlation greater than 0.80. The positive predictive value is computed by using the annotation information available from yeast GO-Slim process annotations in Saccharomyces Genome Database (SGD). Genes are then clustered by the k-medoids algorithm using the newly computed WPC, and functions of 135 unclassified genes are predicted with a p-value cutoff 10^-5 using Munich Information for Protein Sequences (MIPS) annotations. Out of these genes, functional categories of 55 genes are predicted with p-value cutoff greater than 10^-10 and reported in this investigation. The superiority of WPC as compared to some existing similarity measures like Pearson correlation and Euclidean distance is demonstrated using positive predictive values (PPV) of gene pairs for different Saccharomyces cerevisiae data sets. The related code is available at http://www.sampa.droppages.com/WPC.html.
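
To make the idea of a weighted correlation concrete, the sketch below implements the standard weighted Pearson coefficient, in which each condition contributes to the means, variances and covariance in proportion to its weight. Whether WPC uses exactly this form is an assumption; here every condition is simply given the weight of the experiment type it belongs to, and the PPV-based search for the weights is not shown.

```python
import math


def weighted_pearson(x, y, w):
    """Standard weighted Pearson correlation of x and y with non-negative weights w.

    Assumes neither profile is constant under the weights (non-zero variance).
    """
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)
```

With all weights equal this reduces to the ordinary Pearson correlation, which is the baseline the abstract compares against.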


Subject(s)
Algorithms , Gene Expression , Molecular Sequence Annotation/methods , Saccharomyces cerevisiae Proteins/genetics , Databases, Genetic , Genes, Fungal , Reproducibility of Results , Saccharomyces cerevisiae/genetics
8.
Med Biol Eng Comput ; 54(4): 701-10, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26264058

ABSTRACT

MicroRNAs (miRNAs) act as a major biomarker of cancer. Not all miRNAs in the human body are equally important for cancer identification. We propose a methodology, called FMIMS, which automatically selects the most relevant miRNAs for a particular type of cancer. In FMIMS, miRNAs are initially grouped by using an SVM-based algorithm; then the group with the highest relevance is determined and the miRNAs in that group are finally ranked for selection according to their redundancy. Fuzzy mutual information is used in computing the relevance of a group and the redundancy of miRNAs within it. Superiority of the most relevant group to all others, in deciding normal or cancer, is demonstrated on breast, renal, colorectal, lung, melanoma and prostate data. The merit of FMIMS as compared to several existing methods is established. While 12 out of the 15 miRNAs selected by FMIMS are corroborated by biological investigations, three of them, viz., "hsa-miR-519," "hsa-miR-431" and "hsa-miR-320c", are possible novel predictions for renal cancer, lung cancer and melanoma, respectively. The selected miRNAs are found to be involved in disease-specific pathways by targeting various genes. The method is also able to detect the responsible miRNAs even at the primary stage of cancer. The related code is available at http://www.jayanta.droppages.com/FMIMS.html.


Subject(s)
Fuzzy Logic , MicroRNAs/genetics , Neoplasms/genetics , Databases, Genetic , Humans , MicroRNAs/metabolism , Support Vector Machine
9.
IEEE Trans Neural Netw Learn Syst ; 27(9): 1890-906, 2016 09.
Article in English | MEDLINE | ID: mdl-26285222

ABSTRACT

A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure. The neighborhood is newly defined using the fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes. Based on the decision table, a method of gene selection is developed. The effectiveness of the GSOM is shown in both clustering samples and developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. While the superior results of the GSOM, as compared with the related clustering methods, are provided in terms of β-index, DB-index, Dunn-index, and fuzzy rough entropy, the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than those of the related unsupervised methods. The C code of the GSOM and UFRFS is available online at http://avatharamg.webs.com/software-code.

10.
Gene ; 541(2): 129-37, 2014 May 15.
Article in English | MEDLINE | ID: mdl-24631265

ABSTRACT

Inference of gene regulatory networks (GRNs) is one of the most challenging research problems of Systems Biology. In this investigation, a new GRN inference methodology, called Entropic Biological Score (EBS), is proposed. It linearly combines the mean conditional entropy (MCE) from expression levels and a Biological Score (BS) obtained by integrating different biological data sources. The EBS is validated with the Cell Cycle related functional annotation information, available from Munich Information Center for Protein Sequences (MIPS), and compared with some existing methods like MRNET, ARACNE, CLR and MCE for GRN inference. For real networks, the performance of EBS, which uses the concept of integrating different data sources, is found to be superior to the aforementioned inference methods. The best results for EBS are obtained by considering the weights w1=0.2 and w2=0.8 for MCE and BS values, respectively, where approximately 40% of the inferred connections are found to be correct and significantly better than related methods. The results also indicate that the expression profile is able to recover some true connections that are not present in biological annotations, thus leading to the possibility of discovering new relations between genes.
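
The two pieces of this score can be illustrated with a small sketch: the textbook conditional entropy H(Y | X) of one discretized expression profile given another, and a linear combination using the weights w1=0.2 and w2=0.8 reported above. The function names are hypothetical, and how MCE is scaled or oriented before being combined with BS is not stated in the abstract, so `combined_score` simply applies the weights to whatever two values it is given.

```python
import math
from collections import Counter


def conditional_entropy(x, y):
    """H(Y | X) in bits for two equally long sequences of discrete symbols."""
    n = len(x)
    joint = Counter(zip(x, y))
    marg_x = Counter(x)
    h = 0.0
    for (xi, _yi), count in joint.items():
        # p(x, y) * log2 p(y | x), with p(y | x) = count(x, y) / count(x)
        h -= (count / n) * math.log2(count / marg_x[xi])
    return h


def combined_score(mce, bs, w1=0.2, w2=0.8):
    """Assumed linear combination w1 * MCE + w2 * BS (weights from the abstract)."""
    return w1 * mce + w2 * bs
```

A conditional entropy of zero means the target profile is fully determined by the predictor; one bit means a binary target is no more predictable than a fair coin.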


Subject(s)
Cell Cycle/genetics , Computational Biology/methods , Gene Regulatory Networks , Entropy , Gene Expression , Models, Theoretical , Phenotype , Protein Interaction Mapping
11.
Neural Netw ; 48: 91-108, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23994187

ABSTRACT

A granular neural network for identifying salient features of data, based on the concepts of fuzzy set and a newly defined fuzzy rough set, is proposed. The formation of the network mainly involves an input vector, initial connection weights and a target value. Each feature of the data is normalized between 0 and 1 and used to develop granulation structures by a user defined α-value. The input vector and the target value of the network are defined using granulation structures, based on the concept of fuzzy sets. The same granulation structures are also presented to a decision system. The decision system helps in extracting the domain knowledge about data in the form of dependency factors, using the notion of the new fuzzy rough set. These dependency factors are assigned as the initial connection weights of the proposed network. It is then trained in an unsupervised manner by minimizing a novel feature evaluation index. The effectiveness of the proposed network, in evaluating selected features, is demonstrated on several real-life datasets. The results of the proposed network (FRGNN) are found to be statistically more significant than those of related methods in 28 of 40 instances, i.e., 70% of instances, using the paired t-test.


Subject(s)
Fuzzy Logic , Neural Networks, Computer , Algorithms , Arrhythmias, Cardiac/classification , Artificial Intelligence , Atmosphere , Bayes Theorem , Cell Cycle , Databases, Factual/classification , Decision Theory , Electronic Mail/classification , Entropy , Humans , Microarray Analysis , Neoplasms/classification , Plants/classification , Semiconductors/classification , Support Vector Machine , Terminology as Topic , Wavelet Analysis
12.
Article in English | MEDLINE | ID: mdl-23702539

ABSTRACT

Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed over more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related to kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques developed for RNA secondary structure prediction is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search, is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented as an example. Future challenging issues are then mentioned.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Animals , Humans , Models, Genetic , Thermodynamics
13.
Front Genet ; 3: 59, 2012.
Article in English | MEDLINE | ID: mdl-22529851

ABSTRACT

One of the important goals of most biological investigations is to classify and organize the experimental findings so that they are readily useful for deriving generalized rules. Although there is a huge amount of information on RNA structures in PDB, there are redundant files, ambiguous synthetic sequences, etc. Moreover, a systematic hierarchical organization, reflecting RNA classification, is missing in PDB. In this investigation, we have classified all the available RNA structures from PDB through a programmatic approach. Hence, it would now be a simple task to regularly update the classification as and when new structures are released. The classification can further determine (i) a non-redundant set of RNA structures and (ii) if available, a set of structures of identical sequence and function, which can highlight structural polymorphism, ligand-induced conformational alterations, etc. Presently, we have classified the available structures (2095 PDB entries having an RNA chain longer than nine nucleotides solved by X-ray crystallography or NMR spectroscopy) into nine functional classes. The structures of the same function and the same source are mostly seen to be similar, with subtle differences depending on their functional complexation. The web-server is available online at http://www.saha.ac.in/biop/www/HD-RNAS.html and is updated regularly.

14.
IEEE Trans Biomed Eng ; 59(4): 1162-8, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22318478

ABSTRACT

Predicting the functions of unannotated genes is one of the major challenges of biological investigation. In this study, we propose a weighted power scoring framework, called weighted power biological score (WPBS), for combining different biological data sources and predicting the function of some of the unclassified yeast Saccharomyces cerevisiae genes. The relative power and weight coefficients of different data sources, in the proposed score, are estimated systematically by utilizing functional annotations [yeast Gene Ontology (GO)-Slim: Process] of classified genes, available from Saccharomyces Genome Database. Genes are then clustered by applying the k-medoids algorithm on WPBS, and functional categories of 334 unclassified genes are predicted using a P-value cutoff 1 × 10^-5. The WPBS is available online at http://www.isical.ac.in/~shubhra/WPBS/WPBS.html, where one can download WPBS, related files, and a MATLAB code to predict functions of unclassified genes.


Subject(s)
Databases, Protein , Gene Expression Profiling/methods , Models, Biological , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Signal Transduction/physiology , Amino Acid Sequence , Computer Simulation , Data Mining/methods , Molecular Sequence Data , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Structure-Activity Relationship , Systems Integration
15.
Gene Expr ; 15(5-6): 243-53, 2012.
Article in English | MEDLINE | ID: mdl-23539902

ABSTRACT

MicroRNAs (miRNAs) play a major role in cancer development and also act as a key factor in many other diseases. In this investigation, we propose three methods for handling miRNA expressions. The first two methods determine whether a miRNA indicates a normal or cancer condition, and the third one determines how many miRNAs support the cancer sample/patient. Method 1 acts as a two-class classifier based on the normalized average expression value; Method 2 does the same but is based on the normalized average intraclass distance. Method 3 checks whether a miRNA belongs to the cancer class or not, provides the percentage of supporting miRNAs for a cancer patient, and is based on weighted normalized average intraclass distance. The values of the weights are determined using exhaustive search by maximizing the accuracy in training samples. The proposed methods are tested on the differentially regulated miRNAs in three types of cancers (breast, colon, and melanoma cancer). The performances of Method 1 and Method 2 are evaluated by F score, Matthews Correlation Coefficient (MCC), and plotting "1 - specificity versus sensitivity" in Receiver Operating Characteristic (ROC) space and are found to be superior to the kNN and SVM classifiers for breast, colon, and melanoma cancer data sets. It is also observed that both the sensitivity and the specificity of Method 1 and Method 2 are higher than 0.5. For the same data sets, Method 3 achieved an average accuracy of more than 98% in detecting the miRNAs supporting the cancer condition.
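
The evaluation measures named here follow directly from the four counts of a two-class confusion matrix. A minimal sketch, using the standard definitions and hypothetical function names, with true positives `tp`, true negatives `tn`, false positives `fp` and false negatives `fn`:

```python
import math


def f_score(tp, fp, fn):
    """F score (harmonic mean of precision and recall) = 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_point(tp, tn, fp, fn):
    """One point in ROC space: (1 - specificity, sensitivity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 1.0 - specificity, sensitivity
```

A classifier whose sensitivity and specificity both exceed 0.5, as reported for Method 1 and Method 2, lies above the diagonal of ROC space.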


Subject(s)
MicroRNAs/genetics , Neoplasms/genetics , Humans , ROC Curve
16.
IEEE Trans Biomed Eng ; 56(2): 229-36, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19272921

ABSTRACT

MOTIVATION: One of the important goals of biological investigation is to predict the function of unclassified genes. Although there is a rich literature on multi data source integration for gene function prediction, there is hardly any similar work in the framework of data source weighting using functional annotations of classified genes. In this investigation, we propose a new scoring framework, called biological score (BS), which incorporates data source weighting, for predicting the function of some of the unclassified yeast genes. METHODS: The BS is computed by first evaluating the similarities between genes, arising from different data sources, in a common framework, and then integrating them in a linear combination through weights. The relative weight of each data source is determined adaptively by utilizing the information on yeast gene ontology (GO)-slim process annotations of classified genes, available from Saccharomyces Genome Database (SGD). Genes are clustered by a method called K-BS, where, for each gene, a cluster comprising that gene and its K nearest neighbors is computed using the proposed score (BS). The performances of BS and K-BS are evaluated with gene annotations available from Munich Information Center for Protein Sequences (MIPS). RESULTS: We predict the functional categories of 417 classified genes from 417 clusters with 0.98 positive predictive value using K-BS. The functional categories of 12 unclassified yeast genes are also predicted. CONCLUSION: Our experimental results indicate that considering multiple data sources and estimating their weights with annotations of classified genes can considerably enhance the performance of BS. It has been found that even a small proportion of annotated genes can provide improvements in finding true positive gene pairs using BS.


Subject(s)
Computational Biology/methods , Genes, Fungal , Models, Genetic , Saccharomyces cerevisiae Proteins/physiology , Saccharomyces cerevisiae/genetics , Cluster Analysis , Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Reproducibility of Results , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Sequence Analysis, Protein
17.
J Biosci ; 32(5): 1019-25, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17914244

ABSTRACT

A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes using gene expression data into homogeneous groups was shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, there is no work addressing and evaluating the importance of gene ordering in the partitive clustering framework, to the best knowledge of the authors. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied on the whole data set, and the domain of partitive clustering is still unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are hybridized individually with the self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that our approach improves the result quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summation of gene expression distances, and maximizing the biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than that obtained by optimal leaf ordering in a hierarchical clustering solution.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation/physiology , Gene Order/genetics , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis , Algorithms , Computational Biology/methods , Humans , Models, Genetic , Saccharomyces cerevisiae Proteins/genetics
18.
IEEE Trans Syst Man Cybern B Cybern ; 37(3): 742-9, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17550128

ABSTRACT

This investigation deals with a new distance measure for genes using their microarray expressions and a new algorithm for fast gene ordering without clustering. This distance measure is called "Maxrange distance," where the distance between two genes corresponding to a particular type of experiment is computed using a normalization factor, which is dependent on the dynamic range of the gene expression values of that experiment. The new gene-ordering method called "Minimal Neighbor" is based on the concept of nearest neighbor heuristic involving O(n^2) time complexity. The superiority of this distance measure and the comparability of the ordering algorithm have been extensively established on widely studied microarray data sets by performing statistical tests. An interesting application of this ordering algorithm is also demonstrated for finding useful groups of genes within clusters obtained from a nonhierarchical clustering method like the self-organizing map.
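
The plain nearest neighbor heuristic that this ordering method builds on can be sketched as below. This is the generic greedy heuristic, not the "Minimal Neighbor" algorithm itself, and it uses Euclidean distance as a stand-in because the abstract does not give the Maxrange formula; the function names are hypothetical. Each of the n steps scans the remaining genes once, which gives the quadratic number of distance evaluations.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles (stand-in distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_neighbor_order(profiles, dist=euclidean, start=0):
    """Greedy ordering: repeatedly append the unvisited profile closest to the last one."""
    if not profiles:
        return []
    remaining = set(range(len(profiles)))
    remaining.remove(start)
    order = [start]
    while remaining:
        last = profiles[order[-1]]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(remaining, key=lambda j: (dist(last, profiles[j]), j))
        remaining.remove(nxt)
        order.append(nxt)
    return order
```

Applied to the genes of a single cluster, such an ordering places similar profiles next to each other, which is how the abstract describes finding useful groups within self-organizing map clusters.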


Subject(s)
Algorithms , Artificial Intelligence , Gene Expression Profiling/methods , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Computer Simulation , Models, Genetic