Results 1 - 20 of 22
1.
Environ Monit Assess ; 195(1): 223, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36544059

ABSTRACT

The present study focuses on the prediction and assessment of the impact of the lockdown caused by the coronavirus pandemic on air quality during three different phases, viz., normal periods (1 January 2018-23 March 2020), complete lockdown (24 March 2020-31 May 2020), and partial lockdown (1 June 2020-30 September 2020). We identify the most important air pollutants influencing the air quality of Kolkata during the three periods using Random Forest, a tree-based machine learning (ML) algorithm. It is found that the ambient air quality of Kolkata is mainly affected by particulate matter or PM (PM10 and PM2.5). However, the effect of the lockdown is most prominent on PM2.5, which spreads in the air of Kolkata due to diesel-driven vehicles, domestic and commercial combustion activities, road dust, and open burning. To predict urban PM2.5 and PM10 concentrations 24 h in advance, we use a deep learning (DL) model, namely, stacked-bidirectional long short-term memory (stacked-BDLSTM). The model is trained on the normal periods, and it shows superiority over some supervised ML models, like support vector machine, K-nearest neighbor classifier, multilayer perceptron, and long short-term memory, and over the statistical time series forecasting model autoregressive integrated moving average. This pre-trained stacked-BDLSTM is applied to predict the concentrations of PM2.5 and PM10 during the pandemic in two cases, viz., complete lockdown and partial lockdown, using a deep model-based transfer learning (TL) approach (TLS-BDLSTM). Transfer learning aims to utilize the information gained from one problem to improve the predictive performance of a learning model for a different but related problem. Our work helps to demonstrate how TL is useful when there is a scarcity of data during the COVID-19 pandemic regarding the drastic change in concentration of pollutants. The results reveal the best prediction performance of TLS-BDLSTM with a lead time of 24 h as compared to some well-known traditional ML and statistical models and the pre-trained stacked-BDLSTM. The prediction is then validated using the real-time data obtained during the complete lockdown due to the COVID second wave (16 May-15 June 2021) with different time steps, e.g., 24 h, 48 h, 72 h, and 96-120 h. TLS-BDLSTM involving transfer learning is seen to outperform the said comparison methods in modeling the long-term temporal dependency of multivariate time series data and to boost the forecast efficiency not only in a single step, but also in multiple steps. The proposed methodologies are effective and consistent, and can be used by operational organizations in monitoring and management of air quality.
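
Forecasting a concentration 24 h in advance is usually framed as a supervised problem over sliding windows of past readings. The sketch below shows that framing only; it is not the authors' stacked-BDLSTM pipeline. The function name `make_windows`, the `lookback` length, and the assumption of one reading per hour are illustrative and do not come from the abstract.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int, horizon: int
) -> List[Tuple[List[float], float]]:
    """Turn a time series into (past window, future target) pairs.

    lookback: number of consecutive past readings fed to the model.
    horizon:  how many steps after the end of the window the target lies
              (24 for a 24 h lead time, assuming hourly readings).
    """
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        window = list(series[start:start + lookback])
        target = series[start + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs
```

Under the hourly assumption, `make_windows(pm25_readings, lookback=48, horizon=24)` would pair each two-day window with the reading one day after it; a series too short for a single window yields an empty list.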


Subject(s)
Air Pollutants , Air Pollution , COVID-19 , Humans , COVID-19/epidemiology , Pandemics , Environmental Monitoring/methods , Communicable Disease Control , Air Pollution/analysis , Air Pollutants/analysis , Particulate Matter/analysis
2.
Environ Monit Assess ; 194(9): 653, 2022 Aug 06.
Article in English | MEDLINE | ID: mdl-35933570

ABSTRACT

Kolkata has a reputation for being one of the world's most polluted cities, particularly in the post-monsoon months of October, November, and December. Diwali, a Hindu festival, coincides with these months where a large number of firecrackers are set off followed by high emissions of air pollutants. As a result, the air quality index (AQI) deteriorates to "very poor" (301 ≤ AQI ≤ 400) and "poor" (201 ≤ AQI ≤ 300) categories. This situation stays for several days to a month. The present study aims to identify the thresholds for PM2.5 and PM10 that cause the AQI of Kolkata to deteriorate to "very poor" and "poor." For this purpose, we have used a rough set theory-based condition-decision support system to predict the aforementioned categories of AQI. We have developed a Z-number-based novel quantification measure of semantic information of AQI to assess the reliability of the outcomes, as generated from the condition-decision-based decision rules, during post-monsoon season. The result reveals the best possible forecast of AQI with linguistic summarization of the reliability or confidence for different threshold ranges of PM10 and PM2.5. Inverse-decision rules based on rough set theory are utilized to justify and validate the forecasts. The explainability of the condition-decision support system is demonstrated/visualized using a flow graph that maps rough-rule-based different decision paths between input and output with strength, certainty, and coverage. The investigation resulted in an advanced intelligent environmental decision support system (IEDSS) for air-quality prediction.
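
The two AQI bands quoted in the abstract can be written down as a small lookup. This is only a sketch of the category bounds as stated there; the function name `aqi_category` is hypothetical, other AQI bands are deliberately left out, and the rough-set decision rules of the paper are not reproduced.

```python
from typing import Optional

# Bounds exactly as quoted in the abstract; any other band is intentionally omitted.
AQI_BANDS = {
    "poor": (201, 300),
    "very poor": (301, 400),
}


def aqi_category(aqi: int) -> Optional[str]:
    """Return "poor" or "very poor" for an AQI value, or None outside both bands."""
    for name, (low, high) in AQI_BANDS.items():
        if low <= aqi <= high:
            return name
    return None
```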


Subject(s)
Air Pollutants , Air Pollution , Air Pollutants/analysis , Air Pollution/analysis , Cities , Environmental Monitoring/methods , Particulate Matter/analysis , Reproducibility of Results
3.
Article in English | MEDLINE | ID: mdl-31398129

ABSTRACT

MicroRNAs play an important role in controlling drug sensitivity and resistance in cancer. Identification of responsible miRNAs for drug resistance can enhance the effectiveness of treatment. A new set theoretic entropy measure (SPEM) is defined to determine the relevance and level of confidence of miRNAs in deciding their drug resistant nature. Here, a pattern is represented by a pair of values. One of them implies the degree of its belongingness (fuzzy membership) to a class and the other represents the actual class of origin (crisp membership). A measure, called granular probability, is defined that determines the confidence level of having a particular pair of membership values. The granules used to compute the said probability are formed by a histogram based method where each bin of a histogram is considered as one granule. The width and number of the bins are automatically determined by the algorithm. The set thus defined, comprising a pair of membership values and the confidence level for having them, is used for the computation of SPEM and thereby identifying the drug resistant miRNAs. The efficiency of SPEM is demonstrated extensively on six data sets. While the achieved F-score in classifying sensitive and resistant samples ranges between 0.31 and 0.50 using all the miRNAs by SVM classifier, the same score varies from 0.67 to 0.94 using only the top 1 percent drug resistant miRNAs. Superiority of the proposed method as compared to some existing ones is established in terms of F-score. The significance of the top 1 percent miRNAs in the corresponding cancer is also verified by different articles based on biological investigations. Source code of SPEM is available at http://www.jayanta.droppages.com/SPEM.html.


Subject(s)
Computational Biology/methods , Drug Resistance, Neoplasm/genetics , MicroRNAs/genetics , Neoplasms/genetics , Algorithms , Databases, Genetic , Entropy , Fuzzy Logic , Humans
4.
Article in English | MEDLINE | ID: mdl-27831888

ABSTRACT

MicroRNAs (miRNAs) are known as an important indicator of cancers. The presence of cancer can be detected by identifying the responsible miRNAs. A fuzzy-rough entropy measure (FREM) is developed which can rank the miRNAs and thereby identify the relevant ones. FREM is used to determine the relevance of a miRNA in terms of separability between normal and cancer classes. While computing the FREM for a miRNA, fuzziness takes care of the overlapping between normal and cancer expressions, whereas rough lower approximation determines their class sizes. MiRNAs are sorted according to the highest relevance (i.e., the capability of class separation) and a percentage among them is selected from the top ranked ones. FREM is also used to determine the redundancy between two miRNAs and the redundant ones are removed from the selected set, as per the necessity. A histogram based patient selection method is also developed which can help to reduce the number of patients to be dealt with during the computation of FREM, while compromising very little on the performance of the selected miRNAs for most of the data sets. The superiority of the FREM as compared to some existing methods is demonstrated extensively on six data sets in terms of sensitivity, specificity, and score. While for these data sets the score of the miRNAs selected by our method varies from 0.70 to 0.91 using SVM, those results vary from 0.37 to 0.90 for some other methods. Moreover, all the selected miRNAs corroborate with the findings of biological investigations or pathway analysis tools. The source code of FREM is available at http://www.jayanta.droppages.com/FREM.html.


Subject(s)
Computational Biology/methods , Fuzzy Logic , Gene Expression Profiling/methods , MicroRNAs/genetics , Neoplasms/genetics , Algorithms , Entropy , Humans , MicroRNAs/metabolism , Neoplasms/metabolism , Pattern Recognition, Automated
5.
Comput Biol Med ; 89: 540-548, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28844466

ABSTRACT

MicroRNAs (miRNAs) are among the important regulators of cell division and are also responsible for cancer development. Among the discovered miRNAs, not all are important for cancer detection. In this regard, a fuzzy mutual information (FMI) based grouping and miRNA selection method (FMIGS) is developed to identify the miRNAs responsible for a particular cancer. First, the miRNAs are ranked and divided into several groups. Then the most important group is selected among the generated groups. Both the steps, viz., ranking of miRNAs and selection of the most relevant group of miRNAs, are performed using FMI. Here the number of groups is automatically determined by the grouping method. After the selection process, redundant miRNAs are removed from the selected set of miRNAs as per the user's necessity. In a part of the investigation we proposed an FMI based particle swarm optimization (PSO) method for selecting relevant miRNAs, where FMI is used as a fitness function to determine the fitness of the particles. The effectiveness of FMIGS and FMI based PSO is tested on five data sets and their efficiency in selecting relevant miRNAs is demonstrated. The superior performance of FMIGS to some existing methods is established and the biological significance of the selected miRNAs is observed by the findings of the biological investigation and publicly available pathway analysis tools. The source code related to our investigation is available at http://www.jayanta.droppages.com/FMIGS.html.


Subject(s)
Gene Expression Regulation, Neoplastic , MicroRNAs , Models, Biological , Neoplasms , RNA, Neoplasm , Female , Fuzzy Logic , Humans , Male , MicroRNAs/biosynthesis , MicroRNAs/genetics , Neoplasms/genetics , Neoplasms/metabolism , RNA, Neoplasm/biosynthesis , RNA, Neoplasm/genetics
6.
Med Biol Eng Comput ; 54(4): 701-10, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26264058

ABSTRACT

MicroRNAs (miRNAs) act as a major biomarker of cancer. Not all miRNAs in the human body are equally important for cancer identification. We propose a methodology, called FMIMS, which automatically selects the most relevant miRNAs for a particular type of cancer. In FMIMS, miRNAs are initially grouped by using an SVM-based algorithm; then the group with the highest relevance is determined and the miRNAs in that group are finally ranked for selection according to their redundancy. Fuzzy mutual information is used in computing the relevance of a group and the redundancy of miRNAs within it. Superiority of the most relevant group to all others, in deciding normal or cancer, is demonstrated on breast, renal, colorectal, lung, melanoma and prostate data. The merit of FMIMS as compared to several existing methods is established. While 12 out of 15 miRNAs selected by FMIMS corroborate with those of biological investigations, three of them, viz., "hsa-miR-519," "hsa-miR-431" and "hsa-miR-320c," are possible novel predictions for renal cancer, lung cancer and melanoma, respectively. The selected miRNAs are found to be involved in disease-specific pathways by targeting various genes. The method is also able to detect the responsible miRNAs even at the primary stage of cancer. The related code is available at http://www.jayanta.droppages.com/FMIMS.html.


Subject(s)
Fuzzy Logic , MicroRNAs/genetics , Neoplasms/genetics , Databases, Genetic , Humans , MicroRNAs/metabolism , Support Vector Machine
7.
IEEE Trans Neural Netw Learn Syst ; 27(9): 1890-906, 2016 Sep.
Article in English | MEDLINE | ID: mdl-26285222

ABSTRACT

A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure. The neighborhood is newly defined using the fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes. Based on the decision table, a method of gene selection is developed. The effectiveness of the GSOM is shown in both clustering samples and developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. While the superior results of the GSOM, as compared with the related clustering methods, are provided in terms of β-index, DB-index, Dunn-index, and fuzzy rough entropy, the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than the related unsupervised methods. The C-codes of the GSOM and UFRFS are available online at http://avatharamg.webs.com/software-code.

8.
Neural Netw ; 48: 91-108, 2013 Dec.
Article in English | MEDLINE | ID: mdl-23994187

ABSTRACT

A granular neural network for identifying salient features of data, based on the concepts of fuzzy set and a newly defined fuzzy rough set, is proposed. The formation of the network mainly involves an input vector, initial connection weights and a target value. Each feature of the data is normalized between 0 and 1 and used to develop granulation structures by a user defined α-value. The input vector and the target value of the network are defined using granulation structures, based on the concept of fuzzy sets. The same granulation structures are also presented to a decision system. The decision system helps in extracting the domain knowledge about data in the form of dependency factors, using the notion of new fuzzy rough set. These dependency factors are assigned as the initial connection weights of the proposed network. It is then trained using minimization of a novel feature evaluation index in an unsupervised manner. The effectiveness of the proposed network, in evaluating selected features, is demonstrated on several real-life datasets. The results of FRGNN are found to be statistically more significant than related methods in 28 instances of 40 instances, i.e., 70% of instances, using the paired t-test.


Subject(s)
Fuzzy Logic , Neural Networks, Computer , Algorithms , Arrhythmias, Cardiac/classification , Artificial Intelligence , Atmosphere , Bayes Theorem , Cell Cycle , Databases, Factual/classification , Decision Theory , Electronic Mail/classification , Entropy , Humans , Microarray Analysis , Neoplasms/classification , Plants/classification , Semiconductors/classification , Support Vector Machine , Terminology as Topic , Wavelet Analysis
9.
Article in English | MEDLINE | ID: mdl-23702539

ABSTRACT

Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Animals , Humans , Models, Genetic , Thermodynamics
10.
IEEE Trans Biomed Eng ; 59(4): 1162-8, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22318478

ABSTRACT

Predicting the functions of unannotated genes is one of the major challenges of biological investigation. In this study, we propose a weighted power scoring framework, called weighted power biological score (WPBS), for combining different biological data sources and predicting the function of some of the unclassified yeast Saccharomyces cerevisiae genes. The relative power and weight coefficients of different data sources, in the proposed score, are estimated systematically by utilizing functional annotations [yeast Gene Ontology (GO)-Slim: Process] of classified genes, available from Saccharomyces Genome Database. Genes are then clustered by applying the k-medoids algorithm on WPBS, and functional categories of 334 unclassified genes are predicted using a P-value cutoff of 1 × 10⁻⁵. The WPBS is available online at http://www.isical.ac.in/~shubhra/WPBS/WPBS.html, where one can download WPBS, related files, and a MATLAB code to predict functions of unclassified genes.


Subject(s)
Databases, Protein , Gene Expression Profiling/methods , Models, Biological , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Signal Transduction/physiology , Amino Acid Sequence , Computer Simulation , Data Mining/methods , Molecular Sequence Data , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Structure-Activity Relationship , Systems Integration
11.
Gene Expr ; 15(5-6): 243-53, 2012.
Article in English | MEDLINE | ID: mdl-23539902

ABSTRACT

MicroRNAs (miRNAs) play a major role in cancer development and also act as a key factor in many other diseases. In this investigation, we propose three methods for handling miRNA expressions. The first two methods determine whether a miRNA is indicating normal or cancer condition, and the third one determines how many miRNAs are supporting the cancer sample/patient. While Method 1 acts as a two-class classifier and is based on normalized average expression value, Method 2 also does the same and is based on the normalized average intraclass distance. Method 3 checks whether a miRNA belongs to the cancer class or not, provides the percentage of supporting miRNAs for a cancer patient, and is based on weighted normalized average intraclass distance. The values of the weights are determined using exhaustive search by maximizing the accuracy in training samples. The proposed methods are tested on the differentially regulated miRNAs in three types of cancers (breast, colon, and melanoma cancer). The performances of Method 1 and Method 2 are evaluated by F score, Matthews Correlation Coefficient (MCC), and plotting "1 - specificity versus sensitivity" in Receiver Operating Characteristic (ROC) space and are found to be superior to the kNN and SVM classifiers for breast, colon, and melanoma cancer data sets. It is also observed that both the sensitivity and the specificity of Method 1 and Method 2 are higher than 0.5. For the same data sets, Method 3 achieved an average accuracy of more than 98% in detecting the miRNAs supporting the cancer condition.
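
The evaluation measures named in this abstract have standard definitions in terms of a two-class confusion matrix. The sketch below uses those textbook definitions, with F score taken as the balanced F1; the function names and the example counts are illustrative assumptions, not values from the paper.

```python
import math
from typing import Tuple


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F score (F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_point(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (1 - specificity, sensitivity), one point in ROC space."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (1 - specificity, sensitivity)
```

A classifier whose sensitivity and specificity both exceed 0.5, as reported for Method 1 and Method 2, lies above the diagonal of ROC space, that is, it does better than random guessing.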


Subject(s)
MicroRNAs/genetics , Neoplasms/genetics , Humans , ROC Curve
12.
IEEE Trans Image Process ; 20(5): 1211-20, 2011 May.
Article in English | MEDLINE | ID: mdl-20923736

ABSTRACT

Histogram equalization, which aims at information maximization, is widely used in different ways to perform contrast enhancement in images. In this paper, an automatic exact histogram specification technique is proposed and used for global and local contrast enhancement of images. The desired histogram is obtained by first subjecting the image histogram to a modification process and then by maximizing a measure that represents increase in information and decrease in ambiguity. A new method of measuring image contrast based upon local band-limited approach and center-surround retinal receptive field model is also devised in this paper. This method works at multiple scales (frequency bands) and combines the contrast measures obtained at different scales using L_p-norm. In comparison to a few existing methods, the effectiveness of the proposed automatic exact histogram specification technique in enhancing contrasts of images is demonstrated through qualitative analysis and the proposed image contrast measure based quantitative analysis.
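
For orientation, the classical global histogram equalization that the abstract takes as its starting point can be sketched as below. This is the textbook method, not the exact histogram specification technique proposed in the paper; the function name `equalize` and the flat-list image representation are assumptions made for illustration.

```python
from typing import List, Sequence


def equalize(pixels: Sequence[int], levels: int = 256) -> List[int]:
    """Classical global histogram equalization of integer gray levels.

    pixels: gray levels in the range 0..levels-1, flattened to one sequence.
    """
    if not pixels:
        return []
    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative counts (unnormalized CDF).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch.
        return list(pixels)
    # Map each level through the rescaled CDF onto the full output range.
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in pixels]
```

An image whose levels are already uniformly spread is returned unchanged, while a narrow cluster of levels is stretched across the whole range.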


Subject(s)
Image Enhancement/methods , Algorithms , Computer Graphics , Computer Simulation , Signal Processing, Computer-Assisted
13.
IEEE Trans Syst Man Cybern B Cybern ; 40(3): 741-52, 2010 Jun.
Article in English | MEDLINE | ID: mdl-19887323

ABSTRACT

Several information measures such as entropy, mutual information, and f-information have been shown to be successful for selecting a set of relevant and nonredundant genes from a high-dimensional microarray data set. However, for continuous gene expression values, it is very difficult to find the true density functions and to perform the integrations required to compute different information measures. In this regard, the concept of the fuzzy equivalence partition matrix is presented to approximate the true marginal and joint distributions of continuous gene expression values. The fuzzy equivalence partition matrix is based on the theory of fuzzy-rough sets, where each row of the matrix represents a fuzzy equivalence partition that can automatically be derived from the given expression values. The performance of the proposed approach is compared with that of existing approaches using the class separability index and the predictive accuracy of the support vector machine. An important finding, however, is that the proposed approach is shown to be effective for selecting relevant and nonredundant continuous-valued genes from microarray data.
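
For comparison, the conventional way around unknown densities is to discretize the expression values and compute entropy and mutual information from counts. The sketch below shows only that crisp baseline; it does not implement the fuzzy equivalence partition matrix, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two aligned label sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

High mutual information between a discretized gene and the class label indicates relevance, while high mutual information between two genes indicates redundancy.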


Subject(s)
Algorithms , Artificial Intelligence , Fuzzy Logic , Gene Expression Profiling/methods , Models, Theoretical , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Computer Simulation , Decision Support Techniques
14.
IEEE Trans Image Process ; 18(4): 879-88, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19278924

ABSTRACT

This paper presents a novel histogram thresholding methodology using fuzzy and rough set theories. The strength of the proposed methodology lies in the fact that it does not make any prior assumptions about the histogram unlike many existing techniques. For bilevel thresholding, every element of the histogram is associated with one of the two regions by comparing the corresponding errors of association. The regions are considered ambiguous in nature, and, hence, the error measures are based on the fuzziness or roughness of the regions. Multilevel thresholding is carried out using the proposed bilevel thresholding method in a tree structured algorithm. Segmentation, object/background separation, and edge extraction are performed using the proposed methodology. A quantitative index to evaluate image segmentation performance is also proposed using the median of absolute deviation from median measure, which is a robust estimator of scale. Extensive experimental results are given to demonstrate the effectiveness of the proposed methods in terms of both qualitative and quantitative measures.
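
The robust scale estimator mentioned above, the median of absolute deviation from the median (MAD), is short enough to state directly. The sketch covers only the estimator itself; how the paper builds its segmentation index on top of it is not reproduced, and the function name `mad` is illustrative.

```python
from statistics import median
from typing import Sequence


def mad(values: Sequence[float]) -> float:
    """Median of absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)
```

For the values 1, 2, 3, 4, 100 the MAD is 1, the same as for 1, 2, 3, 4, 5, which illustrates why it is called robust: a single extreme value does not move it.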

15.
IEEE Trans Biomed Eng ; 56(2): 229-36, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19272921

ABSTRACT

MOTIVATION: One of the important goals of biological investigation is to predict the function of unclassified genes. Although there is a rich literature on multi data source integration for gene function prediction, there is hardly any similar work in the framework of data source weighting using functional annotations of classified genes. In this investigation, we propose a new scoring framework, called biological score (BS) and incorporating data source weighting, for predicting the function of some of the unclassified yeast genes. METHODS: The BS is computed by first evaluating the similarities between genes, arising from different data sources, in a common framework, and then integrating them in a linear combination style through weights. The relative weight of each data source is determined adaptively by utilizing the information on yeast gene ontology (GO)-slim process annotations of classified genes, available from Saccharomyces Genome Database (SGD). Genes are clustered by a method called K-BS, where, for each gene, a cluster comprising that gene and its K nearest neighbors is computed using the proposed score (BS). The performances of BS and K-BS are evaluated with gene annotations available from Munich Information Center for Protein Sequences (MIPS). RESULTS: We predict the functional categories of 417 classified genes from 417 clusters with 0.98 positive predictive value using K-BS. The functional categories of 12 unclassified yeast genes are also predicted. CONCLUSION: Our experimental results indicate that considering multiple data sources and estimating their weights with annotations of classified genes can considerably enhance the performance of BS. It has been found that even a small proportion of annotated genes can provide improvements in finding true positive gene pairs using BS.
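
The two steps described under METHODS, a weighted linear combination of per-source similarities followed by a cluster made of each gene and its K nearest neighbors, can be sketched as follows. The function names, the fixed weights, and the callable-based interface are hypothetical; the adaptive weight estimation from GO-slim annotations is not shown.

```python
from typing import Callable, List, Sequence


def combined_score(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of per-source similarities for one gene pair."""
    return sum(w * s for w, s in zip(weights, similarities))


def k_nearest_cluster(
    gene: str,
    genes: Sequence[str],
    score: Callable[[str, str], float],
    k: int,
) -> List[str]:
    """Return the gene followed by its k highest-scoring neighbors."""
    others = [g for g in genes if g != gene]
    ranked = sorted(others, key=lambda g: score(gene, g), reverse=True)
    return [gene] + ranked[:k]
```

With two sources weighted 0.75 and 0.25, a pair with similarities 0.8 and 0.0 scores 0.6 and so ranks above a pair with similarities 0.4 and 0.8, which scores 0.5.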


Subject(s)
Computational Biology/methods , Genes, Fungal , Models, Genetic , Saccharomyces cerevisiae Proteins/physiology , Saccharomyces cerevisiae/genetics , Cluster Analysis , Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Reproducibility of Results , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Sequence Analysis, Protein
16.
IEEE Trans Syst Man Cybern B Cybern ; 39(1): 117-28, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19150762

ABSTRACT

Quantifying ambiguities in images using fuzzy set theory has been of utmost interest to researchers in the field of image processing. In this paper, we present the use of rough set theory and its certain generalizations for quantifying ambiguities in images and compare it to the use of fuzzy set theory. We propose classes of entropy measures based on rough set theory and its certain generalizations, and perform rigorous theoretical analysis to provide some properties which they satisfy. Grayness and spatial ambiguities in images are then quantified using the proposed entropy measures. We demonstrate the utility and effectiveness of the proposed entropy measures by considering some elementary image processing applications. We also propose a new measure called average image ambiguity in this context.


Subject(s)
Entropy , Fuzzy Logic , Image Enhancement/methods , Image Processing, Computer-Assisted/methods , Algorithms , Computer Simulation
17.
J Biosci ; 32(5): 1019-25, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17914244

ABSTRACT

A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes using gene expression data into homogeneous groups was shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in hierarchical clustering framework for gene expression analysis, there is no work addressing and evaluating the importance of gene ordering in partitive clustering framework, to the best knowledge of the authors. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied on the whole data set, and the domain of partitive clustering is still unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely, FRAG_GALK and Concorde, are hybridized individually with the self-organizing map to show the importance of gene ordering in partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that our approach improves the result quality of partitive clustering solution, by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimization of summation of gene expression distances, and the maximization of biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than those obtained by optimal leaf ordering in hierarchical clustering solution.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation/physiology , Gene Order/genetics , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis , Algorithms , Computational Biology/methods , Humans , Models, Genetic , Saccharomyces cerevisiae Proteins/genetics
18.
IEEE Trans Syst Man Cybern B Cybern ; 37(3): 742-9, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17550128

ABSTRACT

This investigation deals with a new distance measure for genes using their microarray expressions and a new algorithm for fast gene ordering without clustering. This distance measure is called "Maxrange distance," where the distance between two genes corresponding to a particular type of experiment is computed using a normalization factor, which is dependent on the dynamic range of the gene expression values of that experiment. The new gene-ordering method called "Minimal Neighbor" is based on the concept of nearest neighbor heuristic involving O(n²) time complexity. The superiority of this distance measure and the comparability of the ordering algorithm have been extensively established on widely studied microarray data sets by performing statistical tests. An interesting application of this ordering algorithm is also demonstrated for finding useful groups of genes within clusters obtained from a nonhierarchical clustering method like the self-organizing map.
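
The two ideas in this abstract, a distance normalized by the dynamic range of each experiment and an ordering built with a nearest neighbor heuristic in O(n²) time, can be illustrated with the sketch below. It is one plausible reading, not the published "Maxrange distance" or "Minimal Neighbor" definitions: each column is treated as its own experiment, absolute differences are divided by the column range, and the ordering is the plain greedy nearest neighbor tour. All function names are illustrative.

```python
from typing import List, Sequence


def column_ranges(rows: Sequence[Sequence[float]]) -> List[float]:
    """Dynamic range (max - min) of every column, i.e. of every experiment."""
    return [max(col) - min(col) for col in zip(*rows)]


def range_normalized_distance(
    a: Sequence[float], b: Sequence[float], ranges: Sequence[float]
) -> float:
    """Sum of per-experiment absolute differences, each divided by that experiment's range."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges) if r > 0)


def nearest_neighbor_order(rows: Sequence[Sequence[float]], start: int = 0) -> List[int]:
    """Greedy ordering: repeatedly append the closest not-yet-placed gene (O(n^2) distance evaluations)."""
    if not rows:
        return []
    ranges = column_ranges(rows)
    order = [start]
    remaining = set(range(len(rows))) - {start}
    while remaining:
        last = rows[order[-1]]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(
            remaining,
            key=lambda i: (range_normalized_distance(last, rows[i], ranges), i),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Dividing by the range keeps an experiment measured on a wide scale from dominating the distance; without it, the second column in a data set such as `[[0, 0], [10, 100], [1, 10], [9, 90]]` would carry ten times the weight of the first.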


Subject(s)
Algorithms , Artificial Intelligence , Gene Expression Profiling/methods , Multigene Family/physiology , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Computer Simulation , Models, Genetic
19.
IEEE Trans Syst Man Cybern B Cybern ; 37(2): 350-60, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17416163

ABSTRACT

A novel corpus-based method for stemmer refinement, which can provide improvement in both classification and retrieval, is described. The method models the given words as generated from a multinomial distribution over the topics available in the corpus and includes a procedure like sequential hypothesis testing that enables grouping together distributionally similar words. The system can refine any stemmer, and its strength can be controlled with parameters that reflect the amount of tolerance to be allowed in computing the similarity between the distributions of two words. Although obtaining the morphological roots of the given words is not the primary objective, the algorithm automatically does that to some extent. Despite a huge reduction in dictionary size, classification accuracies are seen to improve significantly when the proposed system is applied on some existing stemmers for classifying 20 Newsgroups and WebKB data. The refinements obtained are also suitable for cross-corpus stemming. Regarding retrieval, its superiority is extensively demonstrated with respect to four existing methods.


Subject(s)
Algorithms , Artificial Intelligence , Information Storage and Retrieval/methods , Natural Language Processing , Pattern Recognition, Automated/methods , Vocabulary, Controlled
20.
IEEE Trans Syst Man Cybern B Cybern ; 37(6): 1529-40, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18179071

ABSTRACT

A generalized hybrid unsupervised learning algorithm, which is termed as rough-fuzzy possibilistic c-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy c-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of c-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed c-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets.
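
As background for the probabilistic membership mentioned above, the standard fuzzy c-means membership of one point in each cluster can be computed from its distances to the cluster centers. The sketch shows only that textbook formula; the possibilistic memberships and the rough lower and upper approximations of RFPCM are not implemented, and the function name `fcm_memberships` is illustrative.

```python
from typing import List, Sequence


def fcm_memberships(distances: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means memberships of one point, given its distance to each center.

    m is the fuzzifier (m > 1). The memberships sum to 1 across clusters.
    """
    # A point sitting exactly on one or more centers belongs to them fully.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(distances))]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]
```

Because these memberships must sum to 1, a distant noisy point still receives substantial membership in some cluster, which is the noise sensitivity of fuzzy c-means that the abstract refers to.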


Subject(s)
Algorithms , Brain/anatomy & histology , Cluster Analysis , Fuzzy Logic , Image Interpretation, Computer-Assisted/methods , Models, Statistical , Pattern Recognition, Automated/methods , Computer Simulation , Humans , Magnetic Resonance Imaging/methods