ABSTRACT
Introduction: Cancer is a major cause of mortality worldwide and one of the most important public health problems. In recent years, research on cancer as a systems biology disease has focused on the molecular differences between cancer cells and healthy cells. Most methods proposed for classifying cancer from gene expression data act as black boxes and lack biological interpretability. The goal of this study is to design an interpretable fuzzy model for classifying lymphoma gene expression data
Method: The investigated microarray contained 45 lymphoma samples and a total of 4026 genes. First, a hybrid approach was used to reduce the data dimension and detect genes involved in lymphoma; six of the 4026 genes were selected. Then, an interpretable fuzzy classifier was presented for classifying the data. Fuzzy inference was performed using the two rules with the highest scores. Weka 3.6.9 was used for feature reduction, and the fuzzy classifier model was implemented in MATLAB R2010a. The results were assessed by two measures: accuracy and precision
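
The abstract does not spell out the hybrid reduction step. As a rough illustration of the general filter-wrapper idea only, the Python sketch below ranks features with a simple class-separation score (the filter) and then greedily keeps the features that improve the leave-one-out accuracy of a nearest-centroid classifier (the wrapper). The score, the classifier, every function name, and the toy data are assumptions made for this example; this is not the FWFS procedure or the Weka configuration used in the study.

```python
from statistics import mean, pstdev


def filter_scores(X, y):
    """Filter step: score each feature by |difference of class means| / spread."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def filter_wrapper_select(X, y, top_k=3):
    """Keep the top_k filter-ranked features, then add them greedily (wrapper)."""
    scores = filter_scores(X, y)
    candidates = sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]
    selected, best = [], 0.0
    for j in candidates:
        acc = loo_accuracy(X, y, selected + [j])
        if acc > best:  # keep a feature only if it improves accuracy
            selected, best = selected + [j], acc
    return selected, best


# Toy data (hypothetical): feature 0 separates the two classes, the others do not.
X = [
    [0.10, 5.0, 1.00],
    [0.20, 1.0, 1.10],
    [0.15, 3.0, 0.90],
    [0.90, 4.0, 1.00],
    [1.00, 2.0, 1.05],
    [0.95, 3.5, 0.95],
]
y = [0, 0, 0, 1, 1, 1]
selected, best = filter_wrapper_select(X, y)
```

On real microarray data the filter stage would cut thousands of genes down to a short candidate list before the more expensive wrapper stage runs, which is the usual motivation for combining the two.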
Results: In the pre-processing stage, six of the 4026 genes were identified as cancer-causing genes, and the fuzzy classifier model was then applied to the reduced data. Classification accuracy was 96 percent using the 10 highest-scoring rules and about 98 percent using the 2 highest-scoring rules
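
To make the idea of classifying with only the highest-scoring rules concrete, here is a minimal Python sketch under stated assumptions. The triangular membership functions, the "low"/"high" terms, the gene names, the rule scores, and the single-winner inference scheme are all hypothetical choices for illustration; the abstract does not describe the rule base or the inference details of the actual model.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms for expression values normalized to [0, 1].
TERMS = {
    "low": lambda x: triangular(x, -1.0, 0.0, 1.0),
    "high": lambda x: triangular(x, 0.0, 1.0, 2.0),
}

# Hypothetical rules: (antecedents, class label, rule score).
RULES = [
    ({"gene_a": "high", "gene_b": "low"}, "class_1", 0.9),
    ({"gene_a": "low"}, "class_2", 0.8),
    ({"gene_b": "high"}, "class_2", 0.3),
]


def top_rules(rules, n):
    """Keep only the n rules with the highest scores."""
    return sorted(rules, key=lambda r: r[2], reverse=True)[:n]


def classify(sample, rules):
    """Single-winner inference: the rule with the largest firing * score decides."""
    best_label, best_strength = None, 0.0
    for antecedents, label, score in rules:
        firing = min(TERMS[term](sample[gene]) for gene, term in antecedents.items())
        strength = firing * score
        if strength > best_strength:
            best_label, best_strength = label, strength
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """True positives divided by all samples predicted as `positive`."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


samples = [{"gene_a": 0.8, "gene_b": 0.1}, {"gene_a": 0.2, "gene_b": 0.6}]
labels = ["class_1", "class_2"]
predicted = [classify(s, top_rules(RULES, 2)) for s in samples]
```

Because each rule reads as a plain statement about a few genes, a two-rule model of this kind can be inspected directly, which is the interpretability argument made in the abstract.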
Conclusion: The proposed approach introduces, for the first time, a fully fuzzy method named minimal rule fuzzy classification [MRFC] for extracting biologically interpretable and meaningful fuzzy rules from gene expression data. One of its most notable features is the ability to extract a small set of rules that interpret the gene expression patterns relevant in cancer patients. The approach also successfully addresses the imbalance between the number of samples and the number of genes in microarrays through the proposed Filter-Wrapper Feature Selection method [FWFS]
Subjects
Humans, Lymphoma/genetics, Gene Expression, Genetic Variation, Microarray Analysis, Fuzzy Logic, Theoretical Models
ABSTRACT
Introduction: Manipulation of protein stability is important for understanding the principles that govern protein thermostability, both in basic research and industrial applications
Various data mining techniques exist for prediction of thermostable proteins
Furthermore, artificial neural network [ANN] methods have attracted significant attention for predicting thermostability because they are well suited to mapping non-linear input-output relationships and to massively parallel computing
Method: An Extreme Learning Machine [ELM] was applied to estimate the thermal behavior of 1289 proteins. In the proposed algorithm, the ELM parameters were optimized using a Genetic Algorithm [GA], which tuned the set of input variables, the hidden-layer biases, and the input weights to enhance prediction performance
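
The combination described above can be sketched in a few dozen lines of plain Python. This is a simplified illustration under several assumptions, not the authors' implementation: the GA here evolves only the input weights and hidden-layer biases (input-variable selection is left out), fitness is the RMSE on a held-out validation set, and the population size, mutation rate, ridge term, and toy regression data are all made up for the example.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def hidden(X, genes, n_hidden):
    """Sigmoid hidden-layer outputs; `genes` packs input weights, then biases."""
    n_in = len(X[0])
    out = []
    for x in X:
        row = []
        for h in range(n_hidden):
            w = genes[h * n_in:(h + 1) * n_in]
            z = sum(wi * xi for wi, xi in zip(w, x)) + genes[n_hidden * n_in + h]
            row.append(1.0 / (1.0 + math.exp(-z)))
        out.append(row)
    return out


def elm_fit(X, y, genes, n_hidden, ridge=1e-3):
    """ELM step: output weights from regularized least squares, no iteration."""
    H = hidden(X, genes, n_hidden)
    A = [[sum(r[i] * r[j] for r in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(r[i] * t for r, t in zip(H, y)) for i in range(n_hidden)]
    return solve(A, rhs)


def fitness(genes, train, valid, n_hidden):
    """Validation RMSE of the ELM built from this chromosome (lower is better)."""
    beta = elm_fit(train[0], train[1], genes, n_hidden)
    H = hidden(valid[0], genes, n_hidden)
    preds = [sum(h * b for h, b in zip(r, beta)) for r in H]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, valid[1])) / len(preds))


def ga_elm(train, valid, n_hidden=5, pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    n_genes = n_hidden * len(train[0][0]) + n_hidden

    def score(genes):
        return fitness(genes, train, valid, n_hidden)

    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        parents = pop[:pop_size // 2]  # truncation selection; the elite survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g for g in child]
            children.append(child)
        pop = parents + children
    best = min(pop, key=score)
    history.append(score(best))
    return best, history


# Toy regression task (hypothetical): learn y = sin(2*pi*x) from noiseless samples.
data_rng = random.Random(1)


def make_data(n):
    X = [[data_rng.random()] for _ in range(n)]
    return X, [math.sin(2 * math.pi * x[0]) for x in X]


train, valid = make_data(20), make_data(10)
best, history = ga_elm(train, valid)
```

Since the better half of each generation is carried over unchanged, the best validation error recorded in `history` can never get worse from one generation to the next. A plain ELM corresponds to a single random chromosome; the GA simply searches over many such random initializations and their recombinations.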
The method was executed on a set of amino acids, yielding a total of 613 protein features
Several feature selection algorithms were used to build subsets of the features. A total of 1289 protein samples and 613 protein features were derived from the UniProt database to understand which features contribute to the enzymes' thermostability and to identify the main features that influence this valuable characteristic
Results: At the primary structure level, Gln, Glu, and polar amino acids were the features that contributed most to protein thermostability
At the secondary structure level, Helix_S, Coil, and charged_Coil were the most important features affecting protein thermostability
These results suggest that the thermostability of proteins is mainly associated with primary structural features of the protein
According to the results, primary structure had a greater influence on the thermostability of a protein than secondary structure
The results show that the prediction accuracy of ELM [mean square error] can be improved dramatically using GA, reaching error rates of RMSE = 0.004 and MAPE = 0.1003
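
For reference, the two error measures quoted above are usually computed as in the short Python sketch below. The abstract does not say whether MAPE is reported as a fraction or as a percentage, so the fraction form used here is an assumption, and the sample values are invented purely to show the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, for illustration only.
actual = [50.0, 60.0, 80.0]
predicted = [55.0, 60.0, 72.0]
error_rmse = rmse(actual, predicted)
error_mape = mape(actual, predicted)
```

RMSE is expressed in the units of the predicted quantity and penalizes large errors more heavily, whereas MAPE is scale-free, so the two numbers are not directly comparable with each other.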
Conclusion: The proposed approach significantly improves the accuracy of ELM in predicting thermostable enzymes. ELM tends to require more hidden-layer neurons than conventional tuning-based learning algorithms. To overcome this, the proposed approach uses a GA that optimizes the structure and the parameters of the ELM
In summary, optimizing ELM with GA results in an efficient prediction method; numerical experiments showed that the approach yields excellent results