Search | Global Index Medicus

Prediction the groundwater level of Hamadan-Bahar Plain, west of Iran using support vector machines

Lily, Tapak; Alireza, Rahmani; Abbas, Moghimbeigi.

Journal of Research in Health Sciences [JRHS]. 2014; 14 (1): 82-87

in English | IMEMR | ID: emr-133226

ABSTRACT

Water is regarded as the main source of life, but water resources are limited and nonrenewable. Different factors have caused groundwater levels to decline. Therefore, modeling and predicting groundwater level is of great importance. Monthly groundwater level data covering about 20 years [October 1991 to February 2012] from the Hamadan-Bahar Plain, west of Iran, were used, based on piezometric height related to hydrologic years. The support vector machine [SVM], a new nonlinear regression technique, was used to predict groundwater level. The performance of the SVM model was assessed using R², root mean square error [RMSE], mean absolute error [MAE], mean absolute percentage error [MAPE], correlation coefficient, and efficiency coefficient [E], and was then compared with the classic time series model. The SVM model had greater R² [=0.933], E [=0.950] and correlation [=0.965]. Moreover, SVM had lower RMSE [=0.120], MAPE [=0.140] and MAE [=0.124]. There was no significant difference between the values estimated by the two models and the observed values. The SVM outperformed the classic time series model in predicting groundwater level. Therefore, using the SVM model is reasonable for modeling and predicting fluctuations of groundwater level in the Hamadan-Bahar Plain.
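
The evaluation criteria named in this abstract can be illustrated with a short sketch. The abstract does not give formulas, so the definitions below are assumptions: E is taken to be the Nash-Sutcliffe efficiency, R² the squared Pearson correlation, and MAPE is expressed in percent. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return common goodness-of-fit criteria for a regression model.

    Assumed definitions (not stated in the abstract):
    E is the Nash-Sutcliffe efficiency, R2 is the squared Pearson
    correlation, and MAPE is reported in percent.
    """
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))

    corr = cov / math.sqrt(ss_tot * ss_pred)               # Pearson correlation
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "correlation": corr,
        "R2": corr ** 2,
        "E": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Hypothetical groundwater levels, for illustration only.
    obs = [1.0, 2.0, 3.0]
    pred = [1.5, 2.0, 2.5]
    for name, value in regression_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions a model can have a correlation close to 1 and still a lower E, because E penalizes bias and scale errors that correlation ignores.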

Survival Analysis of Gastric Cancer Patients with Incomplete Data

Abbas MOGHIMBEIGI; Lily TAPAK; Ghodaratolla ROSHANAEI; Hossein MAHJUB.

Journal of Gastric Cancer ; : 259-265, 2014.

Article in English | WPRIM | ID: wpr-83545

ABSTRACT

PURPOSE: Survival analysis of gastric cancer patients requires knowledge about factors that affect survival time. This paper attempted to analyze the survival of patients with incomplete registered data by using imputation methods. MATERIALS AND METHODS: Three missing data imputation methods, including regression, the expectation maximization algorithm, and multiple imputation (MI) using Markov chain Monte Carlo methods, were applied to the data of cancer patients referred to the cancer institute at Imam Khomeini Hospital in Tehran from 2003 to 2008. The data included demographic variables, survival times, and a censoring variable for 471 patients with gastric cancer. After using imputation methods to account for missing covariate data, the data were analyzed using a Cox regression model and the results were compared. RESULTS: The mean patient survival time after diagnosis was 49.1 ± 4.4 months. In the complete case analysis, which used information from 100 of the 471 patients, very wide and uninformative confidence intervals were obtained for the chemotherapy and surgery hazard ratios (HRs). However, after imputation, the maximum confidence interval widths for the chemotherapy and surgery HRs were 8.470 and 0.806, respectively. The minimum width corresponded to MI. Furthermore, the minimum Bayesian and Akaike information criteria values corresponded to MI (-821.236 and -827.866, respectively). CONCLUSIONS: Missing value imputation increased the precision and accuracy of the estimates. In addition, MI yielded better results than the expectation maximization algorithm and simple regression imputation methods.
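
The comparison of confidence interval widths can be made concrete with a small sketch. The abstract does not say how the intervals were computed; the code below assumes a Wald interval built on the log hazard ratio scale from a Cox coefficient and its standard error. The function name `hr_confidence_interval` and the numeric inputs are hypothetical and are not values from the study.

```python
import math
from statistics import NormalDist


def hr_confidence_interval(coef, se, level=0.95):
    """Wald confidence interval for a hazard ratio (assumed method).

    coef is the Cox regression coefficient (log hazard ratio) and
    se is its standard error. Returns (lower, upper, width) on the
    hazard ratio scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return lower, upper, upper - lower


if __name__ == "__main__":
    # Hypothetical coefficient with a small and a large standard error.
    for se in (0.1, 0.8):
        lo, hi, width = hr_confidence_interval(math.log(2.0), se)
        print(f"se={se}: HR 95% CI = ({lo:.3f}, {hi:.3f}), width = {width:.3f}")
```

Because the interval is exponentiated, a larger standard error, as one might expect when only a fraction of the patients have complete data, widens the interval on the hazard ratio scale much more than proportionally.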

Subject(s)

Humans , Diagnosis , Drug Therapy , Markov Chains , Proportional Hazards Models , Stomach Neoplasms , Survival Analysis

Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran / 대한의료정보학회지

Lily TAPAK; Hossein MAHJUB; Omid HAMIDI; Jalal POOROLAJAL.

Healthcare Informatics Research ; : 177-185, 2013.

Article in English | WPRIM | ID: wpr-167420

ABSTRACT

OBJECTIVES: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) to classify persons with and without diabetes. METHODS: The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, obtained through a cross-sectional survey. The sample was based on cluster sampling of the Iranian population, conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of the six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve. RESULTS: Support vector machines showed the highest total accuracy (0.986) as well as the highest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but their sensitivity values were very low (less than 0.350). CONCLUSIONS: The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
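
The pattern reported in the results, total accuracy above 85% alongside sensitivity below 0.350, is what one would expect when the positive class is uncommon. The sketch below computes the three count-based criteria from a confusion matrix. The function name `classification_metrics` and the counts are hypothetical, chosen only to illustrate the effect, and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and total accuracy from confusion-matrix counts.

    tp/fn count subjects with the condition who were classified
    correctly/incorrectly; tn/fp do the same for subjects without it.
    """
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts: 1,000 subjects, 100 of them with diabetes.
    # The classifier finds only 30 of the 100 cases but rarely
    # mislabels a subject without diabetes.
    metrics = classification_metrics(tp=30, fp=10, tn=890, fn=70)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts the classifier misses 70% of the cases yet still labels 92% of all subjects correctly, which is why accuracy alone is a weak criterion for screening.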

Subject(s)

Humans , Cross-Sectional Studies , Data Mining , Developing Countries , Iran , Logistic Models , Mass Screening , Prevalence , Risk Factors , ROC Curve , Sensitivity and Specificity , Support Vector Machine
