Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran / 대한의료정보학회지
Healthcare Informatics Research ; : 177-185, 2013.
Article in English | WPRIM | ID: wpr-167420
ABSTRACT

OBJECTIVES:

Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) in classifying individuals with and without diabetes.

METHODS:

The data set included 6,500 subjects from the Iranian national non-communicable disease risk factor surveillance, obtained through a cross-sectional survey. The sample was drawn by cluster sampling of the Iranian population in a survey conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors commonly associated with diabetes were selected, and the six classifiers were compared in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve.
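The four comparison criteria named above can be computed as in the following minimal sketch. It is an illustration only, not the authors' code: the function names, the 0/1 label coding (1 = diabetes), and the toy inputs in the usage note are assumptions, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed coding: 1 = diabetes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and total accuracy from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive receives a higher score than a randomly chosen negative,
    with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with the made-up labels `[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]` and predictions `[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]`, `classification_metrics` reports a specificity of 1.0 and an accuracy of 0.8, yet a sensitivity of only one third.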

RESULTS:

Support vector machines showed the highest total accuracy (0.986) and the highest area under the ROC curve (0.979), along with high specificity (1.000) and sensitivity (0.820). All other methods achieved a total accuracy above 85%, but their sensitivity values were very low (less than 0.350).
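Total accuracy above 85% can coexist with sensitivity below 0.350 when people with diabetes are a small share of the sample, because accuracy is a prevalence-weighted average of sensitivity and specificity. The sketch below illustrates that arithmetic. The prevalence, sensitivity, and specificity in the usage note are hypothetical values chosen for illustration; the abstract does not report the prevalence or per-method specificity for these classifiers.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Total accuracy as a prevalence-weighted average:
    accuracy = prevalence * sensitivity + (1 - prevalence) * specificity."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity
```

With an assumed prevalence of 0.10, a sensitivity of 0.30, and a specificity of 0.95, `accuracy_from_rates(0.30, 0.95, 0.10)` gives 0.03 + 0.855 = 0.885, so a classifier that misses 70% of cases would still show an accuracy above 85%.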

CONCLUSIONS:

The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
Subject(s)

Full text: Available
Index: WPRIM (Western Pacific)
Main subject: Logistic Models / Mass Screening / Prevalence / Cross-Sectional Studies / Risk Factors / ROC Curve / Sensitivity and Specificity / Developing Countries / Data Mining / Support Vector Machine
Type of study: Diagnostic study / Etiology study / Observational study / Prevalence study / Prognostic study / Risk factors / Screening study
Limits: Humans
Country/Region as subject: Asia
Language: English
Journal: Healthcare Informatics Research
Year: 2013
Document type: Article
