1.
Journal of the Korean Society of Biological Psychiatry ; : 18-26, 2020.
Article in Korean | WPRIM | ID: wpr-894048

ABSTRACT

Objectives: The aim was to find effective vectorization and classification models for predicting a psychiatric diagnosis from text-based medical records.

Methods: Electronic medical records (n = 494) describing the present illness were collected retrospectively from inpatient admission notes with one of three diagnoses: major depressive disorder, type 1 bipolar disorder, or schizophrenia. The data were split into 400 training records and 94 independent validation records. The records were vectorized with two different models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning classifiers, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Accuracy, precision, recall, and F1-score were measured to compare the models.

Results: Five-fold cross-validation on the training data showed that the DL model with Doc2vec was the most effective at predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics dropped on the independent test data set with the final working DL models (accuracy = 0.79, F1-score = 0.79), while the logistic regression and support vector machine models with Doc2vec performed slightly better (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and the other models with TF-IDF.

Conclusions: The current results suggest that the vectorization may have more impact on classification performance than the machine learning model. However, the data set had a number of limitations, including small sample size, imbalance among the categories, and limited generalizability. In this regard, research with multiple sites and large samples is needed to improve the machine learning models.
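The building blocks named in the Methods (TF-IDF weighting, five-fold splitting, accuracy and F1-score) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which IDF variant, fold assignment, or F1 averaging was used, so the smoothed IDF, the contiguous folds, and the macro-averaged F1 below are choices made for this sketch, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant; the abstract does not specify one).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (count / len(doc)) * idf[t] for t, count in tf.items()})
    return vectors

def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (assumed scheme)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

With 400 training records, `k_fold_indices(400, 5)` yields five folds of 80 held-out records each. In practice the records would usually be shuffled or stratified by diagnosis before splitting, which this sketch leaves out.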

2.
Journal of the Korean Society of Biological Psychiatry ; : 18-26, 2020.
Article in Korean | WPRIM | ID: wpr-901752

