Results 1 - 7 of 7
1.
BMC Med Res Methodol ; 24(1): 114, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760718

ABSTRACT

BACKGROUND: Smoking is a critical risk factor responsible for over eight million annual deaths worldwide. It is essential to obtain information on smoking habits to advance research and implement preventive measures such as screening of high-risk individuals. In most countries, including Denmark, smoking habits are not systematically recorded and are at best documented within unstructured free-text segments of electronic health records (EHRs). Researchers and clinicians therefore have to navigate manually through extensive amounts of unstructured data, which is one of the main reasons that smoking habits are rarely integrated into larger studies. Our aim is to develop machine learning models to classify patients' smoking status from their EHRs. METHODS: This study proposes an efficient natural language processing (NLP) pipeline capable of classifying patients' smoking status and providing explanations for the decisions. The proposed NLP pipeline comprises four distinct components: (1) preprocessing techniques to address abbreviations, punctuation, and other textual irregularities, (2) four feature extraction techniques, i.e. Embedding, BERT, Word2Vec, and Count Vectorizer, employed to extract features, (3) a Stacking-based Ensemble (SE) model and a Convolutional Long Short-Term Memory Neural Network (CNN-LSTM) for the identification of smoking status, and (4) a local interpretable model-agnostic explanation method to explain the decisions rendered by the detection models. The EHRs of 23,132 patients with suspected lung cancer were collected from the Region of Southern Denmark during the period 1/1/2009-31/12/2018. A medical professional annotated the data into 'Smoker' and 'Non-Smoker' with further classifications as 'Active-Smoker', 'Former-Smoker', and 'Never-Smoker'. Subsequently, the annotated dataset was used for the development of binary and multiclass classification models.
An extensive comparison was conducted of the detection performance across various model architectures. RESULTS: The results of experimental validation confirm the consistency among the models. However, for binary classification, BERT method with CNN-LSTM architecture outperformed other models by achieving precision, recall, and F1-scores between 97% and 99% for both Never-Smokers and Active-Smokers. In multiclass classification, the Embedding technique with CNN-LSTM architecture yielded the most favorable results in class-specific evaluations, with equal performance measures of 97% for Never-Smoker and measures in the range of 86 to 89% for Active-Smoker and 91-92% for Never-Smoker. CONCLUSION: Our proposed NLP pipeline achieved a high level of classification performance. In addition, we presented the explanation of the decision made by the best performing detection model. Future work will expand the model's capabilities to analyze longer notes and a broader range of categories to maximize its utility in further research and screening applications.
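The class-specific precision, recall, and F1-scores reported above can be computed one class at a time from true and predicted labels. The following is a minimal sketch of that calculation, not the study's evaluation code; the function name `per_class_metrics` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy labels invented for illustration; they are not data from the study.
y_true = ["Never", "Never", "Active", "Active", "Former", "Former"]
y_pred = ["Never", "Never", "Active", "Former", "Former", "Former"]
scores = per_class_metrics(y_true, y_pred)
```

In this toy case one Active-Smoker is predicted as Former-Smoker, which lowers recall for the Active class and precision for the Former class while leaving the Never class unaffected.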


Subject(s)
Electronic Health Records , Natural Language Processing , Smoking , Humans , Denmark/epidemiology , Electronic Health Records/statistics & numerical data , Smoking/epidemiology , Machine Learning , Female , Male , Middle Aged , Neural Networks, Computer
2.
Sensors (Basel) ; 24(8)2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38676136

ABSTRACT

The accurate estimation of energy expenditure from simple objective accelerometry measurements provides a valuable method for investigating the effect of physical activity (PA) interventions or for population surveillance. Methods have been evaluated previously, but none utilize the temporal aspects of the accelerometry data. In this study, we investigated the prediction of energy expenditure from acceleration measured at the subjects' hip, wrist, thigh, and back using recurrent neural networks that utilize temporal elements of the data. The acceleration was measured in children (N = 33) performing a standardized activity protocol in their natural environment. The energy expenditure was modelled using Multiple Linear Regression (MLR), stacked long short-term memory (LSTM) networks, and combined convolutional neural networks (CNN) and LSTM. The correlation and mean absolute percentage error (MAPE) were 0.76 and 19.9% for the MLR, 0.882 and 0.879 and 14.22% for the LSTM, and, with the combined LSTM-CNN, the best performance of 0.883 and 13.9% was achieved. The prediction error for vigorous intensities was significantly different (p < 0.01) from those of the other intensity domains: sedentary, light, and moderate. Utilizing the temporal elements of movement significantly improves energy expenditure prediction accuracy compared to other conventional approaches, but the prediction error for vigorous intensities requires further investigation.
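The two performance measures quoted above, correlation and MAPE, can be sketched as follows. This assumes the correlation is Pearson's r, which the abstract does not state; the function names and the toy values are hypothetical and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy energy-expenditure values invented for illustration only.
measured = [2.0, 4.0, 5.0, 8.0]
predicted = [2.2, 3.6, 5.5, 7.2]
r = pearson_r(measured, predicted)
error_pct = mape(measured, predicted)
```

Each toy prediction is off by 10% of the measured value, so the MAPE is 10% even though the correlation stays high; the two measures capture different aspects of prediction quality.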


Subject(s)
Accelerometry , Energy Metabolism , Neural Networks, Computer , Humans , Accelerometry/methods , Energy Metabolism/physiology , Male , Female , Child , Exercise/physiology , Linear Models , Memory, Short-Term/physiology
3.
BMC Bioinformatics ; 24(1): 329, 2023 Sep 02.
Article in English | MEDLINE | ID: mdl-37658294

ABSTRACT

BACKGROUND: Alcohol use disorder (AUD) causes significant morbidity, mortality, and injuries. According to reports, approximately 5% of all registered deaths in Denmark could be due to AUD. The problem is compounded by the late identification of patients with AUD, a situation that can cause enormous problems, from psychological to physical to economic problems. Many individuals suffering from AUD never undergo specialist treatment during their addiction due to obstacles such as taboo and the poor performance of current screening tools. Therefore, there is a lack of rapid intervention. This can be mitigated by the early detection of patients with AUD. A clinical decision support system (DSS) powered by machine learning (ML) methods can be used to diagnose patients' AUD status earlier. METHODS: This study proposes an effective AUD prediction model (AUDPM), which can be used in a DSS. The proposed model consists of four distinct components: (1) imputation to address missing values using the k-nearest neighbours approach, (2) recursive feature elimination with cross validation to select the most relevant subset of features, (3) a hybrid synthetic minority oversampling technique-edited nearest neighbour approach to remove noise and balance the distribution of the training data, and (4) an ML model for the early detection of patients with AUD. Two data sources, including a questionnaire and electronic health records of 2571 patients, were collected from Odense University Hospital in the Region of Southern Denmark for the AUD-Dataset. Then, the AUD-Dataset was used to build ML models. The results of different ML models, such as support vector machine, K-nearest neighbour, decision tree, random forest, and extreme gradient boosting, were compared. Finally, a combination of all these models in an ensemble learning approach was selected for the AUDPM. 
RESULTS: The results revealed that the proposed ensemble AUDPM outperformed other single models and our previous study results, achieving 0.96, 0.94, 0.95, and 0.97 precision, recall, F1-score, and accuracy, respectively. In addition, we designed and developed an AUD-DSS prototype. CONCLUSION: It was shown that our proposed AUDPM achieved high classification performance. In addition, we identified clinical factors related to the early detection of patients with AUD. The designed AUD-DSS is intended to be integrated into the existing Danish health care system to provide novel information to clinical staff if a patient shows signs of harmful alcohol use; in other words, it gives staff a good reason for having a conversation with patients for whom a conversation is relevant.
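The first component of the model described above, imputation of missing values with a k-nearest neighbours approach, can be illustrated with a small sketch. This is not the AUDPM implementation: the function name `knn_impute`, the choice of k, and the distance (root mean squared difference over the features both rows have observed) are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance between two rows is the root mean squared difference over the
    features that are observed in both rows (an assumption of this sketch).
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                # Skip the row itself and rows that also lack feature j.
                if other_idx == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy records invented for illustration: the last row lacks its second feature.
records = [[1.0, 10.0], [1.1, 12.0], [5.0, 50.0], [1.05, None]]
filled = knn_impute(records, k=2)
```

The missing value is filled from the two rows whose observed feature is closest, so the distant third row does not influence the result.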


Subject(s)
Alcoholism , Decision Support Systems, Clinical , Humans , Alcoholism/diagnosis , Early Diagnosis , Cluster Analysis , Electronic Health Records
4.
Sensors (Basel) ; 23(2)2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36679471

ABSTRACT

The walking ability of elderly individuals who suffer from walking difficulties is limited, which restricts their mobility independence. The physical health and well-being of the elderly population are affected by their level of physical activity. Therefore, monitoring daily activities can help improve the quality of life. This is an especially large challenge for those who suffer from dementia and Alzheimer's disease. Thus, it is of great importance for personnel in care homes/rehabilitation centers to monitor their daily activities and progress. Unlike with healthy subjects, the sensor has to be placed on the back of this group of patients, which makes it even more challenging to distinguish walking from other activities. With the latest advancements in the field of health sensing and sensor technology, a huge amount of accelerometer data can be easily collected. In this study, a Machine Learning (ML) based algorithm was developed to analyze the accelerometer data collected from patients with walking difficulties who live in one of the municipalities in Denmark. The ML algorithm is capable of accurately classifying the walking activity of these individuals with different walking abnormalities. Various statistical, temporal, and spectral features were extracted from the time series data collected using an accelerometer sensor placed on the back of the participants. The back sensor placement is desirable in patients with dementia and Alzheimer's disease since they may remove sensors that are visible to them due to the nature of their diseases. Then, an evolutionary optimization algorithm called Particle Swarm Optimization (PSO) was used to select a subset of features to be used in the classification step. Four ML classifiers, namely k-Nearest Neighbors (kNN), Random Forest (RF), Stacking Classifier (Stack), and Extreme Gradient Boosting (XGB), were trained and compared on an accelerometry dataset from 20 participants.
These models were evaluated using the leave-one-group-out cross-validation (LOGO-CV) technique. The Stack model achieved the best performance with average sensitivity, positive predictive value (precision), F1-score, and accuracy of 86.85%, 93.25%, 88.81%, and 93.32%, respectively, to classify walking episodes. In general, the empirical results confirmed that the proposed models are capable of classifying the walking episodes despite the challenging sensor placement on the back of the patients who suffer from walking disabilities.
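In leave-one-group-out cross-validation, every sample from one group is held out for testing while the model is trained on all remaining groups, and this is repeated once per group. The sketch below assumes that each participant forms one group, which the abstract implies but does not state; the function name and the toy group labels are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Toy group labels: one entry per sample, naming the participant it came from.
groups = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = list(leave_one_group_out(groups))
```

Because no participant contributes samples to both the training and the test side of a fold, the averaged scores reflect performance on people the model has not seen.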


Subject(s)
Alzheimer Disease , Humans , Aged , Quality of Life , Walking , Gait , Machine Learning
5.
Transl Lung Cancer Res ; 12(12): 2392-2411, 2023 Dec 26.
Article in English | MEDLINE | ID: mdl-38205206

ABSTRACT

Background: Lung cancer (LC) is the leading cause of cancer-related deaths, and several countries are implementing screening programs. Risk models have been introduced to refine the LC screening criteria, but the use of real-world data for this task demands a robust data infrastructure and quality. In this retrospective cohort study, we aim to address the different relevant risk factors in terms of data sources, descriptive statistics, completeness and quality. Methods: Data on comorbidity, prescription medication, smoking history, consultations, symptoms, familial predispositions, exposures, and laboratory data, among others, were collected for all patients examined for suspected LC over a 10-year period in the Region of Southern Denmark. Data were delivered from the regional data warehouse as well as the Danish Lung Cancer Registry. Associations between LC and non-LC groups were examined through Chi-squared test (categorical variables) and Wilcoxon signed-rank test (continuous variables that were non-parametric). These associations were investigated on both the original datasets and the subset of patients with complete data. Results: The number of examined individuals increased over the study period and more patients were diagnosed with LC in stage I-II, from 18% in 2009 to 31% in 2018. LC patients were more likely to be older, to be smokers, and to have a registered prescription of the included medication. They also exhibited differences in laboratory analysis indicating inflammation and hyponatremia. Weight loss, fatigue and pain were more prevalent in the LC group, while hemoptysis and fever were more common among the non-LC patients. Advanced-stage LC patients experienced a higher rate of symptoms compared to those in the low stages. Within the sub-cohort with complete dataset results, most observed trends persisted, although data on comorbidities were susceptible to change.
Conclusions: This study provides key insights into LC risk assessment using a robust dataset of patients examined for suspected LC. A consistent positive trend in early-stage LC diagnosis was observed throughout the study period. LC patients exhibited distinct smoking behaviors, medication patterns, variations in lab results, and specific symptoms. These discoveries have the potential to enhance discrimination in machine learning-based prediction models, particularly those capable of handling complex distributions. Serving as a detailed account of real-world data collection and processing, the study establishes a foundation for future development of prediction models aimed at facilitating the early referral of LC patients.

6.
Int J Med Inform ; 163: 104790, 2022 07.
Article in English | MEDLINE | ID: mdl-35552189

ABSTRACT

BACKGROUND: Atrial fibrillation (AF) is one of the most prevalent cardiac arrhythmias, which challenges the healthcare systems globally. Timely detection of AF can potentially reduce the mortality and morbidity rates as well as alleviate the economic burden it causes. Digital solutions are shown to enhance the diagnosis of cardiac abnormalities. OBJECTIVES: With the latest advancements in the field of medical informatics and tele-health monitoring, huge amounts of electro-physiological signals, such as electrocardiograms (ECG), can be easily collected. One of the most common ways for physicians/cardiologists to analyse these signals is through visual inspection. However, it is not always easy, and in most cases cumbersome, to analyse these large amounts of ECG data. Therefore, it is of great interest to develop models that are capable of analyzing these data and helping physicians make better decisions. This paper proposes and compares well-known machine learning (ML) algorithms to diagnose short episodes of AF. This also paves the way for real-time detection of AF in clinical settings. METHODS: Different ML algorithms such as Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Stacking Classifier (SC), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost) were applied to detect AF. These models were trained using extracted statistical features from ECG signals.
RESULTS: The proposed ML models were trained on a dataset with 23 ECG records of length approximately 10 h each using the leave-one-group-out cross-validation (LOGO-CV) technique and achieved the best sensitivity (Se), specificity (Sp), positive predictive value (PPV), false positive rate (FPR), and F1-score of 85.67%, 81.25%, 90.85%, 18.75% and 88.18%, respectively, to classify AF from normal sinus rhythms (NSR) in short ECG segments of 20 heartbeats. Additionally, the models were examined on three unseen datasets, namely the Long Term AF dataset, MIT-BIH Arrhythmia dataset, and MIT-BIH Normal Sinus Rhythm dataset, to assess their robustness and generalization. CONCLUSION: The obtained results show high performance and flexibility of some of the applied ML models compared to other well-known algorithms. In general, the empirical results confirm that ensemble methods, such as AdaBoost, generalize well and perform better than other approaches.
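The reported measures are linked by standard identities: the false positive rate is the complement of specificity, and the F1-score is the harmonic mean of sensitivity and positive predictive value. The short sketch below applies those identities to the percentages quoted above; the function names are hypothetical, and the check assumes that all five figures describe the same model and evaluation.

```python
def false_positive_rate(specificity_pct):
    """FPR = 100% - specificity, with both expressed in percent."""
    return 100.0 - specificity_pct


def f1_from_se_ppv(se_pct, ppv_pct):
    """F1-score as the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * se_pct * ppv_pct / (se_pct + ppv_pct)


# Percentages as quoted in the abstract.
se, sp, ppv = 85.67, 81.25, 90.85
fpr = false_positive_rate(sp)
f1 = f1_from_se_ppv(se, ppv)
```

Under that assumption the derived values agree with the reported FPR of 18.75% and, after rounding to two decimals, with the reported F1-score of 88.18%.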


Subject(s)
Atrial Fibrillation , Algorithms , Atrial Fibrillation/diagnosis , Electrocardiography/methods , Heart Rate , Humans , Support Vector Machine
7.
Comput Methods Programs Biomed ; 221: 106899, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35640394

ABSTRACT

BACKGROUND: State-of-the-art automatic atrial fibrillation (AF) detection models trained on RR-interval (RRI) features generally produce high performance on standard benchmark electrocardiogram (ECG) AF datasets. These models, however, result in significantly high false positive rates (FPRs) when applied to ECG data collected under free-living ambulatory conditions and in the presence of non-AF arrhythmias. METHOD: This paper proposes DeepAware, a novel hybrid model combining deep learning (DL) and context-aware heuristics (CAH), which reduces the FPR effectively and improves the AF detection performance on participant-operated ambulatory ECG from free-living conditions. It exploits the RRI and P-wave features, as well as the contextual features from the ambulatory ECG. RESULTS: DeepAware is shown to be very generalizable and superior to the state-of-the-art models when applied to unseen benchmark ECG AF datasets. Most importantly, the model is able to detect AF efficiently when applied to participant-operated ambulatory ECG recordings from free-living conditions and has achieved a sensitivity (Se), specificity (Sp), and accuracy (Acc) of 97.94%, 98.39%, and 98.06%, respectively. Results also demonstrate the effect of atrial activity analysis (via P-wave detection) and CAH in reducing the FPR over the RRI features-based AF detection model. CONCLUSIONS: The proposed DeepAware model can substantially reduce the physician's workload of manually reviewing the false positives (FPs) and facilitate long-term ambulatory monitoring for early detection of AF.
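RR intervals are the times between successive R peaks in the ECG, and their beat-to-beat irregularity is what RRI-based detectors rely on. The sketch below derives RR intervals from R-peak timestamps and summarises their variability with RMSSD, a common heart rate variability measure. It is an illustration only: the abstract does not say which RRI features DeepAware uses, and the function names and timestamps are hypothetical.

```python
import math


def rr_intervals(r_peak_times):
    """Differences between consecutive R-peak timestamps, in seconds."""
    return [later - earlier for earlier, later in zip(r_peak_times, r_peak_times[1:])]


def rmssd(rri):
    """Root mean square of successive differences between RR intervals."""
    diffs = [later - earlier for earlier, later in zip(rri, rri[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Toy R-peak timestamps in seconds, invented for illustration only.
regular_peaks = [0.0, 0.8, 1.6, 2.4, 3.2]
irregular_peaks = [0.0, 0.5, 1.6, 2.0, 3.2]
regular_score = rmssd(rr_intervals(regular_peaks))
irregular_score = rmssd(rr_intervals(irregular_peaks))
```

An evenly spaced rhythm gives an RMSSD close to zero, while the irregular sequence gives a much larger value. A feature of this kind cannot by itself tell AF apart from other irregular rhythms, which is consistent with the false positives the abstract attributes to RRI-only models.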


Subject(s)
Atrial Fibrillation , Deep Learning , Algorithms , Atrial Fibrillation/diagnosis , Electrocardiography/methods , Electrocardiography, Ambulatory , Heuristics , Humans