Results 1 - 3 of 3
1.
Sensors (Basel) ; 22(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36146145

ABSTRACT

Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both in a way that is interpretable for physicians remains elusive. We propose a two-phase data analytic framework that classifies survival status at the 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time-points (phase I) and predicts the number of survival months within 3 years (phase II) using recent Surveillance, Epidemiology, and End Results data from 2010 to 2017. In this study, we employ three analytical models (general linear model (GLM), extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe-level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and the one-hot encoding approach. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phases I and II by exploiting the GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach to develop performant, computationally efficient, and interpretable methods in comparison to black-box models, (b) visualize the top factors affecting survival odds using the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach.


Subject(s)
Lung Neoplasms , Humans , Linear Models , Neural Networks, Computer
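The first abstract lists SMOTE and several variants as data balancing techniques. As a minimal sketch of the basic SMOTE idea only (not the authors' implementation, and not the relocating safe-level, borderline, adaptive synthetic, or majority weighted variants, which change how seed samples are chosen or weighted), each synthetic minority sample is a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `k_nearest`, the default k = 5, and the assumption of purely numeric features are illustrative choices, not details from the study.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], idx: int, k: int) -> List[int]:
    """Indices of the k nearest minority neighbours of points[idx]
    (Euclidean distance), excluding the point itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Basic SMOTE sketch (hypothetical helper): create n_new synthetic samples,
    each interpolated between a random minority sample and one of its
    k nearest minority neighbours. Assumes all features are numeric."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's neighbour list once.
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because every synthetic point lies between two existing minority points, one-hot encoded columns would receive fractional values under this scheme; mixed-type data is usually handled by a categorical-aware variant instead.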
2.
Int J Med Inform ; 149: 104438, 2021 05.
Article in English | MEDLINE | ID: mdl-33730681

ABSTRACT

BACKGROUND: Despite the increasing number of studies on breast cancer survival prediction, little attention has been paid to deceased patients and their survival lengths. Moreover, developing a model that is both accurate and interpretable remains a challenge. OBJECTIVE: This paper proposes a two-stage data analytic framework, where Stage I classifies the survival and deceased statuses and Stage II predicts the number of survival months for deceased female patients with cancer. Since medical data are neither entirely clean nor prepared for model development, we aim to show that data preparation can strengthen a simple Generalized Linear Model (GLM) so that it predicts as accurately as complex models such as Extreme Gradient Boosting (XGB) and the Multilayer Perceptron based on Artificial Neural Networks (MLP-ANNs) in both stages. METHODS: In Stage I, we use recent Surveillance, Epidemiology, and End Results (SEER) data from 2004 to 2016 to predict short-term survival statuses from 6 months to 3 years in 6-month increments. The Synthetic Minority Over-sampling Technique (SMOTE), Relocating Safe-Level SMOTE (RSLS), and Adaptive Synthetic (ADASYN) re-sampling techniques, the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) feature selection methods, and integer and one-hot encoding are combined with three popular data mining methods: GLM, XGB, and MLP. In Stage II, we predict the number of survival months for patients who are correctly predicted as deceased within 3 years. Again, we employ GLM, XGB, and MLP for regression, along with LASSO and RF for feature selection and one-hot encoding for the categorical features. RESULTS: We obtain Area Under the Receiver Operating Characteristic Curve (AUC) values of 0.900, 0.898, 0.877, 0.852, 0.852, and 0.858 for the 6-month, 1-, 1.5-, 2-, 2.5-, and 3-year survival time-points, respectively, using OneHotEncoding-GLM-LASSO-ADASYN. We use the change in the Odds Ratio values in GLM to show the impact of individual categorical levels and numerical features on the odds of death. In Stage II, we obtain a Mean Absolute Error (MAE) of 7.960 months using OneHotEncoding-GLM-LASSO when predicting the number of survival months for deceased patients. We present the top contributing features and their coefficient values to illustrate how the presence of each feature alters the predicted number of survival months. CONCLUSION: To the best of our knowledge, this is the first study that implements both breast cancer survival classification and regression in a two-stage approach. All data-driven findings are presented to help clinicians make better care decisions using GLM, an interpretable and computationally efficient method that predicts survival status and survival lengths for deceased patients, and to foster human-machine interaction.


Subject(s)
Breast Neoplasms , Machine Learning , Female , Humans , Linear Models , ROC Curve
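The second abstract reports AUC, the change in odds ratio, and MAE. Assuming a logistic GLM whose coefficients are on the log-odds scale, a coefficient β multiplies the odds by exp(β), so the percentage change in odds is 100 * (exp(β) - 1); for a one-hot level this is relative to the omitted reference level. The sketch below shows that arithmetic together with plain-Python AUC (computed as the normalized Mann-Whitney statistic, with ties counted as one half) and MAE. The function names are hypothetical and nothing here reproduces the study's code or numbers.

```python
import math
from typing import Dict, Sequence


def odds_ratio_change(coefs: Dict[str, float]) -> Dict[str, float]:
    """Percentage change in the odds of the outcome for each logistic-GLM
    coefficient (log-odds scale): per unit increase of a numeric feature,
    or for a one-hot level versus its reference level."""
    return {name: 100.0 * (math.exp(b) - 1.0) for name, b in coefs.items()}


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, e.g. in survival months for the regression stage."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

For example, a coefficient of ln 2 (about 0.693) corresponds to a 100% increase, i.e. a doubling, of the odds, while a coefficient of 0 leaves the odds unchanged.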
3.
Appl Ergon ; 65: 515-529, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28259238

ABSTRACT

Wearable sensors are currently being used to manage fatigue in professional athletics and in the transportation and mining industries. In manufacturing, physical fatigue is a challenging ergonomic and safety issue, since it lowers productivity and increases the incidence of accidents. Therefore, physical fatigue must be managed. This study has two main goals. First, we examine the use of wearable sensors to detect the occurrence of physical fatigue in simulated manufacturing tasks. Second, we estimate the physical fatigue level over time. To achieve these goals, sensor data were recorded from eight healthy participants. Penalized logistic and multiple linear regression models were used for physical fatigue detection and level estimation, respectively. Important features from the five sensor locations were selected using the Least Absolute Shrinkage and Selection Operator (LASSO), a popular variable selection methodology. The results show that the LASSO model performed well for both physical fatigue detection and modeling. The modeling approach is not specific to particular participants or workload regimes and thus can be adopted for other applications.


Subject(s)
Biosensing Techniques/instrumentation , Fatigue/diagnosis , Occupational Diseases/diagnosis , Wearable Electronic Devices , Work/physiology , Adolescent , Adult , Fatigue/etiology , Female , Humans , Linear Models , Male , Manufacturing Industry , Middle Aged , Multivariate Analysis , Occupational Diseases/etiology , Workplace , Young Adult
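The third abstract uses LASSO to select sensor features. A minimal sketch, assuming the common objective (1/(2n)) * ||y - b0 - Xb||^2 + λ * ||b||_1 for the linear (fatigue level) case, fits the coefficients by cyclic coordinate descent with soft-thresholding. The penalized logistic model used for detection applies the same L1 penalty to the logistic log-likelihood and is not shown. The function `lasso_cd`, the fixed iteration count, and the toy data in the test are hypothetical, not taken from the study.

```python
from typing import List, Sequence, Tuple


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> Tuple[float, List[float]]:
    """Minimise (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1 by cyclic
    coordinate descent. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    # Centre X and y so the (unpenalized) intercept drops out of the updates.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    b = [0.0] * p
    resid = yc[:]  # residual yc - Xc b, starting from b = 0
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column: coefficient stays 0
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
    intercept = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return intercept, b
```

Features whose coefficients are driven exactly to zero are the ones LASSO deselects, and a larger λ removes more of them. In practice the features are usually standardized first so that a single λ penalizes them comparably.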