1.
Article in English | MEDLINE | ID: mdl-35886298

ABSTRACT

The lung cancer threat has become a critical issue for public health. Research has been devoted to its clinical study but only a few studies have addressed the issue from a holistic perspective that included social, economic, and environmental dimensions. Therefore, in this study, risk factors or features, such as air pollution, tobacco use, socioeconomic status, employment status, marital status, and environment, were comprehensively considered when constructing a predictive model. These risk factors were analyzed and selected using stepwise regression and the variance inflation factor to eliminate the possibility of multicollinearity. To build efficient and informative prediction models of lung cancer incidence rates, several machine learning algorithms with cross-validation were adopted, namely, linear regression, support vector regression, random forest, K-nearest neighbor, and cubist model tree. A case study in Taiwan showed that the cubist model tree with feature selection was the best model with an RMSE of 3.310 and an R-squared of 0.960. Through these predictive models, we also found that apart from smoking, the average NO2 concentration, employment percentage, and number of factories were also important factors that had significant impacts on the incidence of lung cancer. In addition, the random forest model without feature selection and with feature selection could support the interpretation of the most contributing variables. The predictive model proposed in the present study can help to precisely analyze and estimate lung cancer incidence rates so that effective preventative measures can be developed. Furthermore, the risk factors involved in the predictive model can help with the future analysis of lung cancer incidence rates from a holistic perspective.
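
The multicollinearity screen mentioned in this abstract can be illustrated with a short sketch. It is not the authors' code: it assumes the variance inflation factor (VIF) is computed the standard way, as the diagonal of the inverse correlation matrix of the candidate features. The function names and the toy columns are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of feature j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical toy features: two nearly collinear columns give large VIFs.
print(variance_inflation_factors([[1, 2, 3, 4], [1, 2, 3, 5]]))
```

A feature whose VIF exceeds a chosen cutoff (5 and 10 are common rules of thumb, not values taken from the abstract) would be a candidate for removal before model fitting.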


Subject(s)
Air Pollution , Lung Neoplasms , Air Pollution/adverse effects , Air Pollution/analysis , Algorithms , Benchmarking , Humans , Incidence , Lung Neoplasms/epidemiology , Machine Learning
2.
Comput Biol Med ; 138: 104888, 2021 11.
Article in English | MEDLINE | ID: mdl-34610552

ABSTRACT

BACKGROUND: An increasing number of patients with a first primary cancer are diagnosed with a second primary cancer, but prognosis methods to predict the survivability of a patient with multiple primary cancers have not been fully benchmarked. METHODS: This study investigated the five-year survivability prognosis performance of six machine learning approaches: artificial neural network, decision tree (DT), logistic regression, support vector machine, naïve Bayes (NB), and Bayesian network (BN). The synthetic minority over-sampling technique (SMOTE) was used to address the class imbalance problem, and a nationwide cancer patient database containing 7,845 subjects in Taiwan was used as the sample source. Ten primary and secondary cancers and their key variables affecting the survivability of the patients were identified. RESULTS: All the models using SMOTE improved sensitivity and specificity significantly. NB had the highest performance in terms of accuracy and specificity, whereas BN had the highest performance in terms of sensitivity. Further, NB, BN, and DT outperformed the others in computational time and power of knowledge representation. CONCLUSIONS: Selecting the appropriate prognosis models to predict survivability of patients with two contingent primary cancers can aid precise prediction and can support appropriate treatment advice.
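
As a rough illustration of the oversampling step named in this abstract, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core idea of SMOTE. It is a simplified sketch, not the implementation used in the study; the function name, the default `k`, the seed, and the toy points are all assumptions.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points on segments between minority samples
    and their k nearest minority neighbours (needs >= 2 minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10, k=2)
print(len(new_points))
```

Synthetic points are normally added to the training split only, so that evaluation still reflects the original class distribution.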


Subject(s)
Benchmarking , Neoplasms , Bayes Theorem , Humans , Logistic Models , Neural Networks, Computer , Support Vector Machine
3.
Comput Methods Programs Biomed ; 196: 105686, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32777652

ABSTRACT

BACKGROUND AND OBJECTIVE: Multiple primary cancers significantly threaten patient survivability. Predicting the survivability of patients with two cancers is challenging because its stochastic pattern relates to numerous variables. METHODS: In this study, a Bayesian network (BN) model was proposed to describe the occurrence of two primary cancers and predict the five-year survivability of patients using probabilistic evidence. Eleven types of major primary cancers and contingent occurrences of secondary cancers were investigated. A nationwide two-cancer database involving 7,845 patients in Taiwan was investigated. The BN topology was rigorously examined, and the imbalanced dataset was processed with the synthetic minority oversampling technique. The proposed BN survivability prognosis model was compared with benchmark approaches. RESULTS: The proposed model significantly outperformed the back-propagation neural network, logistic regression, support vector machine, and naïve Bayes in terms of sensitivity, which is a critical performance index for the non-survival group. CONCLUSIONS: Using the proposed BN model, one can estimate the posterior probabilities for every query, provided appropriate prior evidence is available. The potential survivability information of patients, treatment effects, and sociodemographic factor effects predicted by the proposed model can help in cancer treatment assessment and cancer development monitoring.


Subject(s)
Neoplasms , Neural Networks, Computer , Bayes Theorem , Humans , Logistic Models , Neoplasms/epidemiology , Prognosis , Taiwan/epidemiology
4.
Article in English | MEDLINE | ID: mdl-32188138

ABSTRACT

BACKGROUND: Most stroke cases lead to serious mental and physical disabilities, such as dementia and sensory impairment. Chronic diseases are contributory risk factors for stroke. However, few studies have considered the transition behaviors of stroke to dementia associated with chronic diseases and environmental risks. OBJECTIVE: This study aims to develop a prognosis model to address the issue of stroke transitioning to dementia associated with environmental risks. DESIGN: This cohort study used data from the National Health Insurance Research Database in Taiwan. SETTING: Healthcare data were obtained from more than 25 million enrollees and covered over 99% of Taiwan's entire population. PARTICIPANTS: In this study, 10,627 stroke patients diagnosed from 2000 to 2010 in Taiwan were surveyed. METHODS: A Cox regression model and corresponding semi-Markov process were constructed to evaluate the influence of risk factors on stroke, corresponding dementia, and their transition behaviors. MAIN OUTCOME MEASURE: Relative risk and sojourn time were the main outcome measures. RESULTS: Multivariate analysis showed that certain environmental risks, medication, and rehabilitation factors highly influenced the transition of stroke from a chronic disease to dementia. This study also highlighted the populations of stroke patients at high risk from environmental risk factors; males under 65 years of age were the most sensitive population. CONCLUSION: Experiments showed that the proposed semi-Markovian model outperformed other benchmark diagnosis algorithms (i.e., linear regression, decision tree, random forest, and support vector machine), with a high R2 of 90%. The proposed model also facilitated an accurate prognosis of the transition time of stroke from chronic diseases to dementia against environmental risks and rehabilitation factors.


Subject(s)
Dementia , Environmental Pollutants , Stroke , Aged , Cohort Studies , Dementia/epidemiology , Environmental Pollutants/toxicity , Female , Humans , Male , Risk Factors , Stroke/epidemiology , Taiwan
5.
J Med Syst ; 44(3): 65, 2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32040648

ABSTRACT

Lung cancer is a major cause of mortality. Estimating the survivability for this disease has become a key issue for families, hospitals, and countries. A conditional Gaussian Bayesian network model was presented in this study. This model considered 15 risk factors to predict the survivability of a lung cancer patient at 4 severity stages. We surveyed 1075 patients. The presented model was constructed using demographic, diagnosis-based, and prior-utilization variables. The proposed model for survivability prognosis at the four stages achieved an R2 of 93.57%, 86.83%, 67.22%, and 52.94%, respectively. The model predicted lung cancer survivability with high accuracy compared with the reported models. Our model was also shown to reach the ceiling of an ideal Bayesian network.


Subject(s)
Cancer Survivors/statistics & numerical data , Lung Neoplasms/mortality , Severity of Illness Index , Bayes Theorem , Databases, Factual/statistics & numerical data , Female , Humans , Male , Models, Biological , Prognosis , Survival Analysis
6.
Comput Biol Med ; 106: 97-105, 2019 03.
Article in English | MEDLINE | ID: mdl-30708222

ABSTRACT

Lung cancer is one of the leading causes of mortality, and its medical expenditure has increased dramatically. Estimating the expenditure for this disease has become an urgent concern of the supporting families, medical institutes, and government. In this study, a conditional Gaussian Bayesian network (CGBN) model was developed to incorporate comprehensive risk factors to estimate the medical expenditure of a lung cancer patient at different stages. A total of 961 patients were surveyed across the four severity stages of lung cancer. The proposed CGBN model identified the correlation and association of 15 risk factors with the medical expenditure of lung cancer patients at different severity stages. The relationships among the demographic, diagnosis-based, and prior-utilization variables were constructed. The model predicted the lung cancer-related medical expenditure with high accuracy of 32.63%, 50.30%, 50.36%, and 66.58%, respectively, for stages 1-4, as compared with the reported models. A greedy search was also applied to find the upper threshold of R2, and our model was shown to approach this upper threshold.


Subject(s)
Health Expenditures , Lung Neoplasms/economics , Models, Economic , Aged , Bayes Theorem , Female , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/therapy , Male , Middle Aged , Neoplasm Staging , Retrospective Studies
7.
Biomed Res Int ; 2018: 1252897, 2018.
Article in English | MEDLINE | ID: mdl-30519567

ABSTRACT

The effect of comorbidity on lung cancer patients' survival has been widely reported. The aim of this study was to investigate the effects of comorbidity on the establishment of the diagnosis of lung cancer and on survival in lung cancer patients in Taiwan by using a nationwide population-based study design. This study collected data on patients with various comorbidities and analyzed lung cancer diagnosis and survival during a 16-year follow-up period (1995-2010). In total, 101,776 lung cancer patients were included, comprising 44,770 with and 57,006 without comorbidity. Kaplan-Meier analyses were used to compare overall survival between lung cancer patients with and without comorbidity. In our cohort, chronic bronchitis patients who developed lung cancer had the lowest overall survival at one (45%), five (28.6%), and ten (26.2%) years after lung cancer diagnosis. Among lung cancer patients with nonpulmonary comorbidities, patients with hypertension had the lowest overall survival at one (47.9%), five (30.5%), and ten (28.2%) years after lung cancer diagnosis. In 2010, patients with and without comorbidity had 14.86 and 9.31 clinical visits, respectively. Lung cancer patients with preexisting comorbidity had a higher frequency of physician visits. The presence of comorbid conditions was associated with early diagnosis of lung cancer.
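
The Kaplan-Meier analysis referred to in this abstract rests on the product-limit estimator, in which survival at each event time is multiplied by one minus the fraction of at-risk patients who die at that time. The sketch below is a minimal illustration of that formula under stated assumptions; the function name and the toy durations are hypothetical and unrelated to the study data.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient
    events:    1 if death was observed at that time, 0 if censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times in years; 0 marks a censored patient.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients lower the at-risk count without lowering the survival estimate, which is what separates this estimator from a simple proportion of survivors.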


Subject(s)
Lung Diseases/diagnosis , Lung Diseases/mortality , Lung Neoplasms/diagnosis , Lung Neoplasms/mortality , Adult , Aged , Cohort Studies , Comorbidity , Disease-Free Survival , Female , Humans , Kaplan-Meier Estimate , Lung Diseases/complications , Lung Diseases/pathology , Lung Neoplasms/complications , Lung Neoplasms/pathology , Male , Middle Aged , Risk Assessment , Risk Factors , Taiwan/epidemiology
8.
J Thorac Dis ; 8(Suppl 3): S272-8, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27014474

ABSTRACT

BACKGROUND: A comparison of the degree of postoperative pain associated with different thoracoscopic surgical techniques for spontaneous pneumothorax has never been reported. In this study we compared perioperative outcomes and degrees of postoperative pain associated with single-incision subxiphoid thoracoscopic surgery, single-incision transthoracic thoracoscopic surgery and three-incision transthoracic thoracoscopic surgery for spontaneous pneumothorax. METHODS: From August 2013 to September 2015, fifty-seven consecutive patients with spontaneous pneumothorax were treated via single-incision subxiphoid thoracoscopic surgery, single-incision transthoracic thoracoscopic surgery or three-incision transthoracic thoracoscopic surgery. Demographic data, operative time, operative blood loss, length of hospital stay, duration of chest tube drainage, postoperative complications, and numeric pain rating scale scores were collected from the medical records for analysis. RESULTS: Among the 57 patients, 14 received single-incision subxiphoid thoracoscopic surgery, 26 underwent single-incision transthoracic surgery and 17 received three-incision thoracoscopic surgery. In all patients, surgeries were completed without the need for conversion to open surgery. Patients who underwent the single-incision subxiphoid procedure had significantly lower 1-, 8-, 24- and 32-hour postoperative pain scale scores than patients who underwent the other two procedures. The average and maximum pain scale scores during the first 24 hours were lowest in the single-incision subxiphoid group (P<0.0001). CONCLUSIONS: Single-incision subxiphoid thoracoscopic surgery is associated with significantly lower postoperative pain intensity than transthoracic approaches and therefore may provide an alternative surgical technique for patients with spontaneous pneumothorax.

9.
J Med Syst ; 40(1): 35, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26573656

ABSTRACT

Brain metastases are commonly found in patients diagnosed with a primary malignancy of the lung. Lung cancer patients with brain metastasis tend to have poor survivability, with a median of less than 6 months. Therefore, an early and effective detection system for this disease is needed to help prolong the patients' survivability and improve their quality of life. A modified electromagnetism-like mechanism (EM) algorithm, MEM-SVM, is proposed by combining the EM algorithm with a support vector machine (SVM) as the classifier and the opposite sign test (OST) as the local search technique. The proposed method is applied to 44 UCI and IDA datasets and 5 cancer microarray datasets as a preliminary experiment. In addition, the method is tested on 4 public lung cancer microarray datasets. Further, we tested our method on a nationwide dataset of brain metastasis from lung cancer (BMLC) in Taiwan. Because real medical datasets tend to be highly imbalanced, the synthetic minority over-sampling technique (SMOTE) is used to handle this problem. The proposed method is compared against 8 other popular benchmark classifiers and feature selection methods. The performance evaluation is based on accuracy and the Kappa index. For the 44 UCI and IDA datasets and 5 cancer microarray datasets, a non-parametric statistical test confirmed that MEM-SVM outperformed the other methods. For the 4 public lung cancer microarray datasets, MEM-SVM still achieved the highest mean values for accuracy and Kappa index. Owing to the imbalance of the real BMLC dataset, all methods achieve good accuracy without significant differences among them. However, on the balanced BMLC dataset, MEM-SVM appears to be the best method, with higher accuracy and Kappa index. We successfully developed MEM-SVM to predict the occurrence of brain metastasis from lung cancer, in combination with the SMOTE technique to handle class imbalance. The results confirmed that MEM-SVM has good diagnostic power and can be applied as an alternative diagnosis tool alongside other medical tests for the early detection of brain metastasis from lung cancer.


Subject(s)
Algorithms , Brain Neoplasms/diagnosis , Brain Neoplasms/secondary , Lung Neoplasms/pathology , Support Vector Machine , Aged , Female , Humans , Male , Middle Aged , Taiwan
10.
Comput Methods Programs Biomed ; 119(2): 63-76, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25823851

ABSTRACT

Classifying imbalanced data in medical informatics is challenging. Motivated by this issue, this study develops a classifier approach denoted as BSMAIRS. This approach combines the borderline synthetic minority oversampling technique (BSM) and the artificial immune recognition system (AIRS) as a global optimization searcher, with the nearest neighbor algorithm used as a local classifier. Eight electronic medical datasets collected from the University of California, Irvine (UCI) machine learning repository were used to evaluate the effectiveness and to justify the performance of the proposed BSMAIRS. Comparisons with several well-known classifiers were conducted based on accuracy, sensitivity, specificity, and G-mean. Statistical results indicated that BSMAIRS can be used as an efficient method to handle imbalanced class problems. To further confirm its performance, BSMAIRS was applied to real imbalanced medical data on lung cancer metastasis to the brain collected from the National Health Insurance Research Database, Taiwan. This application can function as a supplementary tool for doctors in the early diagnosis of brain metastasis from lung cancer.
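
The four evaluation measures named in this abstract can be computed from a binary confusion matrix. The sketch below uses the standard definitions (G-mean as the geometric mean of sensitivity and specificity) and assumes both classes appear in the labels; the function name and the toy labels are hypothetical.

```python
import math

def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and G-mean for binary labels.

    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical imbalanced labels: a classifier that always predicts the
# majority class scores well on accuracy but has zero sensitivity.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(imbalance_metrics(y_true, y_pred))
```

On this toy input, accuracy is 0.8 while sensitivity and G-mean are both 0, which shows why accuracy alone is a weak yardstick for imbalanced medical data.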


Subject(s)
Algorithms , Brain Neoplasms/secondary , Lung Neoplasms/pathology , Humans , Taiwan
11.
Comput Methods Programs Biomed ; 119(3): 142-62, 2015 May.
Article in English | MEDLINE | ID: mdl-25804445

ABSTRACT

The prediction of substantially short survivability in patients is extremely risky. In this study, we proposed a probabilistic model using a Bayesian network (BN) to predict the short survivability of patients with brain metastasis from lung cancer. A nationwide cancer patient database from 1996 to 2010 in Taiwan was used. The cohort consisted of 438 patients with brain metastasis from lung cancer. We used the synthetic minority over-sampling technique (SMOTE) to address the class imbalance in the data. The proposed BN was compared with three competitive models, namely, naive Bayes (NB), logistic regression (LR), and support vector machine (SVM). Statistical analysis showed that the performances of BN, LR, NB, and SVM were statistically the same in terms of all indices, with low sensitivity, when these models were applied to an imbalanced data set. Results also showed that SMOTE can improve the performance of the four models in terms of sensitivity while keeping high accuracy and specificity. Further, the proposed BN is more effective than NB, LR, and SVM in three respects: it offers transparency and the ability to show the relations among factors affecting brain metastasis from lung cancer; it allows decision makers to find the probability despite incomplete evidence and information; and its sensitivity is the highest among all the standard machine learning methods compared.


Subject(s)
Brain Neoplasms/secondary , Lung Neoplasms , Models, Statistical , Aged , Bayes Theorem , Brain Neoplasms/mortality , Databases, Factual/statistics & numerical data , Female , Humans , Logistic Models , Male , Middle Aged , Models, Biological , Prognosis , Support Vector Machine , Survival Analysis , Taiwan/epidemiology
12.
J Biomed Inform ; 54: 220-9, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25677947

ABSTRACT

Recently, the use of artificial intelligence based data mining techniques for massive medical data classification and diagnosis has gained popularity, whereas the effectiveness and efficiency of feature selection merit further investigation. In this paper, we present a novel method for feature selection that uses the opposite sign test (OST) as a local search for the electromagnetism-like mechanism (EM) algorithm, denoted as the improved electromagnetism-like mechanism (IEM) algorithm. The nearest neighbor algorithm serves as the classifier for the wrapper method. The proposed IEM algorithm is compared with nine popular feature selection and classification methods. Forty-six datasets from the UCI repository and eight gene expression microarray datasets are collected for comprehensive evaluation. Non-parametric statistical tests are conducted to justify the performance of the methods in terms of classification accuracy and Kappa index. The results confirm that the proposed IEM method is superior to the common state-of-the-art methods. Furthermore, we apply IEM to predict the occurrence of Type 2 diabetes mellitus (DM) after gestational DM. Our research helps identify the risk factors for this disease; accordingly, accurate diagnosis and prognosis can be achieved to reduce the morbidity and mortality rates caused by DM.


Subject(s)
Algorithms , Data Mining/methods , Diabetes Mellitus, Type 2/diagnosis , Diagnosis, Computer-Assisted/methods , Databases, Factual , Electromagnetic Fields , Humans , Models, Theoretical , Pattern Recognition, Automated , Risk Factors
13.
Comput Biol Med ; 47: 147-60, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24607682

ABSTRACT

The Bayesian network (BN) is a promising method for modeling cancer metastasis under uncertainty. A BN is graphically represented using bioinformatics variables and can be used to support an informative medical decision/observation by using probabilistic reasoning. In this study, we propose such a BN to describe and predict the occurrence of brain metastasis from lung cancer. A nationwide database containing more than 50,000 cases of cancer patients from 1996 to 2010 in Taiwan was used in this study. The BN topology for studying brain metastasis from lung cancer was rigorously examined by domain experts/doctors. We used three statistical measures, namely, accuracy, sensitivity, and specificity, to evaluate the performance of the proposed BN model and to compare it with three competitive approaches, namely, naive Bayes (NB), logistic regression (LR) and support vector machine (SVM). Experimental results show that no significant differences are observed in accuracy or specificity among the four models, while the proposed BN outperforms the others in terms of sampled average sensitivity. Moreover, the proposed BN has advantages over the other approaches in interpreting how brain metastasis develops from lung cancer. It is shown to be easily understood by physicians, efficient in modeling non-linear situations, and capable of solving stochastic medical problems and handling situations in which information is missing in the context of the occurrence of brain metastasis from lung cancer.


Subject(s)
Bayes Theorem , Brain Neoplasms/secondary , Computational Biology/methods , Lung Neoplasms/pathology , Aged , Algorithms , Brain Neoplasms/epidemiology , Female , Humans , Lung Neoplasms/epidemiology , Male , Middle Aged , Models, Statistical , Sensitivity and Specificity , Taiwan/epidemiology
14.
BMC Bioinformatics ; 15: 49, 2014 Feb 20.
Article in English | MEDLINE | ID: mdl-24555567

ABSTRACT

BACKGROUND: In the application of microarray data, how to select a small number of informative genes from thousands of genes that may contribute to the occurrence of cancers is an important issue. Many researchers use various computational intelligence methods to analyze gene expression data. RESULTS: To achieve efficient gene selection from thousands of candidate genes that can contribute to identifying cancers, this study aims to develop a novel method utilizing particle swarm optimization combined with a decision tree as the classifier. This study also compares the performance of our proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and conducts experiments on 11 gene expression cancer datasets. CONCLUSION: Based on statistical analysis, our proposed method outperforms other popular classifiers for all test datasets, and is comparable to SVM for certain specific datasets. Further, the housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide high discrimination power for cancer classification.


Subject(s)
Algorithms , Computational Biology/methods , Decision Trees , Gene Expression Profiling/methods , Neoplasms/genetics , Artificial Intelligence , Bayes Theorem , Databases, Factual , Female , Humans , Male , Neoplasms/classification , Neoplasms/metabolism , Reproducibility of Results , Support Vector Machine
15.
BMC Med Inform Decis Mak ; 13: 124, 2013 Nov 09.
Article in English | MEDLINE | ID: mdl-24207108

ABSTRACT

BACKGROUND: Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Breast cancer data sets are often imbalanced (i.e., the number of surviving patients outnumbers the number of non-surviving patients), whereas the standard classifiers are not well suited to imbalanced data sets. Methods to improve survivability prognosis of breast cancer need further study. METHODS: Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining the synthetic minority over-sampling technique (SMOTE), cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. A feature selection method is used to select relevant variables, while a pruning technique is applied to obtain low information-burden models. These methods are applied to data obtained from the Surveillance, Epidemiology, and End Results database. The improvements in survivability prognosis of breast cancer are investigated based on the experimental results. RESULTS: Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling generate higher predictive performance consecutively than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC use fewer features (a lower information burden) when a feature selection method and a pruning technique are applied. CONCLUSIONS: LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting in improving the prognostic performance of DT and LR.


Subject(s)
Breast Neoplasms/mortality , Models, Statistical , Prognosis , Adult , Classification/methods , Decision Trees , Disease-Free Survival , Female , Humans , Logistic Models