Results 1 - 17 of 17
1.
PLoS One ; 18(11): e0290629, 2023.
Article in English | MEDLINE | ID: mdl-37917635

ABSTRACT

The hotel industry is essential for tourism. With the rapid expansion of the internet, consumers looking for a hotel simply search a website for their desired keywords, and the relevant hotel information appears. To respond quickly to the changing market and consumer habits, each hotel must attend to the content and quality of the information on its website. This study proposes a novel methodology that uses rough set theory (RST), principal component analysis, t-Distributed Stochastic Neighbor Embedding (t-SNE), and attribute performance visualization to explore the relationship between hotel star ratings and hotel website information quality. The data were collected from the star-rated hotels on the Taiwanstay website, and checklists of hotel website services were used to obtain the relevant attribute data. The results show significant differences in information quality between hotels below two stars and those above four stars. The information provided by higher-star hotels was more detailed than that offered by low-star hotels. Based on the attribute performance matrix, one-star and two-star hotels have advantage attributes in landscape, reply time, restaurant information, social media, and compensation, whereas three- to five-star hotels have advantage attributes in operational support, compensation, restaurant information, traffic information, and room information. These results can serve as a reference for stakeholders.


Subject(s)
Industry , Tourism , Humans
2.
PLoS One ; 17(8): e0272956, 2022.
Article in English | MEDLINE | ID: mdl-35994471

ABSTRACT

Road accidents are one of the primary causes of death worldwide; hence, they constitute an important research field. Taiwan is a small country with a high-density population. It particularly has a considerable number of locomotives. Furthermore, Taiwan's traffic accident fatality rate increased by 23.84% in 2019 compared with 2018, primarily because of human factors. Road safety has long been a challenging problem in Taiwanese cities. This study collected public data pertaining to traffic accidents from the Taoyuan city government in Taiwan and generated six datasets based on the various accident frequencies at the same location. To find key attributes, this study proposes a three-stage dimension reduction to filter attributes, which includes removing multicollinear attributes, the integrated attribute selection method, and statistical factor analysis. We applied five rule-based classifiers to classify six different frequency datasets and generate the rules of accident severity. The order of top ten key attributes was hit vehicle > certificate type > vehicle > action type > drive quality > escape > accident type > gender > job > trip purposes in the maximum accident frequency CF ≥ 10 dataset. When locomotives, bicycles, and people collide with other locomotives or trucks, injury or death can easily occur, and the motorcycle riders are at the highest risk. The findings of this study provide a reference for governments and stakeholders to reduce the road accident risk factors.


Subject(s)
Accidental Injuries , Accidents, Traffic/prevention & control , Humans , Motor Vehicles , Risk Factors , Taiwan/epidemiology
3.
Comput Biol Med ; 134: 104527, 2021 07.
Article in English | MEDLINE | ID: mdl-34091384

ABSTRACT

Most classification algorithms assume that classes are balanced. However, datasets with class imbalances are everywhere. The classes of actual medical datasets are imbalanced, which severely impacts identification models and can even sacrifice the classification accuracy of the minority class, even though it is the most influential and representative. Outcomes in the medical field are often irreversible: the tolerance for misjudgment is relatively low, and errors may cause irreparable harm to patients. Therefore, this study proposes a multiple combined method to rebalance medical data featuring class imbalances. The combined methods include (1) resampling methods (synthetic minority oversampling technique [SMOTE] and undersampling [US]), (2) particle swarm optimization (PSO), and (3) MetaCost. This study conducted two experiments with nine medical datasets to verify the proposed method and compare it with the listed methods. A decision tree is used to generate decision rules for easy understanding of the research results. The results show that (1) the proposed method with ensemble learning can improve the area under a receiver operating characteristic curve (AUC), recall, precision, and F1 metrics; (2) MetaCost can increase sensitivity; (3) SMOTE can effectively enhance AUC; (4) US can improve sensitivity, F1, and misclassification costs in data with a high class imbalance ratio; and (5) PSO-based attribute selection can increase sensitivity and reduce data dimension. Finally, we suggest that datasets with an imbalance ratio > 9 use the US results to make the decision. When the imbalance ratio is < 9, the decision-maker can consider the results of SMOTE and US together to identify the best decision.


Subject(s)
Algorithms , Research Design , Humans , Learning , ROC Curve
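The closing recommendation of the abstract above (undersampling for an imbalance ratio above 9, otherwise weigh SMOTE and undersampling together) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the imbalance ratio is assumed to mean majority count divided by minority count, a ratio of exactly 9 (which the abstract leaves open) is treated as "not above 9", and plain random undersampling stands in for whatever undersampling variant the study used.

```python
from collections import Counter
import random


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def suggested_resampling(labels, threshold=9.0):
    """Rule of thumb from the abstract: undersampling above the threshold,
    otherwise compare SMOTE and undersampling. A ratio equal to the
    threshold falls in the second branch (an assumption)."""
    if imbalance_ratio(labels) > threshold:
        return ["undersampling"]
    return ["SMOTE", "undersampling"]


def random_undersample(rows, labels, seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

With 95 negative and 5 positive labels the ratio is 19, so the sketch would point to undersampling; with 80 and 20 the ratio is 4, so both resampling results would be compared.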
4.
Entropy (Basel) ; 22(12)2020 Dec 13.
Article in English | MEDLINE | ID: mdl-33322122

ABSTRACT

Since 2001, cardiovascular disease (CVD) has had the second-highest mortality rate, about 15,700 people per year, in Taiwan. It has thus imposed a substantial burden on medical resources. This study was triggered by the following three factors. First, the CVD problem reflects an urgent issue. A high priority has been placed on long-term therapy and prevention to reduce the wastage of medical resources, particularly in developed countries. Second, from the perspective of preventive medicine, popular data-mining methods have been well learned and studied, with excellent performance in medical fields. Thus, identification of the risk factors of CVD using these popular techniques is a prime concern. Third, the Framingham risk score is a core indicator that can be used to establish an effective prediction model to accurately diagnose CVD. Thus, this study proposes an integrated predictive model to organize five notable classifiers: the rough set (RS), decision tree (DT), random forest (RF), multilayer perceptron (MLP), and support vector machine (SVM), with a novel use of the Framingham risk score for attribute selection (i.e., F-attributes first identified in this study) to determine the key features for identifying CVD. Verification experiments were conducted with three evaluation criteria (accuracy, sensitivity, and specificity) based on 1190 instances of a CVD dataset available from a Taiwan teaching hospital and 2019 examples from a public Framingham dataset. Given the empirical results, the SVM showed the best performance in terms of accuracy (99.67%), sensitivity (99.93%), and specificity (99.71%) in all F-attributes in the CVD dataset compared to the other listed classifiers. The RS showed the highest performance in terms of accuracy (85.11%), sensitivity (86.06%), and specificity (85.19%) in most of the F-attributes in the Framingham dataset.
The above study results support novel evidence that no classifier or model is suitable for all practical datasets of medical applications. Thus, identifying an appropriate classifier to address specific medical data is important. Significantly, this study is novel in its calculation and identification of the use of key Framingham risk attributes integrated with the DT technique to produce entropy-based decision rules of knowledge sets, which has not been undertaken in previous research. This study conclusively yielded meaningful entropy-based knowledgeable rules in tree structures and contributed to the differentiation of classifiers from the two datasets with three useful research findings and three helpful management implications for subsequent medical research. In particular, these rules provide reasonable solutions to simplify processes of preventive medicine by standardizing the formats and codes used in medical data to address CVD problems. The specificity of these rules is thus significant compared to those of past research.
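For readers less familiar with the three evaluation criteria named in this abstract, the standard definitions from a binary confusion matrix can be written as a short sketch. The function name and the counts in the usage note are illustrative only and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn count diseased cases predicted correctly/incorrectly;
    tn/fp count healthy cases predicted correctly/incorrectly.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate, also called recall
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity
```

For example, made-up counts of tp=40, fp=10, tn=45, fn=5 give an accuracy of 0.85, a sensitivity of 40/45, and a specificity of 45/55.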

5.
Comput Biol Med ; 122: 103824, 2020 07.
Article in English | MEDLINE | ID: mdl-32658729

ABSTRACT

Data in the medical field often contain missing values and may result in biased research results. Therefore, the objective of this work is to propose a new imputation method, a novel weighted distance threshold method, to impute missing values. After several experiments, we find that the proposed imputation method has the following benefits. (1) The proposed method with purity can reassign instances into the nearest class of the dataset, and the purity computation can filter outliers; (2) The proposed method redefines the degree of missing values and can determine attributes and instances relative to the missing values in different datasets; and (3) The proposed method need not set the k value of the nearest neighborhood because this study identifies the k value based on the best threshold to calculate purity to enhance the results of imputation. In addition, the distance threshold can adjust the optimal nearest neighborhood to estimate missing values. This study implements several experiments to compare the proposed method with other imputation methods using different missing types, missing degrees, and types of datasets. The results indicate that the proposed imputation method is better than the listed methods. Moreover, this study uses the stroke dataset from the International Stroke Trial (IST) to verify whether the proposed method can be effectively applied in practice, and the results show that the proposed method achieves 90% accuracy in the Stroke dataset.


Subject(s)
Algorithms , Research Design
6.
PLoS One ; 14(6): e0217591, 2019.
Article in English | MEDLINE | ID: mdl-31166975

ABSTRACT

Owing to the emergence of the Internet and its rapid growth, people can use mobile devices on many social media platforms (blogs, Facebook forums, etc.), and the platforms provide well-known websites for people to express and share their daily activities and ideas on global issues. Many consumers utilize product review websites before making a purchase. Many well-known websites are searched for relevant product reviews and experiences of product use. We can easily collect large amounts of structured and unstructured product data and further analyze the data to determine the desired product information. For this reason, many researchers are gradually focusing on sentiment analysis or opinion exploration (opinion mining) and use this technique to extract and analyze customer opinions and emotions. This paper proposes a sentimental text mining method based on an additional features method to enhance accuracy and reduce implementation time and uses singular value decomposition and principal component analysis for data dimension reduction. This study has four contributions: (1) the proposed algorithm for preprocessing the data for sentiment classification, (2) the additional features to enhance the accuracy of the sentiment classification, (3) the application of singular value decomposition and principal component analysis for data dimension reduction, and (4) the design of five modules based on different features, with or without stemming, to compare the performance results. The experimental results show that the proposed method has better accuracy than other methods and that the proposed method can decrease the implementation time.


Subject(s)
Algorithms , Data Mining , Emotions , Databases as Topic , Humans , Principal Component Analysis
7.
Comput Intell Neurosci ; 2018: 1067350, 2018.
Article in English | MEDLINE | ID: mdl-29765399

ABSTRACT

Financial distress prediction is an important and challenging research topic in the financial field. Many methods exist for predicting firm bankruptcy and financial crises, including artificial intelligence and traditional statistical methods, and past studies have shown that artificial intelligence methods predict better than traditional statistical methods. Financial statements are quarterly reports; hence, data on the financial crises of companies are seasonal time series, and the attribute data affecting the financial distress of companies are nonlinear, nonstationary time series with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages: (i) unlike previous models, it incorporates the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high-dimensional data; and (iii) the proposed model can generate rules and mathematical formulas of financial distress as references for investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.


Subject(s)
Models, Economic , Gene Expression , Models, Biological , Nonlinear Dynamics , Seasons , Time Factors
8.
J Clin Med ; 7(6)2018 May 28.
Article in English | MEDLINE | ID: mdl-29843416

ABSTRACT

Population aging has become a worldwide phenomenon that causes many serious problems, and the medical issues related to degenerative brain disease have gradually become a concern. Magnetic Resonance Imaging is one of the most advanced methods for medical imaging and is especially suitable for brain scans. According to the literature, although automatic segmentation is less laborious and time-consuming, it is restricted to several specific types of images. In addition, hybrid segmentation techniques overcome the shortcomings of single segmentation methods. Therefore, this study proposed a hybrid segmentation method combined with a rough set classifier and the wavelet packet method to identify degenerative brain disease. The proposed method is a three-stage image processing method that enhances the accuracy of brain disease classification. In the first stage, this study used the proposed hybrid segmentation algorithms to segment the brain ROI (region of interest). In the second stage, the wavelet packet method was used to decompose the image and calculate the feature values. In the final stage, the rough set classifier was used to identify degenerative brain disease. For verification and comparison, two experiments were conducted to verify the effectiveness of the proposed method and compare it with the TV-seg (total variation segmentation) algorithm, the Discrete Cosine Transform, and the listed classifiers. Overall, the results indicated that the proposed method outperforms the listed methods.

9.
PLoS One ; 13(12): e0209922, 2018.
Article in English | MEDLINE | ID: mdl-30596772

ABSTRACT

Many different time-series methods have been widely used to forecast stock prices for profit. However, previous time-series models still have some problems. To overcome these problems, this paper proposes a hybrid time-series model based on a feature selection method for forecasting the stock prices of leading industries. In the proposed model, stepwise regression is first adopted, and multivariate adaptive regression splines and kernel ridge regression are then used to select the key features. Second, this study constructs the forecasting model by using a genetic algorithm to optimize the parameters of support vector regression. To evaluate the forecasting performance of the proposed model, this study collects datasets for five leading enterprises in different industries from 2003 to 2012. The collected stock prices are used to verify the accuracy of the proposed model. The results show that the proposed model is more accurate than the other listed models and provides persuasive investment guidance to investors.


Subject(s)
Forecasting , Models, Economic
10.
Comput Intell Neurosci ; 2017: 8734214, 2017.
Article in English | MEDLINE | ID: mdl-29250110

ABSTRACT

Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating missing values followed by variable selection to forecast a reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets were concatenated, ordered by date, into an integrated research dataset. The proposed time-series forecasting model has three main foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variables via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level, which is compared with the listed methods in terms of forecasting error. The experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listed models. In addition, the experiment shows that the proposed variable selection can help the five forecasting methods used here improve their forecasting capability.


Subject(s)
Forecasting , Fresh Water , Machine Learning , Water Resources , Water Supply , Algorithms , Environment Design , Factor Analysis, Statistical , Regression Analysis , Taiwan , Time Factors , Water , Weather
11.
Comput Methods Programs Biomed ; 116(3): 215-25, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24891123

ABSTRACT

This paper presents a method for fast computation of Hessian-based enhancement filters, whose conditions for identifying particular structures in medical images are associated only with the signs of Hessian eigenvalues. The computational costs of Hessian-based enhancement filters come mainly from the computation of Hessian eigenvalues corresponding to image elements to obtain filter responses, because computing eigenvalues of a matrix requires substantial computational effort. High computational cost has become a challenge in the application of Hessian-based enhancement filters. Using a property of the characteristic polynomial coefficients of a matrix and the well-known Routh-Hurwitz criterion in control engineering, it is shown that under certain conditions, the response of a Hessian-based enhancement filter to an image element can be obtained without having to compute Hessian eigenvalues. The computational cost can thus be reduced. Experimental results on several medical images show that the method proposed in this paper can reduce significantly the number of computations of Hessian eigenvalues and the processing times of images. The percentage reductions of the number of computations of Hessian eigenvalues for enhancing blob- and tubular-like structures in two-dimensional images are approximately 90% and 65%, respectively. For enhancing blob-, tubular-, and plane-like structures in three-dimensional images, the reductions are approximately 97%, 75%, and 12%, respectively. For the processing times, the percentage reductions for enhancing blob- and tubular-like structures in two-dimensional images are approximately 31% and 7.5%, respectively. The reductions for enhancing blob-, tubular-, and plane-like structures in three-dimensional images are approximately 68%, 55%, and 3%, respectively.


Subject(s)
Algorithms , Imaging, Three-Dimensional/methods , Lung Neoplasms/diagnostic imaging , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Signal Processing, Computer-Assisted , Tomography, X-Ray Computed/methods , Humans , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity
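The central idea of this abstract, deciding the signs of Hessian eigenvalues from characteristic-polynomial coefficients instead of computing the eigenvalues, can be illustrated for the two-dimensional case. The sketch below is an assumption-laden illustration rather than the paper's algorithm: for a symmetric 2x2 Hessian the characteristic polynomial is λ² − (trace)λ + det, and the Routh-Hurwitz conditions for both roots to be negative reduce to trace < 0 and det > 0. The function names and the return labels are hypothetical.

```python
import math


def eigen_signs_2d(hxx, hxy, hyy):
    """Classify the eigenvalue signs of the symmetric Hessian [[hxx, hxy], [hxy, hyy]]
    using only its trace and determinant (no eigenvalue computation)."""
    trace = hxx + hyy
    det = hxx * hyy - hxy * hxy
    if det > 0 and trace < 0:
        return "both negative"  # e.g. a bright blob-like structure
    if det > 0 and trace > 0:
        return "both positive"  # e.g. a dark blob-like structure
    return "mixed or zero"      # eigenvalues would be needed for anything finer


def eigenvalues_2d(hxx, hxy, hyy):
    """Closed-form eigenvalues, included only to cross-check the sign test."""
    mean = (hxx + hyy) / 2.0
    radius = math.hypot((hxx - hyy) / 2.0, hxy)
    return mean - radius, mean + radius
```

A filter that only responds when both eigenvalues are negative could skip the eigenvalue computation for every pixel where `eigen_signs_2d` returns anything else, which is the kind of saving the abstract reports.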
12.
Comput Biol Med ; 42(8): 826-40, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22795228

ABSTRACT

Total hip arthroplasty (THA) is a critical option considered only when a patient has tried more conservative treatments but continues to have pain, stiffness, or problems with hip function. THA has become a major concern amid the rapid growth of the aging population and the constrained health care resources in Taiwan. Moreover, prior studies indicated that imbalanced class distribution problems exist in constructed classification models and seriously degrade model performance in the health care industry. Therefore, this study proposes an integrated hybrid approach as an alternative method for classifying the quality of medical practice (e.g., the length of hospital stay) under an imbalanced class problem after a THA procedure, for hip replacement patients and their doctors in the health care industry. The proposed approach consists of seven components: expert knowledge, global discretization, imbalanced bootstrap technique, reduct and core methods, rough sets, rule induction, and rule filter. The proposed approach is illustrated in practice by examining an experimental dataset from the National Health Insurance Research Database (NHIRD) in Taiwan. The experimental results reveal that the proposed approach performs better than the listed methods under the evaluation criteria. The output created by the rough set LEM2 algorithm is a comprehensible decision rule set that can be applied in knowledge-based health care services as desired. The analytical results provide useful THA information for both academics and practitioners, and these results could be applicable to other diseases or to other countries with similar social and cultural practices.


Subject(s)
Arthroplasty, Replacement, Hip/standards , Medical Informatics Computing/standards , Quality of Health Care , Algorithms , Area Under Curve , Databases, Factual , Humans
13.
Arch Gerontol Geriatr ; 55(2): 323-30, 2012.
Article in English | MEDLINE | ID: mdl-21944320

ABSTRACT

As the incidence of total hip arthroplasty (THA) is expected to rise with an aging population and improvements in surgery, a satisfactory outcome in health care can effectively increase medical quality. After collecting data from the NHI database, this paper applies a rigorous data screening procedure by THA physicians to reduce the data dimension; 8576 of the original 10,388 cases remain after screening. The proposed model adopts an imbalanced sampling method to solve the class imbalance problem and uses rough sets to locate core attributes. Rules of medical quality are then extracted from the core attributes. For verification, the THA dataset is taken as a case study; the performance of the proposed model is verified and compared with other data-mining methods under several criteria. The generated decision rules and core attributes may also yield further managerial implications. Moreover, the results can provide stakeholders with useful THA information to support decision-making.


Subject(s)
Arthroplasty, Replacement, Hip , Quality Assurance, Health Care/methods , Algorithms , Data Mining/methods , Female , Humans , Male , Models, Biological , Sampling Studies
14.
Comput Biol Med ; 42(2): 213-21, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22177941

ABSTRACT

Identifying patients in a Target Customer Segment (TCS) is important to determine the demand for, and to appropriately allocate resources for, health care services. The purpose of this study is to propose a two-stage clustering-classification model through (1) initially integrating the RFM attribute and K-means algorithm for clustering the TCS patients and (2) then integrating the global discretization method and the rough set theory for classifying hospitalized departments and optimizing health care services. To assess the performance of the proposed model, a dataset was used from a representative hospital (termed Hospital-A) that was extracted from a database from an empirical study in Taiwan comprised of 183,947 samples that were characterized by 44 attributes during 2008. The proposed model was compared with three techniques, Decision Tree, Naive Bayes, and Multilayer Perceptron, and the empirical results showed significant promise of its accuracy. The generated knowledge-based rules provide useful information to maximize resource utilization and support the development of a strategy for decision-making in hospitals. From the findings, 75 patients in the TCS, three hospital departments, and specific diagnostic items were discovered in the data for Hospital-A. A potential determinant for gender differences was found, and the age attribute was not significant to the hospital departments.


Subject(s)
Algorithms , Cluster Analysis , Delivery of Health Care , Hospitals/statistics & numerical data , Patients/statistics & numerical data , Bayes Theorem , Computational Biology , Decision Trees , Humans
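The first stage described in this abstract, clustering patients on RFM (recency, frequency, monetary) attributes with K-means, can be sketched with a plain Lloyd's-algorithm loop. Everything concrete here is an assumption made for illustration: the `kmeans` helper, seeding the centroids with the first k points, and the four made-up RFM rows. The sketch also skips feature scaling, which a real analysis would normally need because the monetary column dominates the distance.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. The first k points seed the centroids,
    an arbitrary choice made only to keep the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical patients: (days since last visit, number of visits, total spending)
rfm = [(5, 12, 900), (7, 10, 850), (200, 1, 50), (180, 2, 80)]
assignment, centroids = kmeans(rfm, k=2)
```

On this toy data the two frequent, recent, high-spending patients end up in one cluster and the two infrequent ones in the other; the resulting cluster labels would then feed the second, rule-based classification stage.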
15.
Arch Gerontol Geriatr ; 54(1): 232-7, 2012.
Article in English | MEDLINE | ID: mdl-21382641

ABSTRACT

This study collected real HD data from an area-scale hospital database with 72 attributes and 18,113 records. The study proposes a novel five-step procedure to assess a patient's HD quality: (1) Delete the unrelated attributes and missing values. (2) Employ expert granularity to cut the decision attribute Kt/V (where K is the dialyzer clearance coefficient of urea nitrogen, t is the time for dialysis, and V is the urea nitrogen volume of distribution in the body). (3) Use information gain to select features, reducing the total number of attributes to 17. (4) Use multiple regression to test the degree of collinearity and select features, reducing the dataset to 8 attributes and 2737 records. (5) Finally, generate the rules of HD quality and the accuracy performance by granular rough set theory. For performance comparison, the decision tree (DT-C4.5), the Naïve Bayes (NB) probabilistic model, and Artificial Neural Networks-Multilayer Perceptrons (ANN-MLP) are compared with the proposed procedure in terms of accuracy. The results can help doctors reduce the time of diagnosis and achieve fitness-based dialysis doses for patients.


Subject(s)
Kidney Failure, Chronic/therapy , Models, Theoretical , Renal Dialysis/standards , Adult , Aged , Aged, 80 and over , Databases, Factual , Female , Humans , Male , Middle Aged
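The decision attribute Kt/V defined in this abstract is a dimensionless ratio, so a short worked example may help. The sketch below assumes the common unit convention of K in mL/min, t in minutes, and V in litres; the function name and the numbers in the usage note are illustrative and are not values from the study, and no cut points for the expert discretization are implied.

```python
def kt_over_v(clearance_ml_per_min, minutes, volume_litres):
    """Dimensionless dialysis dose Kt/V.

    clearance_ml_per_min: dialyzer urea clearance K (mL/min)
    minutes: dialysis session time t (min)
    volume_litres: urea distribution volume V (L), converted to mL below
    """
    return clearance_ml_per_min * minutes / (volume_litres * 1000.0)
```

For instance, an assumed clearance of 250 mL/min over a 240-minute session with a 40 L distribution volume gives 250 × 240 / 40,000 = 1.5.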
16.
Arch Gerontol Geriatr ; 55(1): 157-64, 2012.
Article in English | MEDLINE | ID: mdl-21813192

ABSTRACT

Total knee arthroplasty (TKA) is a highly effective means of treating degenerative joint disease (advanced knee arthritis). Previous studies have demonstrated that a high surgical volume for total joint arthroplasty reduces morbidity and improves economic outcomes; however, the methods used are fraught with the complexity, uncertainty, and nonlinearity of medical datasets and may be unable to find important information accurately. Medical datasets often include a large number of features (attributes), some of which are irrelevant, so the main factors affecting health care resource utilization cannot be understood intuitively. To solve these problems, this study employs specialist advice to filter relevant cases (records) and proposes an approach that integrates five feature selection methods to select the important features. Based on rough set theory (RST), rules are extracted and compared with other methods in terms of accuracy. The contributions are: (1) data screening based on specialist opinions, (2) two-stage feature selection by analysis of variance (ANOVA) and a proposed integrated feature selection approach (IFSA), and (3) data discretization and rule generation by RST. The proposed model is verified by comparing accuracy on three datasets. The results can provide a valuable reference for the National Health Insurance Bureau (NHI) in establishing the TKA standard.


Subject(s)
Arthroplasty, Replacement, Knee/statistics & numerical data , Arthroplasty, Replacement, Knee/standards , Osteoarthritis, Knee/surgery , Algorithms , Female , Humans , Male , Medical Records Systems, Computerized/statistics & numerical data , Models, Biological
17.
Arch Gerontol Geriatr ; 53(1): e5-9, 2011.
Article in English | MEDLINE | ID: mdl-20570374

ABSTRACT

The purpose of this study is to discover valuable medical facts by utilizing the Taiwan National Health Insurance (NHI) database, which contains 32,200 records of TKA surgeries. This paper has three main objectives: (a) building learning curves of TKA from the target database; (b) characterizing how the TKA volume correlates with infection rate and mortality; and (c) examining the differences in infection rate and mortality between the medical center (Group I) and the non-medical center (Group II). The TKA samples are classified into two groups according to their institution type (medical center and non-medical center). The Z-test is used to test whether there are differences in the infection rate and mortality between the two observed groups. This study also adopts linear/nonlinear regression to investigate the relationship between TKA volume and the infection rate (and mortality). This study has three main findings: (a) it confirms a correlation between TKA surgical volumes and certain outcomes; (b) surgeons and hospitals with higher TKA volumes exhibit better operation quality and lower postoperative complication rates; and (c) there are significant differences in infection and mortality rates between Group I and Group II.


Subject(s)
Arthroplasty, Replacement, Knee/education , Learning Curve , Aged , Arthroplasty, Replacement, Knee/mortality , Arthroplasty, Replacement, Knee/statistics & numerical data , Female , Humans , Male , Postoperative Complications/epidemiology , Postoperative Complications/etiology , Postoperative Complications/microbiology , Taiwan/epidemiology , Treatment Outcome
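The abstract above compares infection and mortality rates between two groups with a Z-test but does not state the exact form of the test. A common choice for comparing two rates is the pooled two-proportion z-test, sketched below under that assumption; the function name is hypothetical, and the counts in the usage note are invented for illustration rather than drawn from the NHI data.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test (assumed form of the Z-test).

    Returns (z, two_sided_p_value) for the null hypothesis that both
    groups share the same underlying event rate.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Calling `two_proportion_z_test(30, 1000, 15, 1000)` with these invented counts would compare a 3.0% rate against a 1.5% rate; a small p-value would suggest the two groups differ.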