Results 1 - 10 of 10
1.
Heliyon ; 10(7): e29181, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38601658

ABSTRACT

This study facilitates university student profiling by constructing a prediction model to forecast the classification of future students participating in a survey, thereby enhancing the utility and effectiveness of the questionnaire approach. In the context of the ongoing digital transformation of campuses, higher education institutions are increasingly prioritizing student educational development. This shift aligns with the maturation of big data technology, prompting scholars to focus on profiling university student education. While earlier research in this area, particularly foreign studies, focused on extracting data from specific learning contexts and often relied on single data sources, our study addresses these limitations. We employ a comprehensive approach, incorporating questionnaire surveys to capture a diverse array of student data. Considering various university student attributes, we create a holistic profile of the student population. Furthermore, we use clustering techniques to develop a categorical prediction model. In our clustering analysis, we employ the K-means algorithm to group student survey data. The results reveal four distinct student profiles: Diligent Learners, Earnest Individuals, Discerning Achievers, and Moral Advocates. These profiles are subsequently used to label student groups. For the classification task, we leverage these labels to establish a prediction model based on the Back Propagation neural network, with the goal of assigning students to their respective groups. After model optimization, a classification accuracy of 90.22% is achieved. Our research offers a novel perspective and serves as a valuable methodological reference for university student profiling.
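
The clustering step described in this abstract can be illustrated with a minimal K-means sketch. It is not the authors' implementation: the two-dimensional scores, the farthest-first seeding, and the function names `farthest_first` and `kmeans` are assumptions made only for illustration, and the abstract does not say how the survey answers were encoded.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not from the abstract): start at
    points[0], then keep adding the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments are stable, so centroids are too
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-item survey scores forming four loose groups.
scores = [(1, 1), (1, 2), (2, 1), (9, 1), (9, 2), (10, 1),
          (1, 9), (2, 9), (1, 10), (9, 9), (10, 9), (9, 10)]
labels, centroids = kmeans(scores, k=4)
print(labels)
```

The resulting cluster labels are what a downstream classifier, such as the back-propagation network mentioned in the abstract, would then be trained to predict.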

2.
Diagnostics (Basel) ; 14(4)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38396404

ABSTRACT

Alzheimer's disease (AD) and vascular dementia (VaD) are the two most common forms of dementia. However, their neuropsychological and pathological features often overlap, making it difficult to distinguish between AD and VaD. In addition to clinical consultation and laboratory examinations, clinical dementia diagnosis in Taiwan also includes Tc-99m-ECD SPECT imaging examination. Through machine learning and deep learning technology, we explored the feasibility of using the above clinical practice data to distinguish AD and VaD. We used the physiological data (33 features) and Tc-99m-ECD SPECT images of 112 AD patients and 85 VaD patients in the Taiwanese Nuclear Medicine Brain Image Database to train the classification model. After filtering the features with SVM-RFE under 5-fold validation, the results show that the average accuracy of physiological data in distinguishing AD/VaD is 81.22% and the AUC is 0.836; the average accuracy of training images using the Inception V3 model is 85% and the AUC is 0.95. Finally, Grad-CAM heatmaps were used to visualize the areas of concern of the model and were compared with the SPM analysis method to further understand the differences. This research method can quickly use machine learning and deep learning models to automatically extract image features based on a small amount of general clinical data to objectively distinguish AD and VaD.

3.
PeerJ Comput Sci ; 9: e1280, 2023.
Article in English | MEDLINE | ID: mdl-37346612

ABSTRACT

Spinal diseases are killers that cause long-term disturbance to people with complex and diverse symptoms and may cause other conditions. At present, the diagnosis and treatment of the main diseases mainly depend on the professional level and clinical experience of doctors, which is a breakthrough problem in the field of medicine. This article proposes the SMOTE-RFE-XGBoost model, which takes the physical angle of human bone as the research index for feature selection and classification model construction to predict spinal diseases. The research process is as follows: two groups of people with normal and abnormal spine conditions are taken as the research objects of this article, and the synthetic minority oversampling technique (SMOTE) algorithm is used to address category imbalance. Three methods, least absolute shrinkage and selection operator (LASSO), tree-based feature selection, and recursive feature elimination (RFE), are used for feature selection. Logistic regression (LR), support vector machine (SVM), naive Bayes, decision tree (DT), random forest (RF), gradient boosting tree (GBT), extreme gradient boosting (XGBoost), and ridge regression models are used to classify the samples, construct single classification models and combined classification models, and rank the feature importance. According to the accuracy and mean square error (MSE) values, the SMOTE-RFE-XGBoost combined model has the best classification, with accuracy, MSE and F1 values of 97.56%, 0.1111 and 0.8696, respectively. The importance of four indicators, lumbar slippage, cervical tilt, pelvic radius and pelvic tilt, was higher.
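
The oversampling step named in this abstract (SMOTE) works by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows only that core idea, using the standard library. The function name `smote`, the neighbour count `k=2`, the seed, and the two-feature sample values are assumptions for illustration, not details taken from the article.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest minority neighbours of the chosen sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to partner
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Hypothetical minority-class samples with two angle-like features.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.0)]
print(smote(minority, n_new=3))
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the range spanned by the minority class.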

4.
Ecotoxicol Environ Saf ; 255: 114806, 2023 Apr 15.
Article in English | MEDLINE | ID: mdl-36948010

ABSTRACT

Cancer, the second largest human disease, has become a major public health problem. The prediction of chemicals' carcinogenicity before their synthesis is crucial. In this paper, seven machine learning algorithms (i.e., Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), Complement Naive Bayes (CNB), K-Nearest Neighbor (KNN), XGBoost, and Multilayer Perceptron (MLP)) were used to construct the carcinogenicity triple classification prediction (TCP) model (i.e., 1A, 1B, Category 2). A total of 1444 descriptors of 118 hazardous organic chemicals were calculated by Discovery Studio 2020, Sybyl X-2.0 and PaDEL-Descriptor software. The constructed carcinogenicity TCP model was evaluated through five model evaluation indicators (i.e., Accuracy, Precision, Recall, F1 Score and AUC). The model evaluation results show that Accuracy, Precision, Recall, F1 Score and AUC evaluation indicators meet requirements (greater than 0.6). The accuracy of RF, LR, XGBoost, and MLP models for predicting carcinogenicity of Category 2 is 91.67%, 79.17%, 100%, and 100%, respectively. In addition, the constructed machine learning model in this study has potential for error correction. Taking XGBoost model as an example, the predicted carcinogenicity level of 1,2,3-Trichloropropane (96-18-4) is Category 2, but the actual carcinogenicity level is 1B. But the difference between Category 2 and 1B is only 0.004, indicating that the XGBoost is one optimum model of the seven constructed machine learning models. Besides, results showed that functional groups like chlorine and benzene ring might influence the prediction of carcinogenic classification. Therefore, considering functional group characteristics of chemicals before constructing the carcinogenicity prediction model of organic chemicals is recommended. The predicted carcinogenicity of the organic chemicals using the optimum machine learning model (i.e., XGBoost) was also evaluated and verified by the toxicokinetics.
The RF and XGBoost TCP models constructed in this paper can be used for carcinogenicity detection before synthesizing new organic substances. It also provides technical support for the subsequent management of organic chemicals.
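
Four of the five evaluation indicators listed in this abstract (Accuracy, Precision, Recall, F1 Score) can be computed for a three-class problem as sketched below. The abstract does not state how per-class values were averaged, so the macro-averaging used here is an assumption, as are the function name `classification_metrics` and the toy labels. AUC is left out because it needs predicted scores and not just predicted classes.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, macro-averaged (precision, recall, F1) and
    the per-class (precision, recall, F1) values."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Macro average: unweighted mean over classes (an assumption here).
    macro = tuple(
        sum(values[i] for values in per_class.values()) / len(classes)
        for i in range(3)
    )
    return accuracy, macro, per_class


# Toy labels using the three categories named in the abstract.
y_true = ["1A", "1A", "1B", "1B", "2", "2"]
y_pred = ["1A", "1B", "1B", "1B", "2", "2"]
accuracy, macro, per_class = classification_metrics(y_true, y_pred)
print(accuracy, macro)
```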


Subject(s)
Carcinogens , Hazardous Substances , Machine Learning , Organic Chemicals , Bayes Theorem , Carcinogenesis , Carcinogens/toxicity , Carcinogens/chemistry , Hazardous Substances/chemistry , Hazardous Substances/toxicity , Organic Chemicals/toxicity , Organic Chemicals/chemistry , Support Vector Machine , World Health Organization , Algorithms , United States , European Union , China , Databases, Factual
5.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-1015620

ABSTRACT

DNA double-strand break (DSB) is a serious form of DNA damage in cells, which is closely related to a variety of genomic instability diseases, including cancer, abnormal recombination and neuronal development. Due to the limitations of cost and technical threshold, high-resolution DSB mapping by high-throughput sequencing technology is very limited. This hinders our understanding of the DSB situation in the genomes of different species. Therefore, we developed a classification prediction model based on random forest (RF), support vector machine (SVM) and logistic regression (LR) classifiers to predict DSB loci in the whole genome of human NHEK cells. In addition to the epigenetic features and DNA shape features commonly used in previous prediction studies, we found that DNA sequence features (k-mer frequency, GC content, GC-skew, mutual information) can also characterize DSB sites. At the same time, the prediction accuracy is improved after considering DNA physical properties, chemical shifts and autocorrelation information. After combining all the above features, logistic regression (LR) has the best prediction performance (AUC = 0.97), which is comparable to previous prediction (AUC = 0.964). In addition, the optimal feature collection consisting of 294 features was obtained by the incremental feature search method, and the corresponding AUC value reached 0.974.
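
Three of the sequence features named in this abstract have simple standard definitions, sketched below: GC content is (G + C) / length, GC-skew is (G - C) / (G + C), and k-mer frequency is the share of each overlapping length-k window. The function names and the toy sequences are illustrative assumptions. The abstract does not give the window sizes or the values of k that the authors used.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C); defined as 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer in the sequence."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    return {kmer: n / len(windows) for kmer, n in counts.items()}


print(gc_content("GGCA"))             # 0.75
print(kmer_frequencies("ACGAC", 2))   # {'AC': 0.5, 'CG': 0.25, 'GA': 0.25}
```

Feature vectors built this way for each candidate locus could then be passed to any of the classifiers the abstract mentions.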

6.
Article in English | MEDLINE | ID: mdl-36360793

ABSTRACT

Soundscape is the production of sounds and the acoustic environment, and it emphasizes people's perception and experience in context. To this end, this paper focuses on the Pearl River Delta in China, and implements an empirical study based on the soundscape evaluation data from the Participatory Soundscape Sensing (PSS) system, and the geospatial data from multiple sources. The optimal variable set with 24 features is successfully used to establish a random forest model to predict the soundscape comfort of a new site (F1 = 0.61). Results show that the acoustic factors are most important to successfully classify soundscape comfort (averaged relative importance of 17.45), followed by built environment elements (11.28), temporal factors (9.59), and demographic factors (9.14), while landscape index (8.60) and land cover type (7.71) seem to have unclear importance. Furthermore, the partial dependence analysis provides the answers about the appropriate threshold or category of various variables to quantitatively or qualitatively specify the necessary management and control metrics for maintaining soundscape quality. These findings suggest that mainstreaming the soundscape in the coupled natural-human systems and clarifying the mechanisms between soundscape perception and geospatial factors can be beneficial to create a high-quality soundscape in human habitats.


Subject(s)
Acoustics , Sound , Humans , China , Ecosystem
7.
J Chromatogr A ; 1637: 461733, 2021 Jan 25.
Article in English | MEDLINE | ID: mdl-33385745

ABSTRACT

A hydrophilic interaction (HILIC) ultra-high performance liquid chromatography (UHPLC) with triple quadrupole tandem mass spectrometry (MS/MS) method was developed and validated for the quantification of 21 free amino acids (AAs). Compared to published reports, our method renders collectively improved sensitivity with lower limit of quantification (LLOQ) at 0.5~42.19 ng/mL with 0.3 µL injection volume (or equivalently 0.15~12.6 pg injected on column), robust linear range from LLOQ up to 3521~5720 ng/mL (or 1056~1716 pg on column) and a high throughput with total time of 6 min per sample, as well as easier experimental setup, less maintenance and higher adaptation flexibility. Ammonium formate in the mobile phase, though commonly used in HILIC, was found unnecessary in our experimental setup, and its removal from mobile phase was key for significant improvement in sensitivity (4~74 times higher than with 5 mM ammonium formate). Addition of 10 mM (or up to 100 mM) hydrochloric acid (HCl) in the sample diluent was crucial to keep response linearity for the basic amino acids histidine, lysine and arginine. Different HCl concentrations (10~100 mM) in sample diluent also exerted an effect on detection sensitivity, and it is of importance to keep the final prepared sample and calibrators at the same HCl level. Leucine and isoleucine were distinguished using different transitions. Validated at seven concentration levels, accuracy was bound within 75~125%, matrix effect generally within 90~110%, and precision error mostly below 2.5%. Using this newly developed method, the free amino acids were then quantified in a total of 544 African indigenous vegetables (AIVs) samples from African nightshades (AN), Ethiopian mustards (EM), amaranths (AM) and spider plants (SP), comprising a total of 8 identified species and 43 accessions, cultivated and harvested in USA, Kenya and Tanzania over several years, 2013~2018.
The AN, EM, AM and SP were distinguished based on free AAs profile using machine learning methods (ML) including principal component analysis, discriminant analysis, naïve Bayes, elastic net-regularized logistic regression, random forest and support vector machine, with prediction accuracy achieved at ca. 83~97% on the test set (train/test ratio at 7/3). An interactive ML platform was constructed using R Shiny at https://boyuan.shinyapps.io/AIV_Classifier/ for modeling train-test simulation and category prediction of unknown AIV sample(s). This new method presents a robust and rapid approach to quantifying free amino acids in plants for use in evaluating plants, biofortification, botanical authentication, safety, adulteration and with applications to nutrition, health and food product development.
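
The on-column amounts quoted in this abstract follow from a unit conversion: 1 ng/mL equals 1 pg/µL, so a concentration in ng/mL multiplied by the injection volume in µL gives the injected mass in pg. The short sketch below reproduces that arithmetic. The function name `on_column_pg` is an illustrative assumption.

```python
def on_column_pg(conc_ng_per_ml, injection_ul):
    """Mass injected on column in pg.

    1 ng/mL = 1000 pg / 1000 uL = 1 pg/uL, so ng/mL times uL gives pg.
    """
    return conc_ng_per_ml * injection_ul


INJECTION_UL = 0.3  # injection volume stated in the abstract

# Concentrations (ng/mL) quoted in the abstract: LLOQ range and upper linear range.
for conc in (0.5, 42.19, 3521, 5720):
    print(f"{conc} ng/mL -> {on_column_pg(conc, INJECTION_UL):.3f} pg on column")
```

The computed values are 0.150, 12.657, 1056.300 and 1716.000 pg, which agree with the abstract's 0.15~12.6 pg and 1056~1716 pg once the figures are shortened.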


Subject(s)
Amino Acids/analysis , Chromatography, High Pressure Liquid/methods , Machine Learning , Tandem Mass Spectrometry/methods , Vegetables/chemistry , Bayes Theorem , Humans , Hydrophobic and Hydrophilic Interactions , Principal Component Analysis , Reproducibility of Results
8.
Oncol Lett ; 20(6): 387, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33193847

ABSTRACT

Esophageal squamous cell carcinoma (ESCC) is one of the deadliest cancer types with a poor prognosis due to the lack of symptoms in the early stages and a delayed diagnosis. The present study aimed to identify the risk factors significantly associated with prognosis and to search for novel effective diagnostic modalities for patients with early-stage ESCC. mRNA and methylation data of patients with ESCC and the corresponding clinical information were downloaded from The Cancer Genome Atlas (TCGA) database, and the representation features were screened using deep learning autoencoder. The univariate Cox regression model was used to select the prognosis-related features from the representation features. K-means clustering was used to cluster the TCGA samples. Support vector machine classifier was constructed based on the top 75 features mostly associated with the risk subgroups obtained from K-means clustering. Two ArrayExpress datasets were used to verify the reliability of the obtained risk subgroups. The differentially expressed genes and methylation genes (DEGs and DMGs) between the risk subgroups were analyzed, and pathway enrichment analysis was performed. A total of 500 representation features were produced. Using K-means clustering, the TCGA samples were clustered into two risk subgroups with significantly different overall survival rates. Joint multimodal representation strategy, which showed a good model fitness (C-index=0.760), outperformed early-fusion autoencoder strategy. The joint representation learning-based classification model had good robustness. A total of 1,107 DEGs and 199 DMGs were screened out between the two risk subgroups. 
The DEGs were involved in 70 pathways, the majority of which were correlated with metastasis and proliferation of various cancer types, including cytokine-cytokine receptor interaction, cell adhesion molecules, PPAR signaling pathway, pathways in cancer, transcriptional misregulation in cancer and ECM-receptor interaction pathways. The two survival subgroups obtained via the joint representation learning-based model had good robustness, and had prognostic significance for patients with ESCC.

9.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-779513

ABSTRACT

Objective To evaluate the efficiency of the logistic regression algorithm and the random forest algorithm in predicting blood glucose control in patients with type 2 diabetes mellitus (T2DM) after 3 months, and to explore the influencing factors of blood glucose control. Methods The data were extracted from the baseline survey and follow-up information of patients with T2DM in Shunyi and Tongzhou Districts. Whether the patient's 3-month glycosylated hemoglobin was more than 6.5% was chosen as the categorical outcome variable. The random forest algorithm and the logistic regression algorithm were used to establish the prediction models. The predictive efficiency was evaluated with the area under the receiver operating characteristic curve (AUC) and the accuracy rate. Results Factors affecting the patient's glycemic control included baseline fasting plasma glucose (P<0.001), duration of disease (P<0.001), smoking (P=0.026), static activity time (P=0.006), body mass index (overweight P=0.002, obesity P=0.011), bracelet use (P=0.028), and diabetes diet (P=0.002). The logistic regression prediction model had an AUC of 0.738, a sensitivity of 72.9%, a specificity of 68.1%, and an accuracy of 71.2%. The random forest model had an AUC of 0.756, a sensitivity of 74.5%, a specificity of 69.5%, and an accuracy of 72.8%. Conclusions The efficiency of the random forest model is better than that of the logistic regression model, and it can be applied to the prediction of blood glucose control and assist the management of diabetic patients.
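
The AUC used in this abstract to compare the two models can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auc`, the toy scores, and the coding of the outcome as 1 (glycosylated hemoglobin above 6.5%) and 0 are assumptions for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives
    and negatives (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities and outcomes (1 = HbA1c above 6.5%).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for small cohorts. A rank-based formula gives the same value more efficiently on large data sets.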

10.
Oncotarget ; 8(35): 58809-58822, 2017 Aug 29.
Article in English | MEDLINE | ID: mdl-28938599

ABSTRACT

Breast cancer is highly heterogeneous and is classified into four subtypes characterized by specific biological traits, treatment responses, and clinical prognoses. We performed a systemic analysis of 698 breast cancer patient samples from The Cancer Genome Atlas project database. We identified 136 breast cancer genes differentially expressed among the four subtypes. Based on unsupervised clustering analysis, these 136 core genes efficiently categorized breast cancer patients into the appropriate subtypes. Functional enrichment based on Kyoto Encyclopedia of Genes and Genomes analysis identified six functional pathways regulated by these genes: JAK-STAT signaling, basal cell carcinoma, inflammatory mediator regulation of TRP channels, non-small cell lung cancer, glutamatergic synapse, and amyotrophic lateral sclerosis. Three support vector machine (SVM) classification models based on the identified pathways effectively classified different breast cancer subtypes, suggesting that breast cancer subtype-specific risk assessment based on disease pathways could be a potentially valuable approach. Our analysis not only provides insight into breast cancer subtype-specific mechanisms, but also may improve the accuracy of SVM classification models.
