RESUMO
Load recognition remains not comprehensively explored in Home Energy Management Systems (HEMSs). There are gaps in current approaches to load recognition, such as enhancing appliance identification and increasing the overall performance of the load-recognition system through more robust models. To address this issue, we propose a novel approach based on the Analysis of Variance (ANOVA) F-test combined with SelectKBest and gradient-boosting machines (GBMs) for load recognition. The proposed approach improves the feature selection and consequently aids inter-class separability. Further, we optimized GBM models, such as the histogram-based gradient-boosting machine (HistGBM), light gradient-boosting machine (LightGBM), and XGBoost (extreme gradient boosting), to create a more reliable load-recognition system. Our findings reveal that the ANOVA-GBM approach achieves greater efficiency in training time, even when compared to Principal Component Analysis (PCA) and a higher number of features. ANOVA-XGBoost is approximately 4.31 times faster than PCA-XGBoost, ANOVA-LightGBM is about 5.15 times faster than PCA-LightGBM, and ANOVA-HistGBM is 2.27 times faster than PCA-HistGBM. The general performance results expose the impact on the overall performance of the load-recognition system. Some of the key results show that the ANOVA-LightGBM pair reached 96.42% accuracy, 96.27% F1, and a Kappa index of 0.9404; the ANOVA-HistGBM combination achieved 96.64% accuracy, 96.48% F1, and a Kappa index of 0.9434; and the ANOVA-XGBoost pair attained 96.75% accuracy, 96.64% F1, and a Kappa index of 0.9452; such findings overcome rival methods from the literature. In addition, the accuracy gain of the proposed approach is prominent when compared straight to its competitors. The higher accuracy gains were 13.09, 13.31, and 13.42 percentage points (pp) for the pairs ANOVA-LightGBM, ANOVA-HistGBM, and ANOVA-XGBoost, respectively. These significant improvements highlight the effectiveness and refinement of the proposed approach.
RESUMO
Machine learning methods were considered efficient in identifying single nucleotide polymorphisms (SNP) underlying a trait of interest. This study aimed to construct predictive models using machine learning algorithms, to identify loci that best explain the variance in milk traits of dairy cattle. Further objectives involved validating the results by comparison with reported relevant regions and retrieving the pathways overrepresented by the genes flanking relevant SNPs. Regression models using XGBoost (XGB), LightGBM (LGB), and Random Forest (RF) algorithms were trained using estimated breeding values for milk production (EBVM), milk fat content (EBVF) and milk protein content (EBVP) as phenotypes and genotypes on 40417 SNPs as predictor variables. To evaluate their efficiency, metrics for actual vs. predicted values were determined in validation folds (XGB and LGB) and out-of-bag data (RF). Less than 4500 relevant SNPs were retrieved for each trait. Among the genes flanking them, signaling and transmembrane transporter activities were overrepresented. The models trained:â¢Predicted breeding values for animals not included in the dataset.â¢Were efficient in identifying a subset of SNPs explaining phenotypic variation. The results obtained using XGB and LGB algorithms agreed with previous results. Therefore, the method proposed could be applied for future association studies on milk traits.