ABSTRACT
With the escalating impacts of drought events driven by climate change, reducing the uncertainty of drought projections becomes critical for enhancing risk management and adaptation strategies. This study aimed to develop an index for assessing the performance of CMIP6 Global Climate Models in simulating meteorological drought scenarios across regional hydrological systems, intended to provide more reliable information for management purposes. Named the 'Drought Representation Index for CMIP Climate Model Performance' (DRIP), this index evaluates how well CMIP models represent drought severity, duration, and return period. DRIP was used to select CMIP models and create an ensemble of the best-performing models (E-DRIP) to improve the reliability of drought projections. E-DRIP was then compared with a general ensemble of available CMIP6 models (E-CMIP). We applied this method in Southeast Brazil, a region known for its climate uncertainties and low predictability; specifically, it was implemented within the Paraíba do Sul River Basin, a nationally strategic watershed in a highly populated and industrialized area, which has recently faced unprecedented drought-related water crises. Results showed that DRIP effectively assessed the individual performance of CMIP models, which exhibited considerable variability, and identified the top-performing models for a multi-model ensemble. Additionally, the E-DRIP ensemble significantly reduced uncertainties in drought projections, achieving an average reduction of 63% in the study area compared to E-CMIP. Furthermore, the proposed method enables evaluations across any standardized drought index scale, reference period, or threshold, and can be readily adapted to other hydrological systems.
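The idea of building an ensemble from the best-performing models can be sketched as follows. This is only an illustration: the model names, the scores, the "higher score is better" convention, and the use of a plain mean are all assumptions, since the abstract does not describe how DRIP is computed or how E-DRIP members are combined.

```python
def select_top_models(scores, k):
    """Return the names of the k models with the highest score.

    Assumption: a higher score means better performance.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ensemble_mean(projections, members):
    """Average the projection series of the selected members, step by step."""
    series = [projections[name] for name in members]
    return [sum(values) / len(values) for values in zip(*series)]


# Hypothetical scores and projections, for illustration only.
scores = {"model_a": 0.9, "model_b": 0.4, "model_c": 0.7}
projections = {
    "model_a": [1.0, 2.0],
    "model_b": [9.0, 9.0],
    "model_c": [3.0, 4.0],
}

best = select_top_models(scores, k=2)
print(best, ensemble_mean(projections, best))
```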
ABSTRACT
Pseudomonas aeruginosa (P. aeruginosa) poses a significant threat as a nosocomial pathogen due to its robust resistance mechanisms and virulence factors. This study integrates subtractive proteomics and ensemble docking to identify and characterize essential proteins in P. aeruginosa, aiming to discover therapeutic targets and repurpose existing commercial drugs. Using subtractive proteomics, we refined the dataset to discard redundant proteins and minimize potential cross-interactions with human and microbiome proteins. We identified 12 key proteins, including a histidine kinase and members of the RND efflux pump family, known for their roles in antibiotic resistance, virulence, and antigenicity. Predictive modeling of the three-dimensional structures of these RND proteins and subsequent molecular ensemble-docking simulations led to the identification of MK-3207, R-428, and Suramin as promising inhibitor candidates. These compounds demonstrated high binding affinities and effective inhibition across multiple metrics. Further refinement using non-covalent interaction index methods provided deeper insights into the electronic effects in protein-ligand interactions, with Suramin exhibiting superior binding energies, suggesting its broad-spectrum inhibitory potential. Our findings confirm the critical role of RND efflux pumps in antibiotic resistance and suggest that MK-3207, R-428, and Suramin could be effectively repurposed to target these proteins. This approach highlights the potential of drug repurposing as a viable strategy to combat P. aeruginosa infections.
Subjects
Anti-Bacterial Agents , Bacterial Proteins , Drug Repositioning , Molecular Docking Simulation , Proteome , Proteomics , Pseudomonas aeruginosa , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/metabolism , Bacterial Proteins/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/antagonists & inhibitors , Proteomics/methods , Proteome/metabolism , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Suramin/pharmacology , Suramin/chemistry , Humans
ABSTRACT
Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To comprehensively assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), multivariate adaptive regression splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, various options were explored, including Ridge Regression, RF, GBLUP, and Single Average. The SEL method was able to predict important traits in Coffea arabica, with higher predictive ability (PA) than all base learner methods. The gains in PA relative to GBLUP (the ratio between the PA obtained from the best stacking model and that of GBLUP) were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively. Overall, SEL presents a promising approach for GS. By combining predictions from multiple models, SEL can potentially enhance the PA of GS for complex traits.
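As a rough illustration of stacking with the simplest kind of meta-learner, the sketch below averages base-learner predictions and scores the result. Two points are assumptions rather than details from the abstract: predictive ability is taken to be the Pearson correlation between predicted and observed values, and the gain is expressed as a percentage increase over the reference. All numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation, used here as an assumed measure of predictive ability."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_meta_learner(base_predictions):
    """Combine base-learner predictions with an unweighted average per individual."""
    return [sum(column) / len(column) for column in zip(*base_predictions)]


def relative_gain_percent(pa_new, pa_reference):
    """Percentage gain of one predictive ability over a reference (assumed convention)."""
    return (pa_new / pa_reference - 1.0) * 100.0


# Hypothetical observed phenotypes and predictions from two base learners.
observed = [2.0, 3.5, 1.0, 4.0]
base_predictions = [
    [2.2, 3.0, 1.5, 3.6],  # e.g. a GBLUP-like learner
    [1.8, 3.8, 0.9, 4.4],  # e.g. a random-forest-like learner
]

stacked = average_meta_learner(base_predictions)
pa_stacked = pearson(stacked, observed)
pa_reference = pearson(base_predictions[0], observed)
print(pa_stacked, relative_gain_percent(pa_stacked, pa_reference))
```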
ABSTRACT
Through enviromics, precision breeding leverages innovative geotechnologies to customize crop varieties to specific environments, potentially improving both crop yield and genetic selection gains. In Brazil's four southernmost states, data from 183 geographically distinct field trials (spanning 2017-2021) covered information on 164 genotypes: 79 maize hybrid genotypes phenotyped for grain yield and their 85 nonphenotyped parents. Additionally, 1342 envirotypic covariates from weather, soil, sensor-based, and satellite sources were collected to engineer 10,000 synthetic enviromic markers via machine learning. Variations in soil, light radiation, and surface temperature remarkably affect differential genotype yield, hinting at ecophysiological adjustments including evapotranspiration and photosynthesis. The enviromic ensemble-based random regression model showcases superior predictive performance and efficiency compared to the baseline and kernel models, matching the best genotypes to specific geographic coordinates. Clustering analysis has identified regions that minimize genotype-environment (G × E) interactions. These findings underscore the potential of enviromics in crafting specific parental combinations to breed new, higher-yielding hybrid crops. The adequate use of envirotypic information can enhance the precision and efficiency of maize breeding by providing important inputs about the environmental factors that affect the average crop performance. Generating enviromic markers associated with grain yield can enable a better selection of hybrids for specific environments.
ABSTRACT
Obstructive sleep apnea/hypopnea syndrome (OSAHS) is a condition linked to severe cardiovascular and neuropsychological consequences, characterized by recurrent episodes of partial or complete upper airway obstruction during sleep, leading to compromised ventilation, hypoxemia, and micro-arousals. Polysomnography (PSG) serves as the gold standard for confirming OSAHS, yet its extended duration, high cost, and limited availability pose significant challenges. In this paper, we employ a range of machine learning techniques, including Neural Networks, Decision Trees, Random Forests, and Extra Trees, for OSAHS diagnosis. This approach aims to achieve a diagnostic process that is not only more accessible but also more efficient. The dataset utilized in this study consists of records from 601 adults assessed between 2014 and 2016 at a specialized sleep medical center in Colombia. This research underscores the efficacy of ensemble methods, specifically Random Forests and Extra Trees, achieving an area under the Receiver Operating Characteristic (ROC) curve of 89.2% and 89.6%, respectively. Additionally, a web application has been devised, integrating the optimal model, empowering qualified medical practitioners to make informed decisions through patient registration, an input of 18 variables, and the utilization of the Random Forests model for OSAHS screening.
ABSTRACT
One of the significant challenges in scaling agile software development is organizing software development teams to ensure effective communication among members while equipping them with the capabilities to deliver business value independently. A formal approach to address this challenge involves modeling it as an optimization problem: given a professional staff, how can they be organized to optimize the number of communication channels, considering both intra-team and inter-team channels? In this article, we propose applying a set of bio-inspired algorithms to solve this problem. We introduce an enhancement that incorporates ensemble learning into the resolution process to achieve nearly optimal results. Ensemble learning integrates multiple machine-learning strategies with diverse characteristics to boost optimizer performance. Furthermore, the studied metaheuristics offer an excellent opportunity to explore their linear convergence, contingent on the exploration and exploitation phases. The results produce more precise definitions for team sizes, aligning with industry standards. Our approach demonstrates superior performance compared to the traditional versions of these algorithms.
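The quantity being optimized can be illustrated with a small sketch. A team of n members has n(n-1)/2 pairwise communication channels; how inter-team channels are counted is not stated in the abstract, so the sketch assumes one channel per pair of teams. The function names and the example staff size are hypothetical.

```python
def pairwise_channels(n):
    """Number of distinct pairs among n participants: n(n-1)/2."""
    return n * (n - 1) // 2


def total_channels(team_sizes):
    """Intra-team channels plus inter-team channels.

    Assumption: each pair of teams shares exactly one inter-team channel.
    """
    intra = sum(pairwise_channels(size) for size in team_sizes)
    inter = pairwise_channels(len(team_sizes))
    return intra + inter


# Twelve people as one team versus three teams of four.
print(total_channels([12]))       # 66
print(total_channels([4, 4, 4]))  # 21
```

Under these assumptions, splitting the same staff into smaller teams reduces the channel count sharply, which is why team size is a natural decision variable for an optimizer.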
ABSTRACT
The automatic identification of human physical activities, commonly referred to as Human Activity Recognition (HAR), has garnered significant interest and application across various sectors, including entertainment, sports, and notably health. Within the realm of health, a myriad of applications exists, contingent upon the nature of experimentation, the activities under scrutiny, and the methodology employed for data and information acquisition. This diversity opens doors to multifaceted applications, including support for the well-being and safeguarding of elderly individuals afflicted with neurodegenerative diseases, especially in the context of smart homes. Within the existing literature, a multitude of datasets from both indoor and outdoor environments have surfaced, significantly contributing to the activity identification processes. One prominent dataset, the CASAS project developed by Washington State University (WSU), encompasses experiments conducted in indoor settings. This dataset facilitates the identification of a range of activities, such as cleaning, cooking, eating, washing hands, and even making phone calls. This article introduces a model founded on the principles of Semi-supervised Ensemble Learning, enabling the harnessing of the potential inherent in distance-based clustering analysis. This technique aids in the identification of distinct clusters, each encapsulating unique activity characteristics. These clusters serve as pivotal inputs for the subsequent classification process, which leverages supervised techniques. The outcomes of this approach exhibit great promise, as evidenced by the analysis of quality metrics, showcasing favorable results compared to the existing state-of-the-art methods. This integrated framework not only contributes to the field of HAR but also holds immense potential for enhancing the capabilities of smart homes and related applications.
ABSTRACT
Molecular features play an important role in different bio-chem-informatics tasks, such as Quantitative Structure-Activity Relationships (QSAR) modeling. Several pre-trained models have been recently created to be used in downstream tasks, either by fine-tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM-2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different-dimensional embeddings derived from the ESM-2 models to classify antimicrobial peptides (AMPs). To this end, we built a KNIME workflow to use the same modeling methodology across experiments in order to guarantee fair analyses. As a result, the 640- and 1280-dimensional embeddings derived from the 30- and 33-layer ESM-2 models, respectively, are the most valuable since statistically better performances were achieved by the QSAR models built from them. We also fused features of the different ESM-2 models, and it was concluded that the fusion contributes to getting better QSAR models than using features of a single ESM-2 model. Frequency studies revealed that only a portion of the ESM-2 embeddings is valuable for modeling tasks since between 43% and 66% of the features were never used. Comparisons with state-of-the-art deep learning (DL) models confirm that when performing methodologically principled studies in the prediction of AMPs, non-DL based QSAR models yield comparable-to-superior performances to DL-based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can be valuable to avoid unfair comparisons regarding new computational methods, as well as to propose new non-DL based QSAR models.
Subjects
Antimicrobial Peptides , Workflow
ABSTRACT
Molecular dynamics simulations have proved extremely useful in investigating the functioning of proteins with atomic-scale resolution. Many applications to the study of RNA also exist, and their number increases by the day. However, implementing MD simulations for RNA molecules in solution faces challenges that the MD practitioner must be aware of for the appropriate use of this tool. In this chapter, we present the fundamentals of MD simulations, in general, and the peculiarities of RNA simulations, in particular. We discuss the strengths and limitations of the technique and provide examples of its application to elucidate small RNA's performance.
Subjects
Molecular Dynamics Simulation , Proteins , Messenger RNA , Proteins/metabolism , RNA/genetics , Protein Conformation
ABSTRACT
Heatwaves are a global issue that threaten microbial populations and deteriorate ecosystems. However, how river microbial communities respond to heatwaves and whether and how high temperatures exceed microbial adaptation remain unclear. In this study, we proposed four types of pulse temperature-induced microbial responses and predicted the possibility of microbial adaptation to high temperature in global rivers using ensemble machine learning models. Our findings suggest that microbial communities in parts of South American (e.g., Brazil and Chile) and Southeast Asian (e.g., Vietnam) countries are likely to change due to heatwave disturbance from 25 to 37°C for consecutive days. Furthermore, the microbial communities in approximately 48.4% of the global river gauge stations are prone to fast stress inadaptation, with approximately 76.9% of these stations expected to exceed microbial adaptation after heatwave disturbances. If emissions of particulate matter with sizes not more than 2.5 µm (PM2.5, an indicator of human activities) increase by twofold, the number of global rivers associated with the fast stress adaptation type will decrease by ~13.7% after heatwave disturbances. Understanding microbial responses is crucially important for effective ecosystem management, especially for fragile and sensitive rivers facing heatwave events.
Subjects
Ecosystem , Rivers , Humans , Temperature , Brazil , Chile
ABSTRACT
Surgical Instrument Signaling (SIS) comprises specific hand gestures used for communication between the surgeon and the surgical instrumentator. With SIS, the surgeon executes signals representing specific instruments in order to avoid errors and communication failures. This work presented the feasibility of an SIS gesture recognition system using surface electromyographic (sEMG) signals acquired from the Myo armband, aiming to build a processing routine that aids telesurgery or robotic surgery applications. Unlike other works that use up to 10 gestures to represent and classify SIS gestures, a database with 14 selected gestures for SIS was recorded from 10 volunteers, with 30 repetitions per user. Segmentation, feature extraction, feature selection, and classification were performed, and several parameters were evaluated. These steps were performed by taking into account a wearable application, for which the complexity of pattern recognition algorithms is crucial. The system was tested offline and verified as to its contribution for all databases and each volunteer individually. An automatic segmentation algorithm was applied to identify the muscle activation; thus, 13 feature sets and 6 classifiers were tested. Moreover, 2 ensemble techniques aided in separating the sEMG signals into the 14 SIS gestures. Accuracy of 76% was obtained for the Support Vector Machine classifier for all databases and 88% for analyzing the volunteers individually. The system was demonstrated to be suitable for SIS gesture recognition using sEMG signals for wearable applications.
Subjects
Gestures , Automated Pattern Recognition , Humans , Electromyography/methods , Automated Pattern Recognition/methods , Computer-Assisted Signal Processing , Algorithms , Surgical Instruments , Hand
ABSTRACT
Pesticides have a significant negative impact on the environment, non-target organisms, and human health. To address these issues, sustainable pest management practices and government regulations are necessary. However, biotechnology can provide additional solutions, such as the use of polyelectrolyte complexes to encapsulate and remove pesticides from water sources. We introduce a computational methodology to evaluate the capture capabilities of Calcium-Alginate-Chitosan (CAC) nanoparticles for a broad range of pesticides. By employing ensemble-docking and molecular dynamics simulations, we investigate the intermolecular interactions and absorption/adsorption characteristics between the CAC nanoparticles and selected pesticides. Our findings reveal that charged pesticide molecules exhibit capture rates more than double those of their neutral counterparts, owing to their stronger affinity for the CAC nanoparticles. Non-covalent interactions, such as van der Waals forces, π-π stacking, and hydrogen bonds, are identified as key factors that stabilize the capture and physisorption of pesticides. Density profile analysis confirms the localization of pesticides adsorbed onto the surface or absorbed into the polymer matrix, depending on their chemical nature. The mobility and diffusion behavior of captured compounds within the nanoparticle matrix is assessed using mean square displacement and diffusion coefficients. Compounds with high capture levels exhibit limited mobility, indicative of effective absorption and adsorption. Intermolecular interaction analysis highlights the significance of hydrogen bonds and electrostatic interactions in the pesticide-polymer association. Notably, two promising candidates, an antibiotic derived from tetracycline and a rodenticide, demonstrate a strong affinity for CAC nanoparticles.
This computational methodology offers a reliable and efficient screening approach for identifying effective pesticide capture agents, contributing to the development of eco-friendly strategies for pesticide removal.
ABSTRACT
Non-standard thermostatistical formalisms derived from generalizations of the Boltzmann-Gibbs entropy have attracted considerable attention recently. Among the various proposals, the one that has been most intensively studied, and most successfully applied to concrete problems in physics and other areas, is the one associated with the Sq non-additive entropies. The Sq-based thermostatistics exhibits a number of peculiar features that distinguish it from other generalizations of the Boltzmann-Gibbs theory. In particular, there is a close connection between the Sq-canonical distributions and the micro-canonical ensemble. The connection, first pointed out in 1994, has been subsequently explored by several researchers, who elaborated this facet of the Sq-thermostatistics in a number of interesting directions. In the present work, we provide a brief review of some highlights within this line of inquiry, focusing on micro-canonical scenarios leading to Sq-canonical distributions. We consider works on the micro-canonical ensemble, including historical ones, where the Sq-canonical distributions, although present, were not identified as such, and also more recent works by researchers who explicitly investigated the Sq-micro-canonical connection.
ABSTRACT
In recent years, there has been growing interest in developing air pollution prediction models to reduce exposure measurement error in epidemiologic studies. However, efforts for localized, fine-scale prediction models have been predominantly focused in the United States and Europe. Furthermore, the availability of new satellite instruments such as the TROPOspheric Monitoring Instrument (TROPOMI) provides novel opportunities for modeling efforts. We estimated daily ground-level nitrogen dioxide (NO2) concentrations in the Mexico City Metropolitan Area at 1-km² grids from 2005 to 2019 using a four-stage approach. In stage 1 (imputation stage), we imputed missing satellite NO2 column measurements from the Ozone Monitoring Instrument (OMI) and TROPOMI using the random forest (RF) approach. In stage 2 (calibration stage), we calibrated the association of column NO2 to ground-level NO2 using ground monitors and meteorological features using RF and extreme gradient boosting (XGBoost) models. In stage 3 (prediction stage), we applied the stage 2 models to each 1-km² grid in our study area, then ensembled the results using a generalized additive model (GAM). In stage 4 (residual stage), we used XGBoost to model the local component at the 200-m² scale. The cross-validated R² values of the RF and XGBoost models in stage 2 were 0.75 and 0.86, respectively, and 0.87 for the ensembled GAM. The cross-validated root-mean-squared error (RMSE) of the GAM was 3.95 µg/m³. Using novel approaches and newly available remote sensing data, our multi-stage model achieved high cross-validated fits and reconstructed fine-scale NO2 estimates for further epidemiologic studies in Mexico City.
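For readers unfamiliar with the two fit statistics reported, the sketch below computes RMSE and R² on held-out predictions. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not say which definition was used, and the concentration values are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out concentrations (same units for both lists).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```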
ABSTRACT
The National Forestry Commission of Mexico continuously monitors forest structure within the country's continental territory by the implementation of the National Forest and Soils Inventory (INFyS). Due to the challenges involved in collecting data exclusively from field surveys, there are spatial information gaps for important forest attributes. This can produce bias or increase uncertainty when generating estimates required to support forest management decisions. Our objective is to predict the spatial distribution of tree height and tree density in all Mexican forests. We performed wall-to-wall spatial predictions of both attributes in 1-km grids, using ensemble machine learning across each forest type in Mexico. Predictor variables include remote sensing imagery and other geospatial data (e.g., mean precipitation, surface temperature, canopy cover). Training data is from the 2009 to 2014 cycle (n > 26,000 sampling plots). Spatial cross validation suggested that the model had a better performance when predicting tree height, r² = .35 [.12, .51] (mean [min, max]), than for tree density, r² = .23 [.05, .42]. The best predictive performance when mapping tree height was for broadleaf and coniferous-broadleaf forests (model explained ~50% of variance). The best predictive performance when mapping tree density was for tropical forest (model explained ~40% of variance). Although most forests had relatively low uncertainty for tree height predictions, e.g., values <60%, arid and semiarid ecosystems had high uncertainty, e.g., values >80%. Uncertainty values for tree density predictions were >80% in most forests. The applied open science approach we present is easily replicable and scalable, thus it is helpful to assist in the decision-making and future of the National Forest and Soils Inventory. This work highlights the need for analytical tools that help us exploit the full potential of the Mexican forest inventory datasets.
ABSTRACT
Context: Inferring gene regulatory networks (GRN) from high-throughput gene expression data is a challenging task for which different strategies have been developed. Nevertheless, no ever-winning method exists, and each method has its advantages, intrinsic biases, and application domains. Thus, in order to analyze a dataset, users should be able to test different techniques and choose the most appropriate one. This step can be particularly difficult and time consuming, since most methods' implementations are made available independently, possibly in different programming languages. The implementation of an open-source library containing different inference methods within a common framework is expected to be a valuable toolkit for the systems biology community. Results: In this work, we introduce GReNaDIne (Gene Regulatory Network Data-driven Inference), a Python package that implements 18 machine learning data-driven gene regulatory network inference methods. It also includes eight generalist preprocessing techniques, suitable for both RNA-seq and microarray dataset analysis, as well as four normalization techniques dedicated to RNA-seq. In addition, this package implements the possibility to combine the results of different inference tools to form robust and efficient ensembles. This package has been successfully assessed under the DREAM5 challenge benchmark dataset. The open-source GReNaDIne Python package is made freely available in a dedicated GitLab repository, as well as in the official third-party software repository PyPI Python Package Index. The latest documentation on the GReNaDIne library is also available at Read the Docs, an open-source software documentation hosting platform. Contribution: The GReNaDIne tool represents a technological contribution to the field of systems biology. This package can be used to infer gene regulatory networks from high-throughput gene expression data using different algorithms within the same framework. 
In order to analyze their datasets, users can apply a battery of preprocessing and postprocessing tools and choose the most adapted inference method from the GReNaDIne library and even combine the output of different methods to obtain more robust results. The results format provided by GReNaDIne is compatible with well-known complementary refinement tools such as PYSCENIC.
Subjects
Computational Biology , Gene Regulatory Networks , Computational Biology/methods , Saint Vincent and the Grenadines , Software , Gene Expression
ABSTRACT
In this paper, the latest global COVID-19 pandemic prediction is addressed. Each country worldwide has faced this pandemic differently, reflected in its statistical number of confirmed and death cases. Predicting the number of confirmed and death cases could allow us to know the future number of cases and provide each country with the necessary information to make decisions based on the predictions. Recent works are focused only on confirmed COVID-19 cases or a specific country. In this work, we propose the firefly algorithm for ensemble neural network optimization applied to COVID-19 time series prediction with type-2 fuzzy logic in a weighted average integration method; the firefly algorithm designs an ensemble neural network architecture for each one of 26 countries. The proposed method finds the number of artificial neural networks needed to form an ensemble neural network and their architecture using a type-2 fuzzy inference system to combine the responses of individual artificial neural networks to perform a final prediction. The advantages of the type-2 fuzzy weighted average integration (FWA) method over the conventional average method and type-1 fuzzy weighted average integration are shown.
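The difference between a conventional average and a weighted average integration of ensemble outputs can be sketched as below. In the method described, the weights come from a type-2 fuzzy inference system; that part is not reproduced here, so the weights in this sketch are hypothetical fixed numbers standing in for its output.

```python
def simple_average(predictions):
    """Conventional integration: every network contributes equally."""
    return sum(predictions) / len(predictions)


def weighted_average(predictions, weights):
    """Weighted integration: each network's prediction is scaled by its weight.

    Assumption: weights are non-negative and supplied externally (in the
    described method they would be produced by a fuzzy inference system).
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(predictions, weights)) / total_weight


# Hypothetical next-day case predictions from two networks.
predictions = [100.0, 200.0]
print(simple_average(predictions))              # 150.0
print(weighted_average(predictions, [1, 3]))    # 175.0
```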
ABSTRACT
In this work, a novel multimodal learning approach for early prediction of birth weight is presented. Fetal weight is one of the most relevant indicators in the assessment of fetal health status. The aim is to predict birth weight early using multimodal maternal-fetal variables from the first trimester of gestation (anthropometric data, as well as metrics obtained from fetal biometry, Doppler, and maternal ultrasound). The proposed methodology starts with the optimal selection of a subset of multimodal features using an ensemble-based approach of feature selectors. Subsequently, the selected variables feed the nonparametric Multiple Kernel Learning regression algorithm. At this stage, a set of kernels is selected and weighted to maximize performance in birth weight prediction. The proposed methodology is validated and compared with other computational learning algorithms reported in the state of the art. The obtained results (absolute error of 234 g) suggest that the proposed methodology can be useful as a tool for the early evaluation and monitoring of fetal health status through indicators such as birth weight.
Subjects
Fetus , Prenatal Care , Humans , Female , Pregnancy , Birth Weight , Algorithms , Anthropometry
ABSTRACT
Following the recent advances in wireless communication leading to increased Internet of Things (IoT) systems, many security threats are currently ravaging IoT systems, causing harm to information. Considering the vast application areas of IoT systems, ensuring that cyberattacks are holistically detected to avoid harm is paramount. Machine learning (ML) algorithms have demonstrated high capacity in helping to mitigate attacks on IoT devices and other edge systems with reasonable accuracy. However, the dynamics of operation of intruders in IoT networks require improved intrusion detection system (IDS) models capable of detecting multiple attacks with a higher detection rate and lower computational resource requirement, which is one of the challenges of IoT systems. Many ensemble methods have been used with different ML classifiers, including decision trees and random forests, to propose IDS models for IoT environments. The boosting method is one of the approaches used to design an ensemble classifier. This paper proposes an efficient method for detecting cyberattacks and network intrusions based on boosted ML classifiers. Our proposed model is named BoostedEnML. First, we train six different ML classifiers (DT, RF, ET, LGBM, AD, and XGB) and obtain an ensemble using the stacking method and another with a majority voting approach. Two different datasets containing high-profile attacks, including distributed denial of service (DDoS), denial of service (DoS), botnets, infiltration, web attacks, Heartbleed, and portscan, were used to train, evaluate, and test the IDS model. To ensure that we obtained a holistic and efficient model, we performed data balancing with synthetic minority oversampling technique (SMOTE) and adaptive synthetic (ADASYN) techniques; after that, we used stratified K-fold to split the data into training, validation, and testing sets.
Based on the best two models, we construct our proposed BoostedEnML model using LightGBM and XGBoost, as the combination of the two classifiers gives a lightweight yet efficient model, which is part of the target of this research. Experimental results show that BoostedEnML outperformed existing ensemble models in terms of accuracy, precision, recall, F-score, and area under the curve (AUC), reaching 100% in each case on the selected datasets for multiclass classification.
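The majority voting approach mentioned as one of the two ensembling strategies can be sketched in a few lines. This is a generic hard-voting illustration, not the authors' implementation: the classifier outputs are hypothetical, and ties are resolved in favor of the label seen first, which is a choice made here for simplicity.

```python
from collections import Counter


def majority_vote(per_classifier_labels):
    """Hard voting: for each sample, return the label predicted by most classifiers.

    per_classifier_labels holds one list of predicted labels per classifier,
    all of the same length. On a tie, the label encountered first wins.
    """
    voted = []
    for sample_labels in zip(*per_classifier_labels):
        label, _count = Counter(sample_labels).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from three classifiers on two traffic samples.
predictions = [
    ["DoS", "benign"],
    ["DoS", "DDoS"],
    ["benign", "DDoS"],
]
print(majority_vote(predictions))  # ['DoS', 'DDoS']
```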