Pesquisa | Portal Regional da BVS

BoostedEnML: Efficient Technique for Detecting Cyberattacks in IoT Systems Using Boosted Ensemble Machine Learning.

Okey, Ogobuchi Daniel; Maidin, Siti Sarah; Adasme, Pablo; Lopes Rosa, Renata; Saadi, Muhammad; Carrillo Melgarejo, Dick; Zegarra Rodríguez, Demóstenes.

Sensors (Basel) ; 22(19)2022 Sep 29.

Artigo em Inglês | MEDLINE | ID: mdl-36236506

RESUMO

Following the recent advances in wireless communication leading to increased Internet of Things (IoT) systems, many security threats are currently ravaging IoT systems, causing harm to information. Considering the vast application areas of IoT systems, ensuring that cyberattacks are holistically detected to avoid harm is paramount. Machine learning (ML) algorithms have demonstrated high capacity in helping to mitigate attacks on IoT devices and other edge systems with reasonable accuracy. However, the dynamics of operation of intruders in IoT networks require more improved IDS models capable of detecting multiple attacks with a higher detection rate and lower computational resource requirement, which is one of the challenges of IoT systems. Many ensemble methods have been used with different ML classifiers, including decision trees and random forests, to propose IDS models for IoT environments. The boosting method is one of the approaches used to design an ensemble classifier. This paper proposes an efficient method for detecting cyberattacks and network intrusions based on boosted ML classifiers. Our proposed model is named BoostedEnML. First, we train six different ML classifiers (DT, RF, ET, LGBM, AD, and XGB) and obtain an ensemble using the stacking method and another with a majority voting approach. Two different datasets containing high-profile attacks, including distributed denial of service (DDoS), denial of service (DoS), botnets, infiltration, web attacks, heartbleed, portscan, and botnets, were used to train, evaluate, and test the IDS model. To ensure that we obtained a holistic and efficient model, we performed data balancing with synthetic minority oversampling technique (SMOTE) and adaptive synthetic (ADASYN) techniques; after that, we used stratified K-fold to split the data into training, validation, and testing sets. Based on the best two models, we construct our proposed BoostedEnsML model using LightGBM and XGBoost, as the combination of the two classifiers gives a lightweight yet efficient model, which is part of the target of this research. Experimental results show that BoostedEnsML outperformed existing ensemble models in terms of accuracy, precision, recall, F-score, and area under the curve (AUC), reaching 100% in each case on the selected datasets for multiclass classification.

Assuntos

Internet das Coisas , Algoritmos , Área Sob a Curva , Aprendizado de Máquina

Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico.

Palechor, Fabio Mendoza; Manotas, Alexis de la Hoz.

Data Brief ; 25: 104344, 2019 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-31467953

RESUMO

This paper presents data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contains 17 attributes and 2111 records, the records are labeled with the class variable NObesity (Obesity Level), that allows classification of the data using the values of Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter, 23% of the data was collected directly from users through a web platform. This data can be used to generate intelligent computational tools to identify the obesity level of an individual and to build recommender systems that monitor obesity levels. For discussion and more information of the dataset creation, please refer to the full-length article "Obesity Level Estimation Software based on Decision Trees" (De-La-Hoz-Correa et al., 2019).

Mirnacle: machine learning with SMOTE and random forest for improving selectivity in pre-miRNA ab initio prediction.

Marques, Yuri Bento; de Paiva Oliveira, Alcione; Ribeiro Vasconcelos, Ana Tereza; Cerqueira, Fabio Ribeiro.

BMC Bioinformatics ; 17(Suppl 18): 474, 2016 12 15.

Artigo em Inglês | MEDLINE | ID: mdl-28105918

RESUMO

BACKGROUND: MicroRNAs (miRNAs) are key gene expression regulators in plants and animals. Therefore, miRNAs are involved in several biological processes, making the study of these molecules one of the most relevant topics of molecular biology nowadays. However, characterizing miRNAs in vivo is still a complex task. As a consequence, in silico methods have been developed to predict miRNA loci. A common ab initio strategy to find miRNAs in genomic data is to search for sequences that can fold into the typical hairpin structure of miRNA precursors (pre-miRNAs). The current ab initio approaches, however, have selectivity issues, i.e., a high number of false positives is reported, which can lead to laborious and costly attempts to provide biological validation. This study presents an extension of the ab initio method miRNAFold, with the aim of improving selectivity through machine learning techniques, namely, random forest combined with the SMOTE procedure that copes with imbalance datasets. RESULTS: By comparing our method, termed Mirnacle, with other important approaches in the literature, we demonstrate that Mirnacle substantially improves selectivity without compromising sensitivity. For the three datasets used in our experiments, our method achieved at least 97% of sensitivity and could deliver a two-fold, 20-fold, and 6-fold increase in selectivity, respectively, compared with the best results of current computational tools. CONCLUSIONS: The extension of miRNAFold by the introduction of machine learning techniques, significantly increases selectivity in pre-miRNA ab initio prediction, which optimally contributes to advanced studies on miRNAs, as the need of biological validations is diminished. Hopefully, new research, such as studies of severe diseases caused by miRNA malfunction, will benefit from the proposed computational tool.

Assuntos

Biologia Computacional/métodos , Eucariotos/genética , Genômica/métodos , MicroRNAs/química , Animais , Biologia Computacional/instrumentação , Simulação por Computador , Eucariotos/química , Genoma , Genômica/instrumentação , Humanos , Aprendizado de Máquina , MicroRNAs/genética , Conformação de Ácido Nucleico , Plantas/química , Plantas/genética

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA