Search | Portal Regional da BVS

Performance counter dataset for behavioural biometric purpose.

Andrade, Cesar; Bragança, Hendrio; Feitosa, Eduardo; Souto, Eduardo.

Data Brief ; 52: 109999, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38226035

ABSTRACT

To advance research in continuous user authentication, we introduce COUNT-OS-I and COUNT-OS-II, two distinct performance counter datasets collected from Windows operating systems. Encompassing data from 63 computers and users, the datasets offer rich, real-world insights for developing and evaluating authentication models. COUNT-OS-I spans 26 users in an IT department, capturing 159 attributes across diverse hardware and software environments over 26 h on average per user. COUNT-OS-II, on the other hand, encompasses 37 users with identical system configurations, recording 218 attributes per sample over a 48-hour period. Both datasets use pseudonymization to safeguard user identities while maintaining data integrity and statistical accuracy. The well-balanced nature of the data, confirmed by comprehensive statistical analysis, positions these datasets as reliable benchmarks for the continuous user authentication domain. Through their release, we aim to support the development of robust authentication models that are applicable in real-world settings, contributing to enhanced system security and user trust.
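The abstract above does not say how the pseudonymization was carried out. As a minimal sketch of one common approach, assuming a keyed hash (HMAC-SHA256) over the user identifier, the following shows how records from the same user can stay linkable without exposing the original name. The function name `pseudonymize`, the key, and the sample records are all hypothetical and are not taken from the COUNT-OS datasets.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes, length: int = 12) -> str:
    """Map a user identifier to a stable pseudonym using a keyed hash.

    Assumption: HMAC-SHA256 is used here for illustration only; the
    source does not describe the scheme applied to the datasets.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:length]


# Hypothetical key and records (identifier, CPU usage sample).
key = b"example-secret-key"
records = [("alice", 12.5), ("bob", 7.1), ("alice", 13.0)]

# The same user always maps to the same pseudonym, so per-user
# statistics are preserved after the identifiers are replaced.
pseudonymized = [(pseudonymize(user, key), value) for user, value in records]
```

Because the mapping is deterministic for a fixed key, per-user sample counts and distributions are unchanged, which is the property a continuous authentication benchmark needs.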

Android malware detection with MH-100K: An innovative dataset for advanced research.

Bragança, Hendrio; Rocha, Vanderson; Barcellos, Lucas; Souto, Eduardo; Kreutz, Diego; Feitosa, Eduardo.

Data Brief ; 51: 109750, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38020437

ABSTRACT

High-quality datasets are crucial for building realistic and high-performance supervised malware detection models. Currently, one of the major challenges of machine learning-based solutions is the scarcity of datasets that are both representative and of high quality. To foster future research and provide updated and public data for comprehensive evaluation and comparison of existing classifiers, we introduce the MH-100K dataset [1], an extensive collection of Android malware information comprising 101,975 samples. It encompasses a main CSV file with valuable metadata, including the SHA256 hash (APK's signature), file name, package name, Android's official compilation API, 166 permissions, 24,417 API calls, and 250 intents. Moreover, the MH-100K dataset features an extensive collection of files containing useful metadata from the VirusTotal analysis. This repository of information can serve future research by enabling the analysis of antivirus scan result patterns to discern the prevalence and behaviour of various malware families. Such analysis can help extend existing malware taxonomies, identify novel variants, and explore malware evolution over time.

How Validation Methodology Influences Human Activity Recognition Mobile Systems.

Bragança, Hendrio; Colonna, Juan G; Oliveira, Horácio A B F; Souto, Eduardo.

Sensors (Basel) ; 22(6), 2022 Mar 18.

Article in English | MEDLINE | ID: mdl-35336529

ABSTRACT

In this article, we introduce explainable methods to understand how Human Activity Recognition (HAR) mobile systems perform depending on the chosen validation strategy. Our results introduce a new way to discover potential bias problems that overestimate the prediction accuracy of an algorithm because of an inappropriate choice of validation methodology. We show how the SHAP (Shapley additive explanations) framework, used in the literature to explain the predictions of any machine learning model, can provide graphical insights into how human activity recognition models achieve their results. It is now possible to analyze, in a simplified way, which features are important to a HAR system under each validation methodology. We demonstrate that k-fold cross-validation (k-CV), the validation procedure used in most works to evaluate the expected error of a HAR system, not only can overestimate prediction accuracy by about 13% on three public datasets but also selects a different feature set than the universal model. Combining explainable methods with machine learning algorithms has the potential to help new researchers look inside the decisions of those algorithms, in most cases avoiding the overestimation of prediction accuracy, understanding relations between features, and finding bias before deploying the system in real-world scenarios.

Subjects

Human Activities , Machine Learning , Algorithms , Humans
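One likely reason record-wise k-fold cross-validation can overestimate accuracy, as the abstract above reports, is that samples from the same person end up in both the training and the test fold. The sketch below, a simplified illustration and not the authors' experimental code, contrasts record-wise k-fold splitting with a subject-wise leave-one-subject-out split and checks each split for shared subjects. The subject labels and helper names are hypothetical.

```python
import random


def kfold_splits(n_samples, k, seed=0):
    """Record-wise k-fold: shuffle sample indices, ignoring who produced them."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def loso_splits(subjects):
    """Leave-one-subject-out: each test fold holds all samples of one subject."""
    for held_out in sorted(set(subjects)):
        test = [i for i, s in enumerate(subjects) if s == held_out]
        train = [i for i, s in enumerate(subjects) if s != held_out]
        yield train, test


def shares_subjects(train, test, subjects):
    """True if at least one subject contributes samples to both sets."""
    return bool({subjects[i] for i in train} & {subjects[i] for i in test})


# Hypothetical dataset: 4 subjects with 10 samples each.
subjects = [s for s in ("s1", "s2", "s3", "s4") for _ in range(10)]

kfold_leaks = [
    shares_subjects(train, test, subjects)
    for train, test in kfold_splits(len(subjects), k=5)
]
loso_leaks = [
    shares_subjects(train, test, subjects)
    for train, test in loso_splits(subjects)
]
```

With 5 folds of 8 samples and 10 samples per subject, every k-fold test set necessarily contains a subject that also appears in training, whereas the subject-wise split never does.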

A Smartphone Lightweight Method for Human Activity Recognition Based on Information Theory.

Bragança, Hendrio; Colonna, Juan G; Lima, Wesllen Sousa; Souto, Eduardo.

Sensors (Basel) ; 20(7), 2020 Mar 27.

Article in English | MEDLINE | ID: mdl-32230830

ABSTRACT

Smartphones have emerged as a revolutionary technology for monitoring everyday life, and they have played an important role in Human Activity Recognition (HAR) due to their ubiquity. The sensors embedded in these devices allow human behaviors to be recognized using machine learning techniques. However, not all solutions are feasible for implementation on smartphones, mainly because of their high computational cost. In this context, the proposed method, called HAR-SR, introduces information theory quantifiers as new features extracted from sensor data to create simple activity classification models, thereby reducing computational cost. Three public databases (SHOAIB, UCI, WISDM) are used in the evaluation process. The results show that HAR-SR can classify activities with 93% accuracy when using a leave-one-subject-out (LOSO) cross-validation procedure.

Subjects

Human Activities , Information Theory , Machine Learning , Physiologic Monitoring , Accelerometry , Algorithms , Factual Databases , Humans , Smartphone
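The abstract above does not name the information theory quantifiers used by HAR-SR. As an illustration of what such a feature can look like, the sketch below computes normalized permutation entropy over a window of sensor readings. Choosing permutation entropy is an assumption made for this example, and the function name and sample values are hypothetical.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3):
    """Normalized permutation entropy of a 1-D signal, in [0, 1].

    Each window of `order` consecutive values is reduced to its ordinal
    pattern (the index order that sorts the window); the entropy of the
    pattern distribution is then divided by log(order!).
    Assumption: this quantifier is illustrative, not the source's choice.
    """
    if len(signal) < order:
        raise ValueError("signal is shorter than the pattern order")
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: signal[i + k]))
        for i in range(len(signal) - order + 1)
    )
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return entropy / math.log(math.factorial(order))


# Hypothetical accelerometer windows.
steady_ramp = [1, 2, 3, 4, 5, 6]       # one ordinal pattern only
irregular = [4, 7, 9, 10, 6, 11, 3]    # several ordinal patterns

ramp_feature = permutation_entropy(steady_ramp)
irregular_feature = permutation_entropy(irregular)
```

A feature like this costs a single pass over the window and a small counter, which is consistent with the abstract's emphasis on low computational cost.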

Human Activity Recognition Based on Symbolic Representation Algorithms for Inertial Sensors.

Sousa Lima, Wesllen; de Souza Bragança, Hendrio L; Montero Quispe, Kevin G; Pereira Souto, Eduardo J.

Sensors (Basel) ; 18(11), 2018 Nov 20.

Article in English | MEDLINE | ID: mdl-30463336

ABSTRACT

Mobile sensing has enabled a variety of solutions for monitoring and recognizing human activities (human activity recognition, HAR). Such solutions have been implemented on smartphones to better understand human behavior. However, they still suffer from the limited computing resources available on smartphones. For this reason, the HAR area has focused on developing solutions with low computational cost. In general, the strategies used in these solutions are based on shallow and deep learning algorithms. The problem is that not all of these strategies are feasible on smartphones because of the high computational cost required, mainly by the data preparation and classification model training steps. In this context, this article evaluates a new set of alternative strategies based on the Symbolic Aggregate Approximation (SAX) and Symbolic Fourier Approximation (SFA) algorithms, with the aim of developing solutions with low computational cost in terms of memory and processing. This article also evaluates some classification algorithms adapted to handle symbolic data, such as SAX-VSM, BOSS, BOSS-VS and WEASEL. Experiments were performed on the UCI-HAR, SHOAIB and WISDM databases, which are commonly used in the literature to validate smartphone-based HAR solutions. The results show that the symbolic representation algorithms are on average 84.81% faster in the feature extraction phase and reduce memory consumption by 94.48% on average, while achieving accuracy rates equivalent to those of conventional algorithms.

Subjects

Algorithms , Human Activities , Adult , Factual Databases , Exercise , Humans , Male , Middle Aged , Sitting Position , Smartphone , Walking , Young Adult
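To make the Symbolic Aggregate Approximation (SAX) step named in the abstract above concrete, here is a minimal sketch of the standard procedure: z-normalize the series, average it over equal-length segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. It is a simplified illustration, not the implementation evaluated in the article; it assumes the series length is a multiple of the number of segments, and the function name and sample data are hypothetical.

```python
import bisect
import statistics
import string


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word of `n_segments` letters."""
    if len(series) % n_segments != 0:
        # Simplifying assumption for this sketch: equal-length segments only.
        raise ValueError("series length must be a multiple of n_segments")
    if not 2 <= alphabet_size <= len(string.ascii_lowercase):
        raise ValueError("alphabet_size must be between 2 and 26")

    # 1. Z-normalize (a constant series maps to all zeros).
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    normalized = [(x - mean) / std if std > 0 else 0.0 for x in series]

    # 2. Piecewise aggregate approximation: one mean per segment.
    seg_len = len(series) // n_segments
    paa = [
        statistics.fmean(normalized[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # 3. Breakpoints dividing N(0, 1) into equiprobable regions.
    gaussian = statistics.NormalDist()
    breakpoints = [
        gaussian.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)
    ]

    # 4. Map each segment mean to the letter of its region.
    return "".join(
        string.ascii_lowercase[bisect.bisect_left(breakpoints, value)]
        for value in paa
    )


# Hypothetical signal with a low, a middle, and a high plateau.
signal = [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9]
word = sax(signal, n_segments=3, alphabet_size=3)
```

Twelve floating-point readings collapse into a three-letter word here, which illustrates where the memory savings reported in the abstract can come from.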
