Pesquisa | Portal Regional da BVS

Transferability of features for neural networks links to adversarial attacks and defences.

Kotyan, Shashank; Matsuki, Moe; Vargas, Danilo Vasconcellos.

PLoS One ; 17(4): e0266060, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35476838

RESUMO

The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks' capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. The principal idea is that, "if an algorithm learns rich features, such features should represent Out-of-Distribution classes as a combination of previously learned In-Distribution (ID) classes". This is because OoD classes usually share several regular features with ID classes, given that the features learned are general enough. We further introduce two metrics to assess the transferred features representing OoD classes. One is based on inter-cluster validation techniques, while the other captures the influence of a class over learned features. Experiments suggest that several adversarial defences decrease the attack accuracy of some attacks and improve the transferability-of-features as measured by our metrics. Experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson correlation coefficient and low p-value). Further, statistical tests suggest that several adversarial defences, in general, significantly improve transferability. Our tests suggests that models having a higher transferability-of-features have generally higher robustness against adversarial attacks. Thus, the experiments suggest that the objectives of adversarial machine learning might be much closer to domain transfer learning, as previously thought.

Assuntos

Aprendizado de Máquina , Redes Neurais de Computação , Algoritmos , Correlação de Dados

Adversarial robustness assessment: Why in evaluation both L0 and L∞ attacks are necessary.

Kotyan, Shashank; Vargas, Danilo Vasconcellos.

PLoS One ; 17(4): e0265723, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35421125

RESUMO

There are different types of adversarial attacks and defences for machine learning algorithms which makes assessing the robustness of an algorithm a daunting task. Moreover, there is an intrinsic bias in these adversarial attacks and defences to make matters worse. Here, we organise the problems faced: a) Model Dependence, b) Insufficient Evaluation, c) False Adversarial Samples, and d) Perturbation Dependent Results. Based on this, we propose a model agnostic adversarial robustness assessment method based on L0 and L∞ distance-based norms and the concept of robustness levels to tackle the problems. We validate our robustness assessment on several neural network architectures (WideResNet, ResNet, AllConv, DenseNet, NIN, LeNet and CapsNet) and adversarial defences for image classification problem. The proposed robustness assessment reveals that the robustness may vary significantly depending on the metric used (i.e., L0 or L∞). Hence, the duality should be taken into account for a correct evaluation. Moreover, a mathematical derivation and a counter-example suggest that L1 and L2 metrics alone are not sufficient to avoid spurious adversarial samples. Interestingly, the threshold attack of the proposed assessment is a novel L∞ black-box adversarial method which requires even more minor perturbation than the One-Pixel Attack (only 12% of One-Pixel Attack's amount of perturbation) to achieve similar results. We further show that all current networks and defences are vulnerable at all levels of robustness, suggesting that current networks and defences are only effective against a few attacks keeping the models vulnerable to different types of attacks.

Assuntos

Aprendizado de Máquina , Redes Neurais de Computação , Algoritmos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA