Comparing attack effectiveness is done incorrectly 

Using the data provided, it is not possible to compare the efficacy of different attacks across models. Imagine we would like to decide whether LLC or ILLC was the stronger attack on the CIFAR-10 dataset. 

Superficially, I might look at the “Average” column and see that the average model accuracy under LLC is 39.4% compared to 58.7% accuracy under ILLC. While in general averages in security can be misleading, fortunately, for all models except one, LLC reduces the model accuracy more than ILLC does, often by over twenty percentage points. 

A reasonable reader might therefore conclude (incorrectly!) that LLC is the stronger attack. Why is this conclusion incorrect? The LLC attack only succeeded 134 times out of 1000 times on the baseline CIFAR-10 model. Therefore, when the paper writes that the accuracy of PGD adversarial training under LLC is 61.2% what this number means is that 38.8% of adversarial examples that are effective on the baseline model are also effective on the adversarially trained model. How the model would perform on the other 866 examples is not reported. In contrast, when the base model is evaluated on the ILLC attack, the attack succeeded on all 1000 examples. The 83.7 accuracy obtained by adversarial training is inherently incomparable to the the 61.2% value.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comparing attack effectiveness is done incorrectly #6

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Comparing attack effectiveness is done incorrectly #6

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions