
breaking_defensive_distillation

Public repository: 27 stars, 9 forks, 0 issues.

Commits

List of commits on branch master:

- c8e27c96af2a789500e2e1cb80af03c947c9518a: Update to TF1.0 (carlini, committed 8 years ago)
- 4b413dea2e45bb27143fe031dde88d44f91c2d1f: Update README with warning (carlini, committed 8 years ago)
- 2aa48f9632f5de92172802ee71aaef6a785dda53: Initial commit (carlini, committed 9 years ago)

README

Update: this repository is out of date. It contains strictly less useful code than the repository at the following URL:

https://github.com/carlini/nn_robust_attacks

In particular, do not use the l0 attack in this repository; it is only effective at breaking defensive distillation (not other defenses).


Defensive Distillation was recently proposed as a defense to adversarial examples.

Unfortunately, distillation is not secure. We show this in our paper at http://nicholas.carlini.com/papers/2016_defensivedistillation.pdf. We strongly believe that research should be reproducible, and so we are releasing the code required to train a baseline model on MNIST, train a defensively distilled model on MNIST, and attack the defensively distilled model.

To run the code, you will need Python 3.x with TensorFlow. It will be slow unless you have a GPU to train on.
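
As a quick sanity check of the environment, something like the following prints the interpreter and TensorFlow versions and lists any visible GPUs. This is a minimal sketch assuming a TensorFlow 2.x installation (2.1 or later); the repository itself targets the much older TF 1.0 API, so adapt it to your setup.

```python
# Environment sanity check. Assumes TensorFlow 2.1+ for
# tf.config.list_physical_devices; the repository itself targets TF 1.0.
import sys
import tensorflow as tf

print("Python:", sys.version.split()[0])
print("TensorFlow:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
```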

Begin by running train_baseline.py and train_distillation.py; these will create three model files, two of which are useful. Both should report a final accuracy of around 99.3% +/- 0.2%.
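
For readers unfamiliar with the defense itself, here is a minimal sketch of the two-stage distillation recipe that those scripts implement at full scale. It uses a toy softmax-regression model on synthetic data rather than the repository's actual MNIST network, and the temperature value T is an illustrative assumption, not necessarily what the scripts use.

```python
# Toy illustration of defensive distillation: train a teacher with a
# temperature-T softmax, relabel the data with its soft predictions, then
# train a "distilled" student on those soft labels. NOT the repository's
# actual training code; the model, data, and T are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
T = 20.0                       # distillation temperature (assumed value)
n, d, k = 600, 10, 3           # samples, features, classes

# Synthetic, roughly separable data standing in for MNIST.
means = 2.0 * rng.normal(size=(k, d))
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, d))
Y_hard = np.eye(k)[y]          # one-hot "hard" labels

def softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, Y, temp, steps=3000, lr=1.0):
    """Gradient descent on cross-entropy with a temperature-`temp` softmax."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        P = softmax(X @ W, temp)
        grad = X.T @ (P - Y) / (temp * len(X))
        W -= lr * grad
    return W

# Stage 1: train the "teacher" (the baseline model) at temperature T.
W_teacher = train(X, Y_hard, T)

# Stage 2: relabel the training set with the teacher's soft predictions
# (still at temperature T) and train the "distilled" student on them.
Y_soft = softmax(X @ W_teacher, T)
W_student = train(X, Y_soft, T)

# At test time the distilled model is run at temperature 1, which makes its
# softmax outputs extremely confident.
preds = softmax(X @ W_student, 1.0).argmax(axis=1)
print("distilled toy model, training accuracy:", (preds == y).mean())
```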

To construct adversarial examples, run l0_attack.py, passing as its argument either models/baseline or models/distilled. This will run the modified l0 adversary on the given model. The success rate should be around 95%, modifying roughly 35 pixels.
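
The following is a self-contained sketch of the general greedy idea behind l0-style attacks: change as few pixels as possible, at each step flipping the pixel that most reduces the margin of the true class. It runs on a toy linear model with made-up weights and is not the modified l0 adversary implemented in l0_attack.py (which, per the note above, is itself superseded by the nn_robust_attacks repository).

```python
# Greedy l0-style attack sketch on a toy linear classifier. The weights W are
# random stand-ins, not a trained model; a real attack would backpropagate
# through the network to get the margin gradient.
import numpy as np

rng = np.random.default_rng(1)
d, k = 784, 10                        # "pixels" and classes, MNIST-sized
W = 0.05 * rng.normal(size=(d, k))    # stand-in weights
b = np.zeros(k)

def logits(x):
    return x @ W + b

x = rng.uniform(0.0, 1.0, size=d)     # stand-in image with pixels in [0, 1]
true_class = int(np.argmax(logits(x)))

adv = x.copy()
changed = []                          # indices of pixels modified so far
for _ in range(50):                   # l0 budget: at most 50 pixels
    z = logits(adv)
    if int(np.argmax(z)) != true_class:
        break                         # success: the predicted label flipped
    runner_up = int(np.argsort(z)[-2])
    # Gradient of the margin (true-class logit minus closest competitor)
    # w.r.t. each pixel; exact for a linear model.
    margin_grad = W[:, true_class] - W[:, runner_up]
    delta_to_zero = -margin_grad * adv          # margin change if pixel -> 0
    delta_to_one = margin_grad * (1.0 - adv)    # margin change if pixel -> 1
    best_delta = np.minimum(delta_to_zero, delta_to_one)
    best_delta[changed] = np.inf                # never re-pick a pixel
    i = int(np.argmin(best_delta))
    adv[i] = 0.0 if delta_to_zero[i] <= delta_to_one[i] else 1.0
    changed.append(i)

print("pixels modified:", len(changed))
print("misclassified:", int(np.argmax(logits(adv))) != true_class)
```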