So far two approaches in the talks: experimental (computing adversarial examples given a distance metric) and formal proofs (local definitions, showing robustness for a single point); both are sketched after these notes.

What should we do?
- Common benchmarks are needed. An experimental paper would run standard attacks against standard models and take overfitting into account (when only a single benchmark is used); maybe even agree that MNIST is an improper benchmark.
- ImageNet too hard? Too much computation. Other datasets: SVHN, subsampled ImageNet (problematic?).
- Problem with benchmarks: outdated, not representative? Rather come up with a model than with a benchmark dataset -> images are easily interpretable, other data is not? However, a new challenge is important.
- Maybe overfitting problems in security due to the required low false-negative rate (example: DREBIN) -> lack of knowledge, thinking that high accuracy is required.
- Danger of relying on benchmarks; however, use a benchmark as a sanity check. Maybe, as the area matures, standards will emerge.
- Demanding a benchmark may shut out new ideas -> what is actually important is sharing code to ensure reproducibility -> problem: the next idea might be given away?
- Define why a benchmark is good or bad for it to make sense, thus provide context --> a meta-benchmark, e.g., saying which ones not to use?
- In computer vision: CIFAR -> ImageNet -> does it generalize?
- Norm distances do not mean anything for colored images (according to the computer-vision people); rather ask how difficult it is to find an adversarial example. Distance is still an artificial measure for this; sometimes it matters, sometimes it does not.

How can we bridge experiments and theory?
- Theory assumes a strong adversary; however, what are realistic assumptions? Can we assume to know the real assumptions? Does the attacker select labels? -> The strong adversary gives worst cases (fine for defenses).
- State goals, assumptions, and attacker capabilities; give precise descriptions of the assumptions. Many definitions are fuzzy, for example concerning the loss.
- Limit attacker capabilities, then argue; problematic in reality. Be clear about the mathematical assumptions.
- Be less limited in the data and the context (maybe even consider an open-world setting); however, there is consensus that reality has to be represented by the data, and we should move towards it.
- Different attackers for different settings. In reality, however, the attacker does not know all features (but that is security by obscurity); the attacker may not know the algorithm and hyper-parameters -> model stealing.
- There is also a problem with nondeterminism and reproducibility -> black-box and white-box settings.
- Worry: what if the loss function is a black box? (mathematical foundations, more knowledge needed)
- Non-natural assumptions: you cannot change every pixel, so the assumption is unrealistic.
- Maybe have a best-practice document; figure out potential use cases.
- Developing an (ideal) dataset: which features and properties does it have to have?
  -> should be a dataset that we understand
  -> linked to the adversary, to its goals and capabilities
  -> specific vs. general attacks
  -> a more open dataset, maybe several for different aspects of the attacker and the data
  -> for example: multi-modal
  -> should nevertheless be easily interpretable, such as image data
  -> reference implementations for this dataset
- There are some datasets/classifiers (Windows malware), however not public.
- Needed properties: trainable in a day.
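
A minimal sketch, not from the discussion, of what "computing adversarial examples given a distance metric" could look like: a single FGSM-style step under an L-infinity budget. `model`, `x`, `y`, and `eps` are assumed placeholders for a PyTorch classifier returning logits, an input batch scaled to [0, 1], its labels, and the distance budget.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Return an adversarial example within an L-inf ball of radius eps around x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction of the loss gradient's sign; step size equals the budget.
    x_adv = x + eps * x.grad.sign()
    # Keep the result a valid image and within the eps-ball around the original.
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
    return x_adv.detach()

# The achieved L-inf distance, (x_adv - x).abs().max(), is exactly the kind of
# "artificial" difficulty measure debated in the notes above.
```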
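For the "formal proofs, local definition" approach, a worked example under the simplest possible assumption (a linear binary classifier, numbers made up for illustration): at a single point x, any perturbation with L2 norm smaller than the distance |w.x + b| / ||w|| to the decision boundary provably cannot change the label.

```python
import numpy as np

def certified_l2_radius(w, b, x):
    """Largest L2 radius around x within which a linear model's label cannot change."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([2.0, -1.0])   # hypothetical weights
b = 0.5                     # hypothetical bias
x = np.array([1.0, 1.0])    # the single point being certified
r = certified_l2_radius(w, b, x)  # = |1.5| / sqrt(5) ~ 0.67
print(f"label {np.sign(w @ x + b):+.0f} holds for all L2 perturbations < {r:.2f}")
```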