So far two approaches in the talks: experimental (computing adversarial examples given a distance metric) and formal proofs (local definitions, showing robustness for a single point); both are sketched after these notes.

What should we do?
- Common benchmarks are needed. An experimental paper would run standard attacks against standard models and take overfitting into account (when only a single benchmark is used); maybe even agree that MNIST is an improper benchmark.
- ImageNet too hard? Too much computation. Other datasets: SVHN, subsampled ImageNet (problematic?).
- Problem with benchmarks: outdated, not representative? Rather come up with a model than with a benchmark dataset -> images are easily interpretable, other data is not? However, a new challenge is important.
- Maybe overfitting problems in security due to the required low false-negative rate (example: DREBIN) -> lack of knowledge, thinking that high accuracy is required.
- Danger of relying on benchmarks; however, use a benchmark as a sanity check. Maybe, as the area matures, standards will emerge.
- Demanding a benchmark may shut out new ideas -> what is actually important is sharing code to ensure reproducibility -> problem: the next idea might be given away?
- Define why a benchmark is good or bad for it to make sense, thus provide context --> a meta-benchmark, e.g., saying which ones not to use?
- In computer vision: CIFAR -> ImageNet -> does it generalize?
- Norm distances do not mean anything for colored images (according to the computer-vision people); rather ask how difficult it is to find an adversarial example. Distance is still an artificial measure for this; sometimes it matters, sometimes it does not.

How can we bridge experiments and theory?
- Theory assumes a strong adversary; however, what are realistic assumptions? Can we assume to know the real assumptions? Does the attacker select labels? -> The strong adversary gives worst cases (fine for defenses).
- State goals, assumptions, and attacker capabilities; give precise descriptions of the assumptions. Many definitions are fuzzy, for example concerning the loss.
- Limit attacker capabilities, then argue; problematic in reality. Be clear about the mathematical assumptions.
- Be less limited in the data and the context (maybe even consider an open-world setting); however, there is consensus that reality has to be represented by the data, and we should move towards it.
- Different attackers for different settings. In reality, however, the attacker does not know all features (but that is security by obscurity); the attacker may not know the algorithm and hyper-parameters -> model stealing.
- There is also a problem with nondeterminism and reproducibility -> black-box and white-box settings.
- Worry: what if the loss function is a black box? (mathematical foundations, more knowledge needed)
- Non-natural assumptions: you cannot change every pixel, so the assumption is unrealistic.
- Maybe have a best-practice document; figure out potential use cases.
- Developing an (ideal) dataset: which features and properties does it have to have?
  -> should be a dataset that we understand
  -> linked to the adversary, to its goals and capabilities
  -> specific vs. general attacks
  -> a more open dataset, maybe several for different aspects of the attacker and the data
  -> for example: multi-modal
  -> should nevertheless be easily interpretable, such as image data
  -> reference implementations for this dataset
- There are some datasets/classifiers (Windows malware), however not public.
- Needed properties: trainable in a day.
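
A minimal sketch, not from the discussion, of what "computing adversarial examples given a distance metric" could look like: a single FGSM-style step under an L-infinity budget. `model`, `x`, `y`, and `eps` are assumed placeholders for a PyTorch classifier returning logits, an input batch scaled to [0, 1], its labels, and the distance budget.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Return an adversarial example within an L-inf ball of radius eps around x."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction of the loss gradient's sign; step size equals the budget.
    x_adv = x + eps * x.grad.sign()
    # Keep the result a valid image and within the eps-ball around the original.
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
    return x_adv.detach()

# The achieved L-inf distance, (x_adv - x).abs().max(), is exactly the kind of
# "artificial" difficulty measure debated in the notes above.
```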
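For the "formal proofs, local definition" approach, a worked example under the simplest possible assumption (a linear binary classifier, numbers made up for illustration): at a single point x, any perturbation with L2 norm smaller than the distance |w.x + b| / ||w|| to the decision boundary provably cannot change the label.

```python
import numpy as np

def certified_l2_radius(w, b, x):
    """Largest L2 radius around x within which a linear model's label cannot change."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([2.0, -1.0])   # hypothetical weights
b = 0.5                     # hypothetical bias
x = np.array([1.0, 1.0])    # the single point being certified
r = certified_l2_radius(w, b, x)  # = |1.5| / sqrt(5) ~ 0.67
print(f"label {np.sign(w @ x + b):+.0f} holds for all L2 perturbations < {r:.2f}")
```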