Why aren't poisoning attacks solved by the network generalizing?
-Jacob: the influence of a single point is ~d/n, so only a ~1/d fraction of points is needed
-Nati: this isn't true, influence is 1/n (0-1 loss)
-Jacob: that assumes 0-1 loss; it isn't true for hinge loss [see editorial note at end of notes]
-Percy: the attacker can go after specific test inputs
-Alex: example: getting past a passport check
-Jacob: agrees with Nati that good 0-1 loss on train => good 0-1 loss on test
-Jacob & Nati both agree that:
 1. good 0-1 loss on train => good 0-1 loss on test
 2. it's at least possible for an adversary to make training hard
-two possibilities for #2:
 -add data to fool a specific algorithm
 -or make training computationally hard for any algorithm
-editorial note: both of these have been shown to be possible; see the work of Servedio & Long for the first and Vitaly Feldman for the second

Dan: two issues, let's focus on one:
-targeted attacks
-causing the training algorithm to fail

Possible test case:
-can we cause a specific person's face to be mis-classified?
-note: Dawn Song's group is working on this
-formal question: suppose we control k adversarial data points and want to mess up k' people's faces; how big does k need to be relative to k'? [toy simulation at end of notes]
-when k' = 1, it seems k can be ~5 and still succeed
-possible defenses: some sort of pre-filtering? [sketch at end of notes]
-Percy: this seems hard in NLP tasks with lots of rare words, etc.
-possible solution that was brought up: move into an embedding space that gets rid of the sparsity issues

Alex: maybe some attacks are too strong, and we need to curtail our expectations
-e.g. in the rare-words case, just drop the rare words [sketch at end of notes]
-editorial note: it would be really interesting if we could do this automatically rather than by hand

Stephen Wright:
-does sanitization for astronomical data
-the issue is that sanitization might throw out good data as well
-but here at least you know that something was thrown out, rather than failing silently

Tara Javidi: what about different possible output formats?
-e.g. being able to say "don't know", or list decoding [sketch at end of notes]

Martin Abadi: give some measure of support for the prediction
-Jacob: uncertain whether this can be done in a meaningful way

Percy: maybe there's a systems solution to this
-things people suggested here: a measure of reputation for accounts / data sources / etc. [sketch at end of notes]
-maybe a good abstraction is something like: X% highly trustworthy, Y% not clearly trustworthy, Z% fully adversarial [how to define the middle class?]

Tudor: is it realistic to assume that the adversary controls the label?
-two settings: (1) labels are always correct [labeled by hand], (2) labels are created automatically
-in case (1), it's something like adversarial distributional shift
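
editorial note: a rough way to reconcile the d/n vs. 1/n exchange (my framing, not anyone's exact argument). For a smooth, strongly convex surrogate loss (e.g. regularized hinge or logistic), the standard influence-function approximation for adding or removing one training point z_i is

  \hat\theta_{-i} - \hat\theta \approx -\frac{1}{n} H_{\hat\theta}^{-1} \nabla_\theta \ell(z_i, \hat\theta), \qquad H_{\hat\theta} = \frac{1}{n}\sum_{j=1}^n \nabla^2_\theta \ell(z_j, \hat\theta),

so one point moves the parameters by a 1/n factor times a vector whose size can grow with the dimension d; that is one reading of "influence is d/n", and it suggests an Ω(1) change to the model needs on the order of n/d poisoned points, i.e. a 1/d fraction. For 0-1 loss, by contrast, adding or removing one point changes the empirical error by at most 1/n, which is the sense in which influence is 1/n.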
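
editorial note: the k vs. k' question above is easy to probe on toy data. The sketch below is my own illustration, using synthetic Gaussian data and logistic regression rather than faces and a deep network: it adds k mislabeled copies of a single target point to the training set, retrains, and checks when the target's prediction flips. All constants are illustrative.

```python
# Probe of the "k poisoned points vs. k' targets" question for k' = 1:
# how many mislabeled copies of one target point does it take to flip
# that target after retraining?
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 20, 500

# Two Gaussian classes.
X = np.vstack([rng.normal(-0.5, 1.0, size=(n // 2, d)),
               rng.normal(+0.5, 1.0, size=(n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

# Target: a single class-1 point the attacker wants classified as 0.
x_target = rng.normal(+0.5, 1.0, size=(1, d))

for k in [0, 1, 2, 5, 10, 20]:
    # Poison: k copies of the target with the wrong label.
    X_p = np.vstack([X, np.repeat(x_target, k, axis=0)]) if k > 0 else X
    y_p = np.concatenate([y, np.zeros(k, dtype=int)]) if k > 0 else y
    clf = LogisticRegression(C=1.0, max_iter=1000).fit(X_p, y_p)
    print(f"k={k:2d}  target predicted as {clf.predict(x_target)[0]}  "
          f"clean train acc {clf.score(X, y):.3f}")
```

The printout shows both whether the target flips and how little the clean training accuracy moves, which is the worrying combination for targeted attacks.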
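
editorial note: one concrete version of the pre-filtering / embedding-space defense discussed above: compute a per-class centroid in some embedding space and drop training points that sit unusually far from their class centroid. The embedding itself is left abstract here (for text or faces it would be some learned dense representation), and the quantile cutoff is arbitrary. As Stephen pointed out, this will also discard some legitimate data, but at least the discarding is visible.

```python
# Centroid-distance sanitization in an embedding space (sketch).
import numpy as np

def sanitize(X_emb, y, quantile=0.95):
    """Return a boolean mask keeping points within the per-class distance quantile.

    X_emb: (n, d) array of embedded training points (embedding left abstract).
    """
    keep = np.ones(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X_emb[idx].mean(axis=0)
        dists = np.linalg.norm(X_emb[idx] - centroid, axis=1)
        keep[idx] = dists <= np.quantile(dists, quantile)
    return keep

# usage:
# mask = sanitize(X_emb, y)
# X_clean, y_clean = X_emb[mask], y[mask]
```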
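
editorial note: Alex's "just drop the rare words" mitigation, made concrete. Removing tokens below a frequency threshold limits how much an attacker can hide in sparse, rarely-seen features; the threshold here is illustrative, and as noted above it would be nicer to choose it automatically.

```python
# Drop tokens that appear fewer than min_count times in the training corpus.
from collections import Counter

def drop_rare_tokens(docs, min_count=5):
    """docs: list of token lists; returns the same docs with rare tokens removed."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in docs]
```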
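
editorial note: a minimal version of the "don't know" output Tara raised: abstain whenever the model's top class probability falls below a threshold, so low-support predictions are flagged instead of silently returned. The threshold is arbitrary, and calibrating these probabilities meaningfully under attack is exactly the part Jacob was skeptical about.

```python
# Abstaining classifier wrapper (sketch): return "don't know" for low-confidence inputs.
import numpy as np

def predict_with_abstention(clf, X, threshold=0.9):
    probs = clf.predict_proba(X)          # clf is any fitted probabilistic classifier
    top = probs.max(axis=1)
    labels = clf.classes_[probs.argmax(axis=1)]
    return [lab if p >= threshold else "don't know" for lab, p in zip(labels, top)]
```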
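
editorial note: one way to make the reputation idea operational: attach a trust weight to each data source and fit with per-sample weights, so data from clearly adversarial sources contributes little or nothing. The three trust levels mirror the X/Y/Z abstraction above; the scores themselves, and the reputation system that produces them, are assumed rather than specified.

```python
# Trust-weighted training (sketch): weight each point by its source's reputation.
import numpy as np
from sklearn.linear_model import LogisticRegression

TRUST = {"verified": 1.0, "unknown": 0.3, "flagged": 0.0}  # illustrative scores

def fit_with_trust(X, y, sources):
    """sources: list of per-example source labels keyed into TRUST."""
    weights = np.array([TRUST[s] for s in sources])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```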