RL is more underexplored than it deserves to be. Observations? Is there a way to report that a model has converged to being resilient to adversarial examples? This seems to be a general question.

Different attack surfaces: various inputs and several components can be attacked. How is the policy used to decide the action: statically, based on the end state, or continuously, based on the current states during the process? To what extent do these different issues apply? In RL there is a big gap between practicality and provability.

Difference between RL and image classification: the attacker can inject an adversarial perturbation into every frame, but a smarter attacker can inject adversarial perturbations sparsely (a sketch of both attackers is given after the summary). Some actions may cause long-standing (irreversible) consequences.

Research question: what would be the threat model? Offline vs. online setting? Does it matter for adversarial examples whether the agent does exploration or not?

In RL there is a reward, and an algorithm that optimizes for long-term reward. This is different from traditional image classification. Should we assume the agent has a model of the environment? Many robots are just doing rollouts.

Long-term impact versus irreversible impact: even in cases where you have the chance to fix the model afterwards, the impact that has already been made cannot be recovered. Adaptive adversary.

If we cannot make progress on the worst case, can we slow down the attacks, i.e., delay the effect of adversarial examples? Much of the discussion is devoted to developing a game-theoretic model for adversarial RL (a candidate formulation is written out after the summary). This seems hard.

Summary: We identify the differences between standard deep learning (e.g., image classification) and RL (highlighted above). Some adversarial examples may have irreversible catastrophic impacts (worst case), while others may have long-term impacts that still leave a chance to fix things afterwards (average case). If we cannot make progress on the worst-case scenario for now, we can start with the average case.
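To make the per-frame vs. sparse attack point above concrete, here is a minimal rollout sketch, assuming a classic gym-style environment and an FGSM-style observation perturbation. The helpers `policy.action_probs`, `policy.act`, and `policy.grad_wrt_obs`, as well as the action-gap trigger, are hypothetical placeholders (the trigger is in the spirit of the strategically-timed attack of Lin et al., 2017), not an API from the discussion.

import numpy as np

def fgsm_perturb(obs, policy, eps):
    # One-step gradient-sign perturbation of the observation (sketch).
    # Assumes policy.grad_wrt_obs(obs) returns the gradient of the
    # attacker's loss w.r.t. the observation; this helper is hypothetical.
    return np.clip(obs + eps * np.sign(policy.grad_wrt_obs(obs)), 0.0, 1.0)

def run_attacked_episode(env, policy, eps=0.01, sparse=True, gap_threshold=0.5):
    # sparse=False: perturb every frame (the naive attacker).
    # sparse=True:  perturb only when the policy strongly prefers one action,
    #               i.e. when flipping the decision is most likely to matter.
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:
        probs = policy.action_probs(obs)            # hypothetical API
        gap = probs.max() - probs.min()
        attack_now = (not sparse) or (gap > gap_threshold)
        attacked_obs = fgsm_perturb(obs, policy, eps) if attack_now else obs
        action = policy.act(attacked_obs)           # hypothetical API
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

The sparse attacker touches far fewer frames, which is exactly why the threat model question above matters: a per-frame perturbation budget and a per-episode budget describe very different adversaries.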
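One way to write down the contrast with image classification and the game-theoretic view mentioned above is the following; the per-step budget epsilon and the attacker class are assumptions made for illustration, not something fixed in the discussion.

% Classification: the attacker perturbs a single input once.
\max_{\|\delta\| \le \epsilon} \; \ell\big(f(x + \delta),\, y\big)

% RL: the attacker perturbs observations along a trajectory and targets
% the long-term (discounted) return of the policy \pi.
\min_{\delta_1, \dots, \delta_T : \, \|\delta_t\| \le \epsilon} \;
\mathbb{E}\!\left[\sum_{t=1}^{T} \gamma^{\,t-1} r(s_t, a_t)\right],
\qquad a_t \sim \pi(\cdot \mid o_t + \delta_t)

% Game-theoretic view (zero-sum): the defender trains against the worst-case
% attacker \nu drawn from an attacker class \mathcal{A}.
\max_{\pi} \; \min_{\nu \in \mathcal{A}} \;
\mathbb{E}_{\pi, \nu}\!\left[\sum_{t=1}^{T} \gamma^{\,t-1} r(s_t, a_t)\right]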