Reinforcement learning agents typically perform poorly if provided with inputs that were not clearly defined in training. A new approach enables RL agents to perform well, even when subject to corrupt, incomplete, or shuffled inputs.
Agents with a self-attention “bottleneck” not only can solve these tasks from pixel inputs with only 4000 parameters, but they are also better at generalization.