The data on which the big transformer was trained was generated from independently trained agents, was it not? There is no actual RL being done to train the final agent, it's all amortizing other agents, and each of the RL-trained agents was trained on a single environment.
— David Pfau (@pfau) May 13, 2022