Large language models are notoriously expensive to train, but they yield a model that can be evaluated on general language understanding benchmarks. What if the goal is to perform well on a single task-specific benchmark instead? Can we cut the cost of pre-training? (1/9)
