Compute-optimal large language models

Michael Scherbela, 18. Jan 2023

When training an LLM with a fixed compute budget, a key tradeoff is how many parameters to use versus how many tokens to train on. This paper by DeepMind shows that, historically, LLMs were scaled up too quickly in parameter count and not enough in the amount of training data.

https://arxiv.org/abs/2203.15556.pdf
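
As a minimal sketch of the tradeoff, the snippet below splits a FLOP budget between parameters and tokens using two rules of thumb associated with the paper: training compute of roughly C ≈ 6·N·D FLOPs (N parameters, D tokens) and the Chinchilla result that the optimal token count is around 20 tokens per parameter. The function name and the 20 tokens/parameter constant are simplifications for illustration, not code from the paper.

```python
def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Roughly split a FLOP budget into (parameters, tokens).

    Assumes C ~ 6 * N * D and D ~ tokens_per_param * N, so
    N = sqrt(C / (6 * tokens_per_param)) and D = tokens_per_param * N.
    """
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Example: a budget of ~5.76e23 FLOPs (roughly Chinchilla's training compute)
    n, d = compute_optimal_split(5.76e23)
    print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
    # Prints roughly 70B parameters and 1.4T tokens, in line with Chinchilla.
```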