Compute-optimal large language models
Michael Scherbela, 18 Jan 2023
When training an LLM with a fixed compute budget, a key tradeoff is how many parameters to use versus how many tokens to process during training. This paper by DeepMind shows that, historically, LLMs were scaled up too quickly in parameter count and trained on too little data.
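To make the tradeoff concrete, a common approximation is that training compute scales as C ≈ 6·N·D FLOPs, where N is the number of parameters and D the number of training tokens. Under a fixed budget C, picking a larger model directly forces fewer training tokens. The sketch below assumes this approximation and a hypothetical budget; it is an illustration, not the paper's fitted scaling law.

```python
# Minimal sketch of the compute tradeoff, assuming the common
# approximation C ≈ 6 * N * D (training FLOPs ≈ 6 × params × tokens).

def tokens_for_budget(flop_budget: float, n_params: float) -> float:
    """Number of training tokens D that a fixed FLOP budget allows
    for a model with n_params parameters."""
    return flop_budget / (6 * n_params)

if __name__ == "__main__":
    budget = 1e23  # hypothetical training budget in FLOPs
    for n_params in [1e9, 10e9, 70e9, 175e9]:
        d = tokens_for_budget(budget, n_params)
        print(f"{n_params / 1e9:6.0f}B params -> {d / 1e9:10.1f}B tokens")
```

Running this shows the effect directly: at the same budget, a 175B-parameter model sees far fewer tokens than a 10B-parameter one, which is exactly the dimension along which the paper argues earlier models were mis-scaled.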