
Training Compute-Optimal Large Language Models (Chinchilla)

Hoffmann, et al. (DeepMind) · 2022

L4.1 · Foundation Model Tech Stack · NeurIPS 2022 · #scaling

CORE IDEA

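Revises the Kaplan et al. scaling laws: for a fixed compute budget, training data (tokens) should be scaled in equal proportion with parameter count. Chinchilla (70B) beats GPT-3 (175B).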
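
The equal-proportion claim can be illustrated with a short sketch. It rests on two assumptions that are not stated on this page: the common approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens), and a ratio of roughly 20 tokens per parameter, a rule of thumb often derived from Chinchilla's configuration rather than an exact constant. The function name and the default ratio are hypothetical, chosen for illustration only.

```python
import math

def compute_optimal_allocation(flops_budget, tokens_per_param=20.0):
    """Split a FLOP budget between parameters (N) and training tokens (D).

    Assumptions (illustrative, not exact values from the paper):
      - training compute C ~= 6 * N * D
      - compute-optimal D ~= tokens_per_param * N
    Substituting gives N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Quadrupling the compute budget doubles both N and D:
# parameters and data grow in equal proportion.
n1, d1 = compute_optimal_allocation(1e21)
n2, d2 = compute_optimal_allocation(4e21)
print(f"params ratio: {n2 / n1:.2f}, tokens ratio: {d2 / d1:.2f}")
```

Under these assumptions both N and D grow as the square root of compute, which is the sense in which data and parameters scale together; the Kaplan et al. fits instead favored growing the model much faster than the data.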

L-ANCHOR · Why it matters at this layer

The scaling rule that modern LLM training follows

arXiv:2203.15556 ↗

Related papers

Attention is All You Need

Scaling Laws for Neural Language Models

DeepSeek-V3 Technical Report

DeepSeek-V4 Technical Report