Scaling Laws for Neural Language Models
Kaplan et al. (OpenAI) · 2020
L4.1 · Foundation Model Tech Stack
arXiv:2001.08361
#scaling
CORE IDEA
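Test loss follows a power law in each of compute, dataset size, and parameter count, as long as it is not bottlenecked by the other two factors. These laws guided how GPT-3 was scaled.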
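A minimal sketch of what "loss is a power law" means in practice. It uses the single-variable forms L(N) = (N_c / N)^α_N and L(D) = (D_c / D)^α_D with the approximate fitted constants reported in the paper (α_N ≈ 0.076, N_c ≈ 8.8×10^13 non-embedding parameters; α_D ≈ 0.095, D_c ≈ 5.4×10^13 tokens). The helper names (`loss_from_params`, `loss_from_data`, `fit_power_law`) are hypothetical illustrations, not code from the paper. The fit works because a power law is a straight line in log-log space.

```python
import math

# Approximate single-variable fits reported by Kaplan et al. (2020).
# N = non-embedding parameter count, D = dataset size in tokens; loss in nats/token.
ALPHA_N, N_C = 0.076, 8.8e13
ALPHA_D, D_C = 0.095, 5.4e13


def loss_from_params(n: float) -> float:
    """L(N) = (N_c / N) ** alpha_N, assuming data and compute are not the bottleneck."""
    return (N_C / n) ** ALPHA_N


def loss_from_data(d: float) -> float:
    """L(D) = (D_c / D) ** alpha_D, assuming a large model trained with early stopping."""
    return (D_C / d) ** ALPHA_D


def fit_power_law(xs, ys):
    """Fit y = (x_c / x) ** alpha via least squares on log y = alpha*log(x_c) - alpha*log(x).

    Returns (alpha, x_c). Hypothetical helper for illustration only.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    # Ordinary least-squares slope and intercept of the log-log line.
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    intercept = my - slope * mx
    alpha = -slope                       # slope of the log-log line is -alpha
    x_c = math.exp(intercept / alpha)    # intercept = alpha * log(x_c)
    return alpha, x_c


if __name__ == "__main__":
    # Doubling N multiplies the loss by (1/2) ** alpha_N, independent of the starting size.
    ratio = loss_from_params(2e9) / loss_from_params(1e9)
    print(f"doubling N scales loss by {ratio:.4f}")

    # Recover the exponent and scale constant from points that lie on the curve.
    sizes = [1e6, 1e7, 1e8, 1e9]
    alpha, n_c = fit_power_law(sizes, [loss_from_params(s) for s in sizes])
    print(f"fitted alpha_N = {alpha:.3f}, N_c = {n_c:.2e}")
```

Because the exponent is small, each doubling of parameters buys only about a 5% loss reduction under these constants, which is why the paper frames progress in terms of orders of magnitude of scale.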
L-ANCHOR · Why it matters at this layer
The classic reference on scaling laws.
Related papers
Attention is All You Need · L4.1 · 2017
Training Compute-Optimal Large Language Models (Chinchilla) · L4.1 · 2022
DeepSeek-V3 Technical Report · L4.1 · 2024
DeepSeek-V4 Technical Report · L4.1 · 2026