
Scaling Laws for Neural Language Models

Kaplan et al. (OpenAI) · 2020

L4.1 · Foundation Model Tech Stack · arXiv:2001.08361 · #scaling

CORE IDEA

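Language-model test loss follows a power law in each of compute, dataset size, and parameter count, provided the other two factors are not the bottleneck. These laws guided the scale-up to GPT-3.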
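A minimal sketch of what "power law" means here, assuming the parameter-limited form L(N) = (N_c / N)^α_N with the approximate constants reported in the paper (α_N ≈ 0.076, N_c ≈ 8.8×10^13 non-embedding parameters). The helpers `loss_from_params` and `fit_power_law` are hypothetical illustrations, not code from the paper. The sketch also shows why such laws are fitted in log-log space: there the law becomes a straight line whose slope is -α_N.

```python
import math

# Approximate constants for the parameter-limited law L(N) = (N_c / N) ** alpha_N
# as reported by Kaplan et al. (2020); N counts non-embedding parameters and
# loss is measured in nats per token. Treat them as illustrative.
ALPHA_N = 0.076
N_C = 8.8e13


def loss_from_params(n_params: float) -> float:
    """Predicted loss for n_params non-embedding parameters, assuming data and
    compute are not the bottleneck (the regime where L(N) is meant to apply)."""
    return (N_C / n_params) ** ALPHA_N


def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = (c / x) ** alpha by ordinary least squares in log-log space.

    log y = alpha * log c - alpha * log x, so the fitted slope is -alpha
    and the intercept is alpha * log c.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    alpha = -slope
    c = math.exp(intercept / alpha)
    return alpha, c


if __name__ == "__main__":
    sizes = [1e6, 1e7, 1e8, 1e9]
    losses = [loss_from_params(n) for n in sizes]
    alpha, c = fit_power_law(sizes, losses)
    # Noise-free synthetic data, so the fit recovers alpha ≈ 0.076, c ≈ 8.8e13.
    print(f"alpha={alpha:.3f}, N_c={c:.2e}")
    # Ratio of predicted losses for a 10x larger model: 10 ** -0.076 ≈ 0.84.
    print(f"10x params -> loss x {loss_from_params(1e9) / loss_from_params(1e8):.2f}")
```

Under these assumed constants, every 10× increase in parameters multiplies the predicted loss by 10^-0.076 ≈ 0.84 (roughly a 16% reduction), independent of the starting size. That scale-free ratio is the property that makes extrapolation from small runs to much larger ones plausible.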

L-ANCHOR · Why this paper matters at this layer

The classic reference for neural scaling laws.

arXiv:2001.08361

Related papers

Attention is All You Need

Training Compute-Optimal Large Language Models (Chinchilla)

DeepSeek-V3 Technical Report

DeepSeek-V4 Technical Report