DeepSeek-V3 Technical Report
DeepSeek · 2024
L4.1 · Foundation Model Tech Stack
arXiv:2412.19437
#llm
#moe
CORE IDEA
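A 671B-parameter MoE (Mixture-of-Experts) model with 37B parameters active per token, combined with MLA (Multi-head Latent Attention), MTP (Multi-Token Prediction), and FP8 training; beats GPT-4 at 1/10 the cost.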
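To make the 671B/37B split concrete, here is a minimal sketch of top-k expert routing and the resulting active-parameter fraction. It is an illustration only, not the report's implementation: the function name `top_k_route`, the toy scores, and the choice of 4 experts with k=2 are hypothetical. Only the 671B and 37B figures come from the summary above.

```python
# Parameter counts from the summary above (in billions).
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37

# Fraction of the model's parameters used for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and normalize their weights.

    `scores` is a list of positive affinity scores, one per expert.
    This is a toy gating rule (hypothetical), not the report's exact formula.
    """
    # Indices of the k largest scores, highest first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    # Normalize so the selected experts' weights sum to 1.
    weights = [scores[i] / total for i in top]
    return top, weights


if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.1%}")
    experts, weights = top_k_route([0.1, 0.7, 0.05, 0.4], k=2)
    print("selected experts:", experts)
    print("weights:", weights)
```

Because only the selected experts run for a given token, per-token compute scales with the active parameters and not with the total, which is the basis of the cost argument.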
L-ANCHOR · Why it matters at this layer
open-source MoE frontier
Source code ↗
Related papers
Attention Is All You Need · L4.1 · 2017
Scaling Laws for Neural Language Models · L4.1 · 2020
Training Compute-Optimal Large Language Models (Chinchilla) · L4.1 · 2022
DeepSeek-V4 Technical Report · L4.1 · 2026