Attention Is All You Need
Vaswani et al. (Google) · 2017
L4.1 · Foundation Model Tech Stack
NeurIPS 2017
#architecture
CORE IDEA
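Transformer architecture: self-attention replaces the recurrence of RNNs, so every position in a sequence can attend to every other position in parallel. It is the foundation of all modern LLMs.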
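As a rough illustration of the core idea, the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, can be sketched in plain Python. This is a minimal single-head sketch with no masking, projections, or batching; the function names `softmax` and `scaled_dot_product_attention` are illustrative choices, not taken from any library.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over lists of vectors.

    Q: list of query vectors (each of length d_k)
    K: list of key vectors (each of length d_k)
    V: list of value vectors (one per key)
    Returns one output vector per query.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in K
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return outputs
```

Each query is processed independently of the others, which is what lets the computation run in parallel across positions instead of step by step as in an RNN.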
L-ANCHOR · Why it matters at this layer
The architectural starting point
arXiv:1706.03762 ↗
Related papers
Scaling Laws for Neural Language Models
L4.1
2020
Training Compute-Optimal Large Language Models (Chinchilla)
L4.1
2022
DeepSeek-V3 Technical Report
L4.1
2024
DeepSeek-V4 Technical Report
L4.1
2026