
Attention is All You Need

Vaswani, et al. (Google) · 2017

L4.1 · Foundation Model Tech Stack · NeurIPS 2017 · #architecture

CORE IDEA

Transformer architecture: self-attention replaces the recurrence of RNNs; the foundation of all modern LLMs.
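As a rough illustration of the self-attention idea, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. The function names and the list-of-lists matrix layout are assumptions made for this example. It covers a single head only and omits masking, multi-head projections, and batching.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over row-major lists of lists.

    Q: n_queries x d_k, K: n_keys x d_k, V: n_keys x d_v.
    Returns an n_queries x d_v matrix.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        row = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
        output.append(row)
    return output
```

Every query attends to all keys in one step, so no position has to wait for a hidden state carried over from the previous one, which is the sequential dependency that recurrence imposes.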

L-ANCHOR · Why it matters at this layer

Architectural starting point

arXiv:1706.03762 ↗

Related papers

Scaling Laws for Neural Language Models

Training Compute-Optimal Large Language Models (Chinchilla)

DeepSeek-V3 Technical Report

DeepSeek-V4 Technical Report