Q-Learning
Watkins · 1989
L5.1 · Algorithmic Foundations
PhD thesis, Cambridge
#rl
CORE IDEA
Off-policy temporal-difference control; the progenitor of value-based RL.
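To make the off-policy temporal-difference idea concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not code from the thesis: the toy chain environment, the function names `q_learning_update` and `train_chain`, and the values chosen for `alpha`, `gamma`, and `epsilon` are all hypothetical. The point it shows is that the update bootstraps from the greedy value max_a' Q(s', a') even though actions are chosen by an exploratory epsilon-greedy policy, which is what makes the method off-policy.

```python
import random


def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.5, gamma=0.9, done=False):
    """One tabular step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def train_chain(n_states=4, episodes=200, epsilon=0.2, seed=0):
    """Hypothetical toy task: walk a chain; reward 1.0 on reaching the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Behavior policy: epsilon-greedy (exploratory).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 1 if q[s][1] >= q[s][0] else 0
            s2 = max(0, s - 1) if a == 0 else s + 1
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            # Target policy: greedy, via the max inside the update.
            q_learning_update(q, s, a, r, s2, done=done)
            s = s2
    return q


if __name__ == "__main__":
    for state, values in enumerate(train_chain()):
        print(state, values)
```

In this sketch the learned table should prefer moving right in every non-terminal state, since that is the only route to the reward.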
L-ANCHOR · Why it matters at this layer
A classic of RL.
Related papers
QuantFactor REINFORCE
L0.3
2024
DeepSeek-R1: Incentivizing Reasoning in LLMs via RL
L4.2
2025
Playing Atari with Deep Reinforcement Learning (DQN)
L5.1
2013
Mastering the Game of Go with Deep Neural Networks and Tree Search (AlphaGo)
L5.1
2016