The Landscape of Agentic Reinforcement Learning for LLMs

— · 2025

L5.1 · Algorithmic Foundations · arXiv:2509.02547 · #survey #rl #llm-agent

CORE IDEA

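A survey of agentic reinforcement learning for LLMs that places LLM-agent exploration, feedback, reward modeling, environment interaction, and long-horizon optimization within an RL framework.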

CONCRETE EXAMPLE

Used to connect the sparse-reward problem as it appears in DeepSeek-R1, o1/o3, agentic search, and alpha mining.
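
A minimal sketch of what "sparse reward" means for a multi-step agent, assuming a hypothetical five-step episode in which only the final step is scored. The `discounted_returns` helper and the reward values are illustrative assumptions and are not taken from the survey.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: four intermediate actions with no feedback,
# then a single terminal reward for a successful outcome.
sparse_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(sparse_rewards, gamma=0.9)

# A failed episode carries no learning signal at any step.
failed_returns = discounted_returns([0.0] * 5, gamma=0.9)
```

Every earlier step receives only a discounted copy of the single terminal reward, and a failed episode yields zero everywhere. This is one way to see why credit assignment is hard in long-horizon agent tasks.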

L-ANCHOR · Why it matters at this layer

The L5 anchor survey for RL foundations: it ties LLM agents back to their reinforcement learning roots.

Related papers