Reinforcement Learning Foundations for Deep Research Systems

— · 2025

L1 · Domain Research Agents · arXiv:2509.06733 · #survey #deep-research #rl

CORE IDEA

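Recasts deep research systems in an RL/decision-making frame, covering task decomposition, exploration, feedback, rewards, and the optimization of long-horizon research trajectories.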

CONCRETE EXAMPLE

Used to explain why a research agent is not just retrieval plus report writing, but an exploration system with long-horizon credit assignment.
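
A minimal sketch of what long-horizon credit assignment means here, under stated assumptions: the step names, the sparse terminal reward, and the discount factor `gamma` are all hypothetical and are not taken from the survey. It shows how a single score on the final report is propagated back to earlier research steps as a discounted return-to-go.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Compute the return-to-go G_t = r_t + gamma * G_{t+1} for every step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step accumulates future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical research episode: only the final report receives a score.
    steps = ["decompose question", "search", "read sources", "write report"]
    rewards = [0.0, 0.0, 0.0, 1.0]
    for step, g in zip(steps, discounted_returns(rewards)):
        print(f"{step}: {g:.3f}")
```

Early steps such as decomposition and search receive nonzero credit even though they earn no immediate reward, which is the sense in which the agent is more than a retrieve-then-write pipeline.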

L-ANCHOR · Why it matters at this layer

L1-L5 bridge anchor: connects the research-agent workflow to RL foundations.

Related papers