Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization

Proactive large language model (LLM) agents actively plan, query, and interact over multiple turns. This enables efficient task completion beyond passive instruction following and makes such agents essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising way to train these agents in multi-turn settings, allowing them to learn long-horizon decision-making strategies. However, existing pipelines struggle to balance task performance against user engagement: passive agents cannot efficiently adapt to users' intentions, while overuse of human feedback increases the burden on users. Together, these two objectives form a Pareto frontier. To push this frontier forward, we propose Behavioral Agentic Optimization (BAO), an agentic RL framework that enhances and regularizes inter-turn behaviors to improve information gathering and suppress inefficient or redundant interactions with users. We evaluate BAO on multiple tasks from the UserRL benchmark suite and show that it substantially outperforms proactive agentic RL baselines, achieving both higher task performance and lower user effort, while matching or even exceeding the performance of commercial LLM agents. These results highlight its effectiveness for training proactive, user-centric LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.