Path-Coupled Bellman Flows for Distributional Reinforcement Learning

Distributional reinforcement learning (DRL) models the full return distribution rather than only its mean. Existing finite-support and quantile-based methods rely on projections, while recent flow-based approaches can suffer from boundary mismatch at the flow source or from high-variance bootstrapping when the current and successor noises are independent. We propose Path-Coupled Bellman Flows (PCBF), a continuous-time DRL method that learns return distributions by flow matching along source-consistent, Bellman-coupled paths: the current path starts from the required base prior at t = 0, reaches the Bellman target at t = 1, and maintains a pathwise affine relation to the successor flow at intermediate times, without requiring the time-t marginals to satisfy a distributional Bellman fixed point for all t. PCBF couples the current and successor return flows through shared base noise and uses a λ-parameterized control-variate target: λ = 0 recovers an unbiased sample Bellman target, while λ > 0 trades controlled bias for variance reduction. Experiments on analytically tractable Markov reward processes (MRPs), OGBench, and D4RL show improved distributional fidelity and training stability, along with competitive offline RL performance.
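
The variance argument behind shared base noise can be illustrated with a toy calculation. The sketch below is not the PCBF objective. It assumes a straight-line flow-matching path from base noise to a sample Bellman target, and a hypothetical affine successor map `mu + sigma * eps_next` standing in for a learned successor flow. The function name and all parameter values are illustrative. Under these assumptions the velocity regression target has variance (γσ − 1)² when the successor reuses the current noise, and γ²σ² + 1 when the two noises are independent.

```python
import numpy as np


def velocity_targets(shared_noise, n=100_000, gamma=0.9, reward=1.0,
                     mu=2.0, sigma=1.0, seed=0):
    """Toy flow-matching velocity targets for one transition.

    Assumptions (illustrative only, not taken from the paper):
    - the base prior is a standard normal,
    - the successor return sample is an affine map of its noise,
    - the path from x_0 to x_1 is a straight line, so the velocity
      target is x_1 - x_0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)                 # base noise, x_0
    # Shared coupling reuses eps; independent coupling draws fresh noise.
    eps_next = eps if shared_noise else rng.standard_normal(n)
    z_next = mu + sigma * eps_next               # toy successor return sample
    x1 = reward + gamma * z_next                 # sample Bellman target
    return x1 - eps                              # straight-line velocity


var_shared = velocity_targets(shared_noise=True).var()
var_indep = velocity_targets(shared_noise=False).var()
print(f"shared noise:      {var_shared:.4f}")
print(f"independent noise: {var_indep:.4f}")
```

With γ = 0.9 and σ = 1, the analytic variances are 0.01 and 1.81, so the shared-noise targets are far less noisy in this toy setting. How large the gap is in practice depends on how closely the learned successor flow tracks its base noise.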