We study offline reinforcement learning under Q^\star-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative Q-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are Q^\star-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative via an information-theoretic lower bound. To identify additional structure that enables sample-efficient offline RL under partial coverage, we introduce a general decision-estimation framework, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). Our framework decomposes offline RL complexity into decision complexity and value estimation error. This allows modular study of both sub-problems. Our result not only unifies existing results (Chen and Jiang, 2022; Uehara et al., 2023), but further improves and generalizes them. On the decision complexity side, our improvement includes: the first ε^{-2} sample complexity bound for soft Q-learning under partial coverage that improves Uehara et al.'s (2023) ε^{-4} bound, the removal of the need for additional online interaction in the value-gap setting of Chen and Jiang (2022), and new learnable settings beyond the above two cases. On the value estimation side, we provide a new characterization of the role of Bellman completeness under partial coverage, and the first characterization of offline learnability for general low-Bellman-rank MDPs (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021). The latter is a canonical online RL setting that has remained unexplored in offline RL except for special cases. As a side contribution, our techniques give the first analysis of CQL in the function approximation setting.
On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage
We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention.
- Preview

- Year
- 2026
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2602.12107CC-BY-4.0
- TL;DR
- Semantic Scholar