Distributional Offline Policy Evaluation with Predictive Error Guarantees

We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE).

Year: 2023
Venue: arXiv 2023
ArXiv: arxiv.org/abs/2302.09456
URL: arxiv.org/abs/2302.09456v3
Authors: 3
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2302.09456v3
TL;DR: Semantic Scholar

Authors

Wen Sun, Masatoshi Uehara, Runzhe Wu