Policy-regularized Offline Multi-objective Reinforcement Learning

In this paper, we aim to train a policy for multi-objective RL using only offline trajectory data. We extend the offline policy-regularized method, a widely adopted approach for single-objective offline RL problems, to the multi-objective setting in order to achieve the…
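
The abstract names the policy-regularized family but does not spell out the authors' formulation. The following is a generic illustration only, not the paper's algorithm. It is a minimal sketch that rests on two assumptions. First, the vector reward is reduced to a scalar by linear scalarization with a preference vector. Second, the actor objective is TD3+BC-style (Fujimoto & Gu, 2021), trading the scalarized critic value against a behavior-cloning penalty. The names `scalarize` and `regularized_actor_loss` are hypothetical, and `alpha=2.5` is the TD3+BC default rather than a value taken from this paper.

```python
import numpy as np


def scalarize(reward_vec, preference):
    """Linear scalarization (assumed): weighted sum of per-objective rewards."""
    return float(np.asarray(reward_vec, dtype=float) @ np.asarray(preference, dtype=float))


def regularized_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """TD3+BC-style actor loss, used here purely as an illustrative assumption.

    q_values:        scalarized critic values Q(s, pi(s)) for a batch, shape (B,)
    policy_actions:  actions proposed by the policy, shape (B, action_dim)
    dataset_actions: actions stored in the offline dataset, shape (B, action_dim)
    alpha:           trade-off weight; lam = alpha / mean|Q| normalizes the Q scale
    """
    q = np.asarray(q_values, dtype=float)
    # Small epsilon (our addition) guards against division by zero when all Q are 0.
    lam = alpha / (np.abs(q).mean() + 1e-8)
    # Behavior-cloning term: mean squared error over all batch elements and action dims.
    bc = np.mean((np.asarray(policy_actions, dtype=float)
                  - np.asarray(dataset_actions, dtype=float)) ** 2)
    # Maximize Q while staying close to dataset actions, i.e. minimize -lam*Q + BC.
    return float(-lam * q.mean() + bc)


if __name__ == "__main__":
    w = [0.7, 0.3]                      # hypothetical preference over two objectives
    print(scalarize([1.0, 2.0], w))     # weighted reward for one transition
    print(regularized_actor_loss([1.0, 3.0], [[0.0], [1.0]], [[0.0], [0.0]]))
```

Normalizing by `mean|Q|` keeps the behavior-cloning penalty on a comparable scale regardless of reward magnitude. This matters in a multi-objective setting because different preference vectors can yield scalarized returns of very different sizes.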

Year: 2024
ArXiv: arxiv.org/abs/2401.02244
URL: arxiv.org/abs/2401.02244v1
Hosting: External source; license unknown

Attribution

Abstract & full text: arxiv.org/abs/2401.02244v1
TL;DR: Semantic Scholar