Policy-regularized Offline Multi-objective Reinforcement Learning
In this paper, we aim to train a policy for multi-objective RL using only offline trajectory data. We extend the offline policy-regularized method, a widely adopted approach for single-objective offline RL problems, to the multi-objective setting in order to achieve the…
- Year: 2024
- Hosting: External source (license unknown)