In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that the poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the difference in action-gap size across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This, in turn, makes it possible to tackle a class of reinforcement-learning problems that are challenging to solve with traditional methods.
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
This study finds that the poor performance of low discount factors in reinforcement learning stems primarily from heterogeneous action-gaps across the state-space rather than from small action-gaps. It introduces a logarithmic mapping of value estimates that makes these gaps more homogeneous, enabling lower discount factors.
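To make the action-gap argument concrete, here is a minimal sketch built on assumptions that do not come from the paper. It uses a hypothetical deterministic chain where only reaching the goal pays reward 1, and it substitutes a plain natural logarithm for the logarithmic mapping, because the abstract does not specify the exact mapping. With d steps to the goal, the optimal values are Q*(right) = γ^(d-1) and Q*(left) = γ^(d+1). The regular-space gap γ^(d-1)(1 - γ^2) therefore shrinks geometrically with distance, while the log-space gap equals -2 ln γ in every state. The names `chain_q_values` and `action_gaps` are illustrative only.

```python
import math


def chain_q_values(n_states, gamma, sweeps=200):
    """Optimal Q-values for a toy deterministic chain, via value iteration.

    Hypothetical setup: states are 0..n_states-1 and a terminal goal sits just
    right of the last state. Action 0 moves left (clamped at state 0), action 1
    moves right. Only the transition into the goal pays reward 1.
    """
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(sweeps):
        v = [max(left, right) for left, right in q]  # greedy state values
        new_q = []
        for s in range(n_states):
            left = gamma * v[max(s - 1, 0)]
            # Entering the goal ends the episode, so there is no bootstrap term.
            right = 1.0 if s == n_states - 1 else gamma * v[s + 1]
            new_q.append([left, right])
        q = new_q
    return q


def action_gaps(q, mapping=lambda x: x):
    """Absolute difference between the two action values after an optional mapping."""
    return [abs(mapping(left) - mapping(right)) for left, right in q]


q = chain_q_values(n_states=10, gamma=0.5)
interior = q[1:]  # skip state 0, where the wall changes the left action's value

regular = action_gaps(interior)
logged = action_gaps(interior, mapping=math.log)  # plain ln: an assumed stand-in

print(max(regular) / min(regular))  # → 256.0, i.e. gamma ** -(n_states - 2)
print(max(logged) / min(logged))    # close to 1.0: every log-space gap is about 2 ln 2
```

With γ = 0.9, the regular-space spread drops to 0.9^-8 (about 2.3), so the heterogeneity grows as the discount factor shrinks. In log space the gap stays at -2 ln γ in every interior state, whatever the value of γ.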
- Year: 2019
- Authors: 3
- Hosting: Abstract only (arXiv)
Attribution
- Abstract & full text: arxiv.org/abs/1906.00572v2
- TL;DR: Semantic Scholar