Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different than A2C's objective. In this paper, however, we show A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using \texttt{Stable-baselines3}, showing A2C and PPO produce the \textit{exact} same models when other settings are controlled.
A2C is a special case of PPO
The Advantage Actor-Critic (A2C) is shown to be a special case of Proximal Policy Optimization (PPO) through theoretical and empirical validation.
- Year
- 2022
- Venue
- arXiv 2022
- Authors
- 6
- Hosting
- Abstract onlyARXIV-DEFAULT
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2205.09123ARXIV-DEFAULT
- TL;DR
- Semantic Scholar