See our technical paper here: https://arxiv.org/abs/2205.09123
We can make PPO match A2C exactly by applying the following tweaks to PPO (a configuration sketch follows the list):
- Set the learning rate to exactly $0.0007$ (which also means turning off learning rate annealing), the entropy coefficient to $0$, and the number of steps to $5$.
- Turn off advantage normalization.
- Disable GAE by setting its `lambda` parameter to $1$.
- Set the number of update epochs $K$ to $1$, so the clipped objective has nothing to clip.
- Perform the update on the whole batch of training data (`batch_size = n_envs * n_steps`).
- Disable value function clipping.
- Use A2C's RMSprop optimizer and its configuration.
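As a rough sketch, the snippet below shows what these settings look like when passed to Stable-Baselines3's `PPO`. It is illustrative only: the environment (`CartPole-v1`) and `n_envs = 16` are assumptions rather than values from this repo, and keyword arguments such as `normalize_advantage` assume a recent SB3 version; see `sb3_ppo.py` and `sb3_a2c.py` for the settings actually used.

```python
import torch as th
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

n_envs = 16  # illustrative value, not taken from the repo
env = make_vec_env("CartPole-v1", n_envs=n_envs)

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=7e-4,         # A2C's learning rate, constant (no annealing)
    n_steps=5,                  # A2C's rollout length
    ent_coef=0.0,               # no entropy bonus
    normalize_advantage=False,  # turn off advantage normalization
    gae_lambda=1.0,             # disable GAE
    n_epochs=1,                 # single update epoch, so clipping never triggers
    batch_size=n_envs * 5,      # one batch covering the whole rollout
    clip_range_vf=None,         # disable value function clipping
    # A2C's default optimizer: RMSprop with alpha=0.99, eps=1e-5, no weight decay
    policy_kwargs=dict(
        optimizer_class=th.optim.RMSprop,
        optimizer_kwargs=dict(alpha=0.99, eps=1e-5, weight_decay=0),
    ),
)
model.learn(total_timesteps=3000)
```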
To see it in action, run

```
poetry install
poetry run python sb3_ppo.py
poetry run python sb3_a2c.py
```
We get the following screenshot, which shows the sum of each updated model's first-layer weights: the values are exactly the same.
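A check along these lines could look like the following (a hypothetical sketch, not code from this repo; `ppo_model` and `a2c_model` stand for the two SB3 models after training with identical seeds and settings):

```python
def first_layer_weight_sum(model):
    # In SB3, model.policy is the torch.nn.Module holding the network parameters;
    # summing the first parameter tensor gives a cheap fingerprint of the weights.
    first_param = next(model.policy.parameters())
    return first_param.sum().item()

# If the two algorithms really perform the same update, these sums match exactly.
print(first_layer_weight_sum(ppo_model))
print(first_layer_weight_sum(a2c_model))
```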
Therefore, A2C is a special case of PPO when PPO 1) uses a learning rate of $0.0007$ with learning rate annealing turned off, 2) sets the entropy coefficient to $0$, 3) sets the number of steps to $5$, 4) turns off advantage normalization, 5) disables GAE, 6) sets the number of update epochs to $K=1$, 7) uses the whole batch of data for the update, 8) disables value function clipping, and 9) uses the RMSprop optimizer.
If you use this work, please cite:

```bibtex
@misc{https://doi.org/10.48550/arxiv.2205.09123,
  doi       = {10.48550/ARXIV.2205.09123},
  url       = {https://arxiv.org/abs/2205.09123},
  author    = {Huang, Shengyi and Kanervisto, Anssi and Raffin, Antonin and Wang, Weixun and Ontañón, Santiago and Dossa, Rousslan Fernand Julien},
  keywords  = {Machine Learning (cs.LG), FOS: Computer and information sciences},
  title     = {A2C is a special case of PPO},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```