Constrained Policy Optimization for rllab

Constrained Policy Optimization (CPO) is an algorithm for learning policies that should satisfy behavioral constraints throughout training. [1]
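As a rough sketch of what "satisfying behavioral constraints" means here, the problem studied in [1] can be written as a constrained policy search. The symbols below are not defined in this README; they are assumed to follow the notation of the paper, where J(π) is the expected discounted return, J_{C_i}(π) is the expected discounted return with respect to the i-th cost function, and d_i is the corresponding limit:

```latex
\begin{aligned}
\pi^{*} = \arg\max_{\pi \in \Pi_\theta} \; & J(\pi) \\
\text{s.t.} \quad & J_{C_i}(\pi) \le d_i, \qquad i = 1, \dots, m
\end{aligned}
```

CPO approximately enforces these constraints at every policy update, not only at convergence, which is what "throughout training" refers to.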

This module was designed for rllab [2] and includes implementations of the algorithms described in our paper [1].

To configure, run the following command in the root folder of rllab:

git submodule add -f https://github.com/jachiam/cpo sandbox/cpo

Run CPO in the Point-Gather environment with

python sandbox/cpo/experiments/CPO_point_gather.py
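If you prefer to launch the experiment from a script, the following is a minimal sketch, not part of this module. The helper names `build_command` and `run_point_gather` are hypothetical; only the script path comes from the command above. It assumes the working directory must be the rllab root folder and checks that the submodule has been added before running anything.

```python
import subprocess
import sys
from pathlib import Path

# Script path taken from the README, relative to the rllab root folder.
EXPERIMENT = Path("sandbox/cpo/experiments/CPO_point_gather.py")


def build_command(rllab_root, python=sys.executable):
    """Return the command list for the Point-Gather experiment.

    Hypothetical helper: raises FileNotFoundError if the script is
    missing, which usually means the submodule has not been added yet.
    """
    script = Path(rllab_root) / EXPERIMENT
    if not script.is_file():
        raise FileNotFoundError(
            "{} not found; add the cpo submodule first".format(script)
        )
    return [python, str(EXPERIMENT)]


def run_point_gather(rllab_root="."):
    """Run the experiment with the rllab root as the working directory."""
    cmd = build_command(rllab_root)
    # check=True raises CalledProcessError if the experiment exits non-zero.
    return subprocess.run(cmd, cwd=str(rllab_root), check=True)
```

Calling `run_point_gather("/path/to/rllab")` is then equivalent to running the command above from the rllab root.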

[1] Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization". Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
[2] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control". Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.