GitXplorerGitXplorer
g

planet

public
1183 stars
202 forks
4 issues

Commits

List of commits on branch master.
Unverified
c04226b6db136f5269625378cd6a0aa875a92842

Make mask optional in DRNN (#43)

mmktal committed 5 years ago
Unverified
cbe77fc011299becf6c3805d6007c5bf58012f87

Sync changes.

ddanijar committed 6 years ago
Unverified
bdb1b16cb922ad483a65f27eb2382d905f99c06a

Fix batch dimension for post processed distribution (#34)

mmktal committed 6 years ago
Unverified
0c6f7d3c56fe691da5b0a2fc62db3cb7075cfcf4

Update hyper parameters to camera ready version.

ddanijar committed 6 years ago
Unverified
1896bdb34595c58c13c8fba5a6e085ab3a3aa294

Fix RSSM config flag and use Normal distributions.

ddanijar committed 6 years ago
Unverified
5bb34f92edf549db342c7d923da392d13559c79d

Add GIF summaries to TensorBoard.

ddanijar committed 6 years ago

README

The README file for this repository.

Deep Planning Network

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

PlaNet policies and predictions

This project provides the open source implementation of the PlaNet agent introduced in Learning Latent Dynamics for Planning from Pixels. PlaNet is a purely model-based reinforcement learning algorithm that solves control tasks from images by efficient planning in a learned latent space. PlaNet competes with top model-free methods in terms of final performance and training time while using substantially less interaction with the environment.

If you find this open source release useful, please reference in your paper:

@inproceedings{hafner2019planet,
  title={Learning Latent Dynamics for Planning from Pixels},
  author={Hafner, Danijar and Lillicrap, Timothy and Fischer, Ian and Villegas, Ruben and Ha, David and Lee, Honglak and Davidson, James},
  booktitle={International Conference on Machine Learning},
  pages={2555--2565},
  year={2019}
}

Method

PlaNet model diagram

PlaNet models the world as a compact sequence of hidden states. For planning, we first encode the history of past images into the current state. From there, we efficiently predict future rewards for multiple action sequences in latent space. We execute the first action of the best sequence found and replan after observing the next image.

Find more information:

Instructions

To train an agent, install the dependencies and then run:

python3 -m planet.scripts.train --logdir /path/to/logdir --params '{tasks: [cheetah_run]}'

The code prints nan as the score for iterations during which no summaries were computed.

The available tasks are listed in scripts/tasks.py. The default parameters can be found in scripts/configs.py. To run the experiments from our paper, pass the following parameters to --params {...} in addition to the list of tasks:

Experiment Parameters
PlaNet No additional parameters.
Random data collection planner_iterations: 0, train_action_noise: 1.0
Purely deterministic mean_only: True, divergence_scale: 0.0
Purely stochastic model: ssm
One agent all tasks collect_every: 30000

Please note that the agent has seen some improvements so the results may be a bit different now.

Modifications

These are good places to start when modifying the code:

Directory Description
scripts/configs.py Add new parameters or change defaults.
scripts/tasks.py Add or modify environments.
models Add or modify latent transition models.
networks Add or modify encoder and decoder networks.

Tips for development:

  • You can set --config debug to reduce the episode length, batch size, and collect data more freqnently. This helps to quickly reach all parts of the code.
  • You can use --num_runs 1000 --resume_runs False to automatically start new runs in sub directories of the logdir every time to execute the script.
  • Environments live in separate processes by default. Some environments work better when separated into threads instead by specifying --params '{isolate_envs: thread}'.

Dependencies

The code was tested under Ubuntu 18 and uses these packages:

  • tensorflow-gpu==1.13.1
  • tensorflow_probability==0.6.0
  • dm_control (egl rendering option recommended)
  • gym
  • scikit-image
  • scipy
  • ruamel.yaml
  • matplotlib

Disclaimer: This is not an official Google product.