vwxyzjn
RLHF @allenai, CS Ph.D. from Drexel University in RL.
Repositories
cleanrl
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
invalid-action-masking
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
summarize_from_feedback_details
cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
PPO-Implementation-Deep-Dive
DEPRECATED - please visit https://github.com/vwxyzjn/ppo-implementation-details
gym-microrts-paper
The source code for the gym-microrts paper.
a2c_is_a_special_case_of_ppo
A2C is a special case of PPO!
jupyter_disqus
Add Disqus to your Jupyter notebook.
SC2AI
Integrated Tensorforce and OpenAI Gym to train SC II game agents.