Welcome to rlpyt’s documentation!
rlpyt includes modular, optimized implementations of common deep RL algorithms in PyTorch, with unified infrastructure supporting all three major families of model-free algorithms: policy gradient, deep Q-learning, and Q-value policy gradient. It is intended as a high-throughput code base for small- to medium-scale research (where large-scale means efforts like OpenAI Dota, using hundreds of GPUs). A conceptual overview is provided in the white paper, and the code (with examples) in the GitHub repository.
This documentation aims to explain the intent of the code structure, to make it easier to use and modify (it may not detail every keyword argument as a fixed library would). See the GitHub README for installation instructions and other introductory notes. Please post any questions or comments about the documentation to the GitHub issues.
The sections are organized as follows. First, several of the base classes are introduced. Then, each algorithm family and its associated agents and models are grouped together. Infrastructure code such as the runner and sampler classes is covered next. All remaining components are covered thereafter, in no particular order.
- Base Classes and Interfaces
- Policy Gradient Implementations
- Deep Q-Learning Implementations
- Q-Value Policy Gradient Implementations
- Asynchronous Samplers
- Model Components
- Replay Buffers
- Named Array Tuples
- Creating and Launching Experiments
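As a taste of one of the components above, the Named Array Tuples section covers rlpyt's namedarraytuple, a namedtuple-like container whose indexing is applied to every field at once, so a whole batch of leading-dimension-matched arrays can be sliced together. A minimal stdlib sketch of the idea (the `Samples` name and its fields are hypothetical, and this is a simplification of rlpyt's actual implementation):

```python
from collections import namedtuple

import numpy as np

_SamplesBase = namedtuple("Samples", ["observation", "reward"])


class Samples(_SamplesBase):
    """Sketch of a namedarraytuple: indexing applies to each field."""

    def __getitem__(self, loc):
        # Apply the same index/slice to every field, returning a new Samples.
        return type(self)(*(field[loc] for field in self))


batch = Samples(observation=np.zeros((10, 4)), reward=np.ones(10))
chunk = batch[2:5]  # slices observation and reward together
print(chunk.observation.shape, chunk.reward.shape)  # (3, 4) (3,)
```

Structures like this let sampler and replay-buffer code move heterogeneous collections of arrays around with a single index expression, while field access by name (`batch.reward`) still works as in a plain namedtuple.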