The Atari Environment and a Gym Env Wrapper are included in rlpyt.
AtariTrajInfo(**kwargs)¶
TrajInfo class for use with AtariEnv, to store the raw game score separate from the clipped reward signal.
AtariEnv(game='pong', frame_skip=4, num_img_obs=4, clip_reward=True, episodic_lives=True, fire_on_reset=False, max_start_noops=30, repeat_action_probability=0.0, horizon=27000)¶
An efficient implementation of the classic Atari RL environment using the Arcade Learning Environment (ALE).
- Output env_info includes:
- game_score: raw game score, separate from reward clipping.
- traj_done: special signal which signals game-over or timeout, so that the sampler doesn't reset the environment when done==True but traj_done==False, which can happen when episodic_lives==True.
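To make the done/traj_done distinction concrete, here is a minimal sketch of the sampler-side reset logic (a hypothetical illustration, not rlpyt's actual sampler code):

```python
class DummyEnv:
    """Stand-in environment that just counts resets (for illustration only)."""
    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1


def end_of_step(env, done, traj_done):
    # Reset only on traj_done: a lost life under episodic_lives gives
    # done=True but traj_done=False, and the game continues un-reset.
    if traj_done:
        env.reset()


env = DummyEnv()
end_of_step(env, done=True, traj_done=False)   # life lost: no reset
end_of_step(env, done=True, traj_done=True)    # game over or timeout: reset
assert env.resets == 1
```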
Always performs 2-frame max to avoid flickering (this is pretty fast).
Screen size downsampling is done by cropping two rows and then downsampling by 2x using cv2: (210, 160) --> (80, 104). Downsampling by 2x is much faster than the old scheme to (84, 84), and the (80, 104) shape is fairly convenient for convolution filter parameters which don't cut off edges.
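The preprocessing above can be approximated in plain NumPy (a sketch assuming grayscale frames and stride-2 downsampling; rlpyt's actual implementation uses cv2 and may choose different crop rows):

```python
import numpy as np

def preprocess(frame_a, frame_b):
    # 2-frame max to remove Atari sprite flicker.
    maxed = np.maximum(frame_a, frame_b)      # (210, 160) uint8
    # Crop two rows, then downsample 2x: height 210 -> 208 -> 104, width 160 -> 80.
    cropped = maxed[1:-1]                     # (208, 160)
    return cropped[::2, ::2]                  # (104, 80) array, i.e. 80x104 image

a = np.zeros((210, 160), dtype=np.uint8)
b = np.full((210, 160), 7, dtype=np.uint8)
obs = preprocess(a, b)
assert obs.shape == (104, 80)
assert obs.max() == 7                         # max taken over both frames
```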
The action space is an IntBox for the number of actions. The observation space is an IntBox with dtype=uint8 to save memory; conversion to float should happen inside the agent's model's forward() method.
(See the file for implementation details.)
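For example, the forward pass might begin by casting the uint8 observation to float (a hypothetical NumPy sketch; rlpyt's models do the equivalent in PyTorch):

```python
import numpy as np

def forward(obs_uint8):
    # Observations are stored and transported as uint8 to save memory;
    # cast to float32 and rescale to [0, 1] only inside the model.
    x = obs_uint8.astype(np.float32) / 255.0
    return x

obs = np.array([0, 128, 255], dtype=np.uint8)
out = forward(obs)
assert out.dtype == np.float32
assert out[2] == 1.0
```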
- game (str) – game name
- frame_skip (int) – frames per step (>=1)
- num_img_obs (int) – number of frames in observation (>=1)
- clip_reward (bool) – if True, clip reward to np.sign(reward)
- episodic_lives (bool) – if True, output done=True but env_info[traj_done]=False when a life is lost
- max_start_noops (int) – upper limit for random number of noop actions after reset
- repeat_action_probability (0-1) – probability for sticky actions
- horizon (int) – max number of steps before timeout / traj_done=True
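The repeat_action_probability parameter implements ALE-style sticky actions, which can be sketched as (a hypothetical helper, not rlpyt's code):

```python
import random

def sticky_action(chosen, previous, repeat_prob, rng=random):
    # With probability repeat_prob, the previous action is repeated and the
    # newly chosen one is ignored (adds stochasticity to the environment).
    if rng.random() < repeat_prob:
        return previous
    return chosen

assert sticky_action(2, 5, 0.0) == 2   # p=0: chosen action always used
assert sticky_action(2, 5, 1.0) == 5   # p=1: previous action always repeated
```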
reset()¶
Performs hard reset of ALE game.
GymEnvWrapper(env, act_null_value=0, obs_null_value=0, force_float32=True)¶
Gym-style wrapper for converting the OpenAI Gym interface to the rlpyt interface. Action and observation spaces are wrapped by rlpyt's GymSpaceWrapper.
Output env_info is automatically converted from a dictionary to a corresponding namedtuple, which the rlpyt sampler expects. For this to work, every key that might appear in the gym environment's env_info at any step must appear at the first step after a reset, as the env_info entries will have sampler memory pre-allocated for them (so they also cannot change dtype or shape). (See EnvInfoWrapper, build_info_tuples, and info_to_nt in the file for more help/details.)
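The dict-to-namedtuple conversion can be sketched as follows (simplified; rlpyt's build_info_tuples/info_to_nt also handle nesting and pre-register the tuple type):

```python
from collections import namedtuple

def dict_to_nt(info, name="info"):
    # Build a namedtuple type from the dict's keys (sorted for a stable
    # field order) and fill it with the dict's values.
    fields = sorted(info.keys())
    NT = namedtuple(name, fields)
    return NT(**info)

nt = dict_to_nt({"game_score": 21, "traj_done": False})
assert nt.game_score == 21
assert nt.traj_done is False
```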
Unrecognized keys in env_info appearing later during use will be silently ignored.
This wrapper looks for gym's TimeLimit env wrapper to see whether to add the field timeout to env_info.
step(action)¶
Reverts the action from rlpyt format to gym format (i.e. if composite-to-dictionary spaces), steps the gym environment, converts the observation from gym to rlpyt format (i.e. if dict-to-composite), and converts the env_info from dictionary into namedtuple.
reset()¶
Returns converted observation from gym env reset.
spaces¶
Returns the rlpyt spaces for the wrapped env.
EnvInfoWrapper(env, info_example)¶
Gym-style environment wrapper to infill the env_info dict of every step() with a pre-defined set of examples, so that env_info has those fields at every step and they are made available to the algorithm in the sampler's batch of data.
If need be, put extra fields into the env_info dict returned. See the file for details.
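The infill step can be sketched as (a hypothetical helper mirroring the behavior described above):

```python
def infill(info, info_example):
    # Any field present in the example but missing from this step's info
    # is filled in from the example, so the set of keys never changes.
    out = dict(info_example)
    out.update(info)
    return out

assert infill({"score": 3}, {"score": 0, "timeout": False}) == {"score": 3, "timeout": False}
```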
make(*args, info_example=None, **kwargs)¶
Use as factory function for making instances of gym environment with rlpyt's GymEnvWrapper, using gym.make(*args, **kwargs). If info_example is not None, will include the EnvInfoWrapper.