The Atari Environment and a Gym Env Wrapper are included in rlpyt.
AtariTrajInfo(**kwargs)¶
TrajInfo class for use with AtariEnv, to store the raw game score separate from the clipped reward signal.
AtariEnv(game='pong', frame_skip=4, num_img_obs=4, clip_reward=True, episodic_lives=True, fire_on_reset=False, max_start_noops=30, repeat_action_probability=0.0, horizon=27000)¶
An efficient implementation of the classic Atari RL environment using the Arcade Learning Environment (ALE).
- Output env_info includes:
- game_score: raw game score, separate from reward clipping.
- traj_done: special signal which signals game-over or timeout, so that the sampler doesn't reset the environment when done==True but traj_done==False, which can happen when episodic_lives==True.
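To make the done/traj_done distinction concrete, here is a minimal sketch of the sampler-side reset logic (a hypothetical illustration, not rlpyt's actual sampler code):

```python
class DummyEnv:
    """Stand-in environment that just counts resets (for illustration only)."""
    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1


def end_of_step(env, done, traj_done):
    # Reset only on traj_done: a lost life under episodic_lives gives
    # done=True but traj_done=False, and the game continues un-reset.
    if traj_done:
        env.reset()


env = DummyEnv()
end_of_step(env, done=True, traj_done=False)   # life lost: no reset
end_of_step(env, done=True, traj_done=True)    # game over or timeout: reset
assert env.resets == 1
```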
Always performs 2-frame max to avoid flickering (this is pretty fast).
Screen size downsampling is done by cropping two rows and then downsampling by 2x using cv2: (210, 160) --> (80, 104). Downsampling by 2x is much faster than the old scheme to (84, 84), and the (80, 104) shape is fairly convenient for convolution filter parameters which don't cut off edges.
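The preprocessing above can be approximated in plain NumPy (a sketch assuming grayscale frames and stride-2 downsampling; rlpyt's actual implementation uses cv2 and may choose different crop rows):

```python
import numpy as np

def preprocess(frame_a, frame_b):
    # 2-frame max to remove Atari sprite flicker.
    maxed = np.maximum(frame_a, frame_b)      # (210, 160) uint8
    # Crop two rows, then downsample 2x: height 210 -> 208 -> 104, width 160 -> 80.
    cropped = maxed[1:-1]                     # (208, 160)
    return cropped[::2, ::2]                  # (104, 80) array, i.e. 80x104 image

a = np.zeros((210, 160), dtype=np.uint8)
b = np.full((210, 160), 7, dtype=np.uint8)
obs = preprocess(a, b)
assert obs.shape == (104, 80)
assert obs.max() == 7                         # max taken over both frames
```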
The action space is an IntBox for the number of actions. The observation space is an IntBox with dtype=uint8 to save memory; conversion to float should happen inside the agent's model's forward() method.
(See the file for implementation details.)
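For example, the forward pass might begin by casting the uint8 observation to float (a hypothetical NumPy sketch; rlpyt's models do the equivalent in PyTorch):

```python
import numpy as np

def forward(obs_uint8):
    # Observations are stored and transported as uint8 to save memory;
    # cast to float32 and rescale to [0, 1] only inside the model.
    x = obs_uint8.astype(np.float32) / 255.0
    return x

obs = np.array([0, 128, 255], dtype=np.uint8)
out = forward(obs)
assert out.dtype == np.float32
assert out[2] == 1.0
```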
- game (str) – game name
- frame_skip (int) – frames per step (>=1)
- num_img_obs (int) – number of frames in observation (>=1)
- clip_reward (bool) – if True, clip reward to np.sign(reward)
- episodic_lives (bool) – if True, output done=True but env_info[traj_done]=False when a life is lost
- max_start_noops (int) – upper limit for random number of noop actions after reset
- repeat_action_probability (0-1) – probability for sticky actions
- horizon (int) – max number of steps before timeout / traj_done=True
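The repeat_action_probability parameter implements ALE-style sticky actions, which can be sketched as (a hypothetical helper, not rlpyt's code):

```python
import random

def sticky_action(chosen, previous, repeat_prob, rng=random):
    # With probability repeat_prob, the previous action is repeated and the
    # newly chosen one is ignored (adds stochasticity to the environment).
    if rng.random() < repeat_prob:
        return previous
    return chosen

assert sticky_action(2, 5, 0.0) == 2   # p=0: chosen action always used
assert sticky_action(2, 5, 1.0) == 5   # p=1: previous action always repeated
```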
reset()¶
Performs hard reset of ALE game.
GymEnvWrapper(env, act_null_value=0, obs_null_value=0, force_float32=True)¶
Gym-style wrapper for converting the OpenAI Gym interface to the rlpyt interface. Action and observation spaces are wrapped by rlpyt's GymSpaceWrapper.
Output env_info is automatically converted from a dictionary to a corresponding namedtuple, which the rlpyt sampler expects. For this to work, every key that might appear in the gym environment's env_info at any step must appear at the first step after a reset, as the env_info entries will have sampler memory pre-allocated for them (so they also cannot change dtype or shape). (See EnvInfoWrapper, build_info_tuples, and info_to_nt in the file for more help/details.)
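The dict-to-namedtuple conversion can be sketched as follows (simplified; rlpyt's build_info_tuples/info_to_nt also handle nesting and pre-register the tuple type):

```python
from collections import namedtuple

def dict_to_nt(info, name="info"):
    # Build a namedtuple type from the dict's keys (sorted for a stable
    # field order) and fill it with the dict's values.
    fields = sorted(info.keys())
    NT = namedtuple(name, fields)
    return NT(**info)

nt = dict_to_nt({"game_score": 21, "traj_done": False})
assert nt.game_score == 21
assert nt.traj_done is False
```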
Unrecognized keys in env_info appearing later during use will be silently ignored.
This wrapper looks for gym's TimeLimit env wrapper to see whether to add the field timeout to env_info.
step(action)¶
Reverts the action from rlpyt format to gym format (i.e. if composite-to-dictionary spaces), steps the gym environment, converts the observation from gym to rlpyt format (i.e. if dict-to-composite), and converts the env_info from dictionary into namedtuple.
reset()¶
Returns converted observation from gym env reset.
spaces¶
Returns the rlpyt spaces for the wrapped env.
EnvInfoWrapper(env, info_example)¶
Gym-style environment wrapper to infill the env_info dict of every step() with a pre-defined set of examples, so that env_info has those fields at every step and they are made available to the algorithm in the sampler's batch of data.
If need be, put extra fields into the env_info dict returned. See the file for details.
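The infill step can be sketched as (a hypothetical helper mirroring the behavior described above):

```python
def infill(info, info_example):
    # Any field present in the example but missing from this step's info
    # is filled in from the example, so the set of keys never changes.
    out = dict(info_example)
    out.update(info)
    return out

assert infill({"score": 3}, {"score": 0, "timeout": False}) == {"score": 3, "timeout": False}
```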
make(*args, info_example=None, **kwargs)¶
Use as factory function for making instances of gym environment with rlpyt's GymEnvWrapper, using gym.make(*args, **kwargs). If info_example is not None, will include the EnvInfoWrapper.