Environments¶
The Atari Environment and a Gym Env Wrapper are included in rlpyt.
Atari¶
- class rlpyt.envs.atari.atari_env.AtariTrajInfo(**kwargs)¶
  Bases: rlpyt.samplers.collections.TrajInfo

  TrajInfo class for use with the Atari Env, to store the raw game score separately from the clipped reward signal.
- class rlpyt.envs.atari.atari_env.AtariEnv(game='pong', frame_skip=4, num_img_obs=4, clip_reward=True, episodic_lives=True, fire_on_reset=False, max_start_noops=30, repeat_action_probability=0.0, horizon=27000)¶
  Bases: rlpyt.envs.base.Env

  An efficient implementation of the classic Atari RL environment using the Arcade Learning Environment (ALE).
  Output env_info includes:
  - game_score: raw game score, separate from reward clipping.
  - traj_done: special signal marking game-over or timeout, so that the sampler doesn’t reset the environment when done==True but traj_done==False, which can happen when episodic_lives==True.
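The interaction of these two signals can be sketched in a few lines. This is an illustrative stand-in, not rlpyt code; the function name and lives-count arguments are hypothetical:

```python
# Sketch of the done vs. traj_done distinction under episodic_lives.
# Hypothetical helper; not the actual AtariEnv implementation.

def signals(lives_before, lives_after, game_over, timeout, episodic_lives=True):
    """Return (done, traj_done) as the docs above describe them."""
    traj_done = game_over or timeout          # true end of trajectory
    life_lost = lives_after < lives_before
    # With episodic_lives, losing a life ends the learning episode
    # (done=True) without ending the game (traj_done=False), so the
    # sampler keeps playing instead of resetting the environment.
    done = traj_done or (episodic_lives and life_lost)
    return done, traj_done

# Life lost mid-game: episode boundary for the algorithm, no env reset.
assert signals(3, 2, game_over=False, timeout=False) == (True, False)
# Game over: both signals fire and the sampler resets the env.
assert signals(1, 0, game_over=True, timeout=False) == (True, True)
```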
  Always performs a 2-frame max to avoid flickering (this is fairly fast).

  Screen size downsampling is done by cropping two rows and then downsampling by 2x using cv2: (210, 160) --> (80, 104). Downsampling by 2x is much faster than the old scheme to (84, 84), and the (80, 104) shape is fairly convenient for convolution filter parameters which don’t cut off edges.
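The preprocessing can be sketched with numpy, using plain `::2` slicing as a stand-in for the cv2 resize call. Note the assumptions here: (80, 104) is read as a (width, height) pair, so the resulting array has shape (104, 80), and which two rows get cropped is treated as an implementation detail:

```python
import numpy as np

# Sketch of the screen preprocessing described above: 2-frame max,
# crop two rows, then 2x downsample (::2 slicing stands in for cv2).
def preprocess(frame_a, frame_b):
    maxed = np.maximum(frame_a, frame_b)   # 2-frame max removes flickering
    cropped = maxed[2:]                    # (210, 160) -> (208, 160); which rows is a detail
    return cropped[::2, ::2]               # -> (104, 80), i.e. (80, 104) as (W, H)

a = np.random.randint(0, 256, (210, 160), dtype=np.uint8)
b = np.random.randint(0, 256, (210, 160), dtype=np.uint8)
obs = preprocess(a, b)
assert obs.shape == (104, 80) and obs.dtype == np.uint8
```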
  The action space is an IntBox for the number of actions. The observation space is an IntBox with dtype=uint8 to save memory; conversion to float should happen inside the agent’s model’s forward() method. (See the file for implementation details.)
  Parameters:
  - game (str) – game name
  - frame_skip (int) – frames per step (>=1)
  - num_img_obs (int) – number of frames in observation (>=1)
  - clip_reward (bool) – if True, clip reward to np.sign(reward)
  - episodic_lives (bool) – if True, output done=True but env_info[traj_done]=False when a life is lost
  - fire_on_reset (bool) – if True, take the FIRE action after reset (some games require it to start play)
  - max_start_noops (int) – upper limit for random number of noop actions after reset
  - repeat_action_probability (0-1) – probability for sticky actions
  - horizon (int) – max number of steps before timeout / traj_done=True
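For instance, the clip_reward behavior is a one-liner with np.sign (an illustration, not rlpyt source):

```python
import numpy as np

# With clip_reward=True, rewards are clipped to their sign: {-1, 0, 1}.
rewards = np.array([0.0, 7.0, -200.0, 1.0])
clipped = np.sign(rewards)
assert clipped.tolist() == [0.0, 1.0, -1.0, 1.0]
```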
  - reset()¶ Performs a hard reset of the ALE game.
Gym Wrappers¶
- class rlpyt.envs.gym.GymEnvWrapper(env, act_null_value=0, obs_null_value=0, force_float32=True)¶
  Gym-style wrapper for converting the OpenAI Gym interface to the rlpyt interface. Action and observation spaces are wrapped by rlpyt’s GymSpaceWrapper.

  Output env_info is automatically converted from a dictionary to a corresponding namedtuple, which the rlpyt sampler expects. For this to work, every key that might appear in the gym environment’s env_info at any step must appear at the first step after a reset, as the env_info entries will have sampler memory pre-allocated for them (so they also cannot change dtype or shape). (See EnvInfoWrapper, build_info_tuples, and info_to_nt in the file for more help/details.)
Warning
Unrecognized keys in env_info appearing later during use will be silently ignored.
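The dict-to-namedtuple conversion can be sketched as follows. The helper names below are simplified stand-ins; see build_info_tuples and info_to_nt in the file for the real versions:

```python
from collections import namedtuple

# Sketch of the env_info conversion: the keys seen at the first step
# after reset() fix the namedtuple fields; every later step must supply
# the same keys (sampler memory is pre-allocated per field).
def build_info_tuple(first_info, name="info"):
    return namedtuple(name, sorted(first_info.keys()))

def info_to_nt(NT, info):
    # A missing key would raise KeyError here, which is why every key
    # must already appear at the first step after a reset.
    return NT(**{k: info[k] for k in NT._fields})

first_info = {"timeout": False, "game_score": 0.0}
InfoNT = build_info_tuple(first_info)
nt = info_to_nt(InfoNT, {"timeout": True, "game_score": 12.0})
assert nt.timeout is True and nt.game_score == 12.0
```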
  This wrapper looks for gym’s TimeLimit env wrapper to see whether to add the field timeout to env_info.

  - step(action)¶ Reverts the action from rlpyt format to gym format (i.e. if composite-to-dictionary spaces), steps the gym environment, converts the observation from gym to rlpyt format (i.e. if dict-to-composite), and converts the env_info from dictionary into namedtuple.

  - reset()¶ Returns the converted observation from the gym env reset.

  - spaces¶ Returns the rlpyt spaces for the wrapped env.
- class rlpyt.envs.gym.EnvInfoWrapper(env, info_example)¶
  Gym-style environment wrapper to infill the env_info dict of every step() with a pre-defined set of examples, so that env_info has those fields at every step and they are made available to the algorithm in the sampler’s batch of data.

  - step(action)¶ If need be, puts extra fields into the returned env_info dict. See the function infill_info() in the file for details.
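The infill idea can be sketched like this (a minimal stand-in for infill_info(), which lives in the file; the exact behavior there may differ):

```python
# Sketch of env_info infilling: any key present in info_example but
# missing from a step's info dict is filled in with the example value,
# so the env_info fields stay constant across steps.
def infill_info(info, info_example):
    filled = dict(info_example)  # defaults from the pre-defined examples
    filled.update(info)          # real values override the examples
    return filled

info_example = {"timeout": False, "game_score": 0.0}
step_info = {"game_score": 3.0}  # env omitted "timeout" this step
assert infill_info(step_info, info_example) == {"timeout": False, "game_score": 3.0}
```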
- rlpyt.envs.gym.make(*args, info_example=None, **kwargs)¶
  Use as a factory function for making instances of gym environments wrapped with rlpyt’s GymEnvWrapper, using gym.make(*args, **kwargs). If info_example is not None, the EnvInfoWrapper will also be included.
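The wrapping order can be sketched with toy classes (all names below are stand-ins, not rlpyt or gym code): the base env is built first, the info wrapper is applied only when an example is given, and the main wrapper goes on the outside.

```python
# Toy sketch of the make() factory pattern described above.
class ToyEnv:
    pass

class Wrapper:  # stands in for the outermost GymEnvWrapper
    def __init__(self, env):
        self.env = env

class InfoWrapper(Wrapper):  # stands in for the optional EnvInfoWrapper
    def __init__(self, env, info_example):
        super().__init__(env)
        self.info_example = info_example

def make(info_example=None):
    env = ToyEnv()
    if info_example is not None:      # optional wrapper, as in the docs
        env = InfoWrapper(env, info_example)
    return Wrapper(env)               # main wrapper is always applied

env = make(info_example={"timeout": False})
assert isinstance(env, Wrapper) and isinstance(env.env, InfoWrapper)
```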