Utilities

Listed here are a number of miscellaneous utilities used in rlpyt.

Array

Miscellaneous functions for manipulating numpy arrays.

rlpyt.utils.array.select_at_indexes(indexes, array)

Returns the contents of array at the multi-dimensional integer array indexes. Leading dimensions of array must match the dimensions of indexes.
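
A minimal usage sketch, assuming the shapes described above (the Q-value setting is illustrative):

    import numpy as np
    from rlpyt.utils.array import select_at_indexes

    q_values = np.random.rand(4, 3)   # leading dim 4 matches indexes shape [4]
    actions = np.array([0, 2, 1, 2])  # integer indexes, shape [4]
    chosen = select_at_indexes(actions, q_values)  # shape [4]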

rlpyt.utils.array.to_onehot(indexes, dim, dtype=None)

Converts integer values in multi-dimensional array indexes to one-hot values of size dim; expanded in an additional trailing dimension.
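
For example (a small sketch of the described behavior):

    import numpy as np
    from rlpyt.utils.array import to_onehot

    actions = np.array([0, 2, 1])                          # shape [3]
    onehots = to_onehot(actions, dim=4, dtype=np.float32)  # shape [3, 4]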

rlpyt.utils.array.valid_mean(array, valid=None, axis=None)

Mean of array, accounting for optional mask valid, optionally along an axis.
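
For example, assuming the usual convention that 1 marks valid entries:

    import numpy as np
    from rlpyt.utils.array import valid_mean

    x = np.array([[1., 2.], [3., 4.]])
    valid = np.array([[1., 1.], [1., 0.]])  # mask out the last element
    m = valid_mean(x, valid)                # (1 + 2 + 3) / 3 = 2.0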

rlpyt.utils.array.infer_leading_dims(array, dim)

Determines any leading dimensions of array, which can have up to two leading dimensions more than the number of data dimensions, dim. Used to check for [B] or [T,B] leading dimensions. Returns the size of the leading dimensions (or 1 if they don’t exist), the data shape, and whether the leading dimensions were found.

Tensor

Miscellaneous functions for manipulating torch tensors.

rlpyt.utils.tensor.select_at_indexes(indexes, tensor)

Returns the contents of tensor at the multi-dimensional integer array indexes. Leading dimensions of tensor must match the dimensions of indexes.

rlpyt.utils.tensor.to_onehot(indexes, num, dtype=None)

Converts integer values in multi-dimensional tensor indexes to one-hot values of size num; expanded in an additional trailing dimension.

rlpyt.utils.tensor.from_onehot(onehot, dim=-1, dtype=None)

Argmax over trailing dimension of tensor onehot. Optional return dtype specification.
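
A round-trip sketch with to_onehot():

    import torch
    from rlpyt.utils.tensor import to_onehot, from_onehot

    actions = torch.tensor([0, 2, 1])
    oh = to_onehot(actions, num=4, dtype=torch.float32)  # shape [3, 4]
    back = from_onehot(oh, dtype=torch.long)             # recovers [0, 2, 1]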

rlpyt.utils.tensor.valid_mean(tensor, valid=None, dim=None)

Mean of tensor, accounting for optional mask valid, optionally along a dimension.

rlpyt.utils.tensor.infer_leading_dims(tensor, dim)

Looks for up to two leading dimensions in tensor, before the data dimensions, of which there are assumed to be dim number. For use at the beginning of a model’s forward() method, which should finish with restore_leading_dims() (see that function for help). Returns:

lead_dim: int, number of leading dims found.
T: int, size of the first leading dim, if two leading dims; otherwise 1.
B: int, size of the first leading dim if one, of the second leading dim if two; otherwise 1.
shape: tensor shape after the leading dims.

rlpyt.utils.tensor.restore_leading_dims(tensors, lead_dim, T=1, B=1)

Reshapes tensors (one, or in a tuple or list) to have lead_dim leading dimensions, which will become [], [B], or [T,B]. Assumes input tensors already have a leading Batch dimension, which might need to be removed. (Typically the last layer of the model computes with a leading batch dimension.) For use in the model’s forward() method, so that output dimensions match input dimensions, and the same model can be used for any such case. Use with outputs from infer_leading_dims().
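
A schematic sketch of the intended pattern in a model’s forward() (the MLP and dim=1, for flat vector observations, are illustrative):

    import torch
    from rlpyt.utils.tensor import infer_leading_dims, restore_leading_dims

    class MlpModel(torch.nn.Module):
        def __init__(self, input_size, output_size):
            super().__init__()
            self.fc = torch.nn.Linear(input_size, output_size)

        def forward(self, observation):
            # Accept inputs shaped [input], [B, input], or [T, B, input].
            lead_dim, T, B, shape = infer_leading_dims(observation, dim=1)
            out = self.fc(observation.view(T * B, *shape))  # leading batch dim
            return restore_leading_dims(out, lead_dim, T, B)  # match input dims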

Miscellaneous Array / Tensor

rlpyt.utils.misc.iterate_mb_idxs(data_length, minibatch_size, shuffle=False)

Yields minibatches of indexes, to use as a for-loop iterator, with option to shuffle.
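
Typical usage, with a stand-in dataset:

    import numpy as np
    from rlpyt.utils.misc import iterate_mb_idxs

    data = np.arange(100)  # stand-in for any indexable data
    for idxs in iterate_mb_idxs(len(data), minibatch_size=32, shuffle=True):
        minibatch = data[idxs]  # indexes for one minibatch per iteration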

rlpyt.utils.misc.zeros(shape, dtype)

Attempts to return a torch tensor of zeros, or, if a numpy dtype is provided, returns a numpy array of zeros.

rlpyt.utils.misc.empty(shape, dtype)

Attempts to return an empty torch tensor, or, if a numpy dtype is provided, returns an empty numpy array.

rlpyt.utils.misc.extract_sequences(array_or_tensor, T_idxs, B_idxs, T)

Assumes array_or_tensor has [T,B] leading dims. Returns a new array/tensor containing sequences of length T taken from the starting indexes [T_idxs, B_idxs], where T_idxs (and B_idxs) is a list or vector of integers. Handles wrapping around the end of the buffer automatically. (Return shape: [T, len(B_idxs),…].)
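
For example (a sketch; the buffer contents are illustrative):

    import numpy as np
    from rlpyt.utils.misc import extract_sequences

    buffer = np.random.rand(100, 8, 4)  # [T=100, B=8, ...] replay-style data
    T_idxs = [0, 97]                    # start times (97 wraps around the end)
    B_idxs = [3, 5]                     # which B slots to read
    seqs = extract_sequences(buffer, T_idxs, B_idxs, T=5)  # shape [5, 2, 4]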

Collections

(see Named Array Tuple page)

class rlpyt.utils.collections.AttrDict(*args, **kwargs)

Bases: dict

Behaves like a dictionary but additionally has attribute-style access for both read and write, e.g. x["key"] and x.key are the same; can iterate using: for k, v in x.items(). Can subclass for specific data classes; must call AttrDict’s __init__().

copy()

Provides a “deep” copy of all unbroken chains of types AttrDict, but shallow copies otherwise (e.g. numpy arrays are NOT copied).
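
For example:

    from rlpyt.utils.collections import AttrDict

    x = AttrDict(lr=1e-3, batch_size=32)
    x.epochs = 10           # attribute-style write
    assert x["lr"] == x.lr  # dict-style and attribute-style reads agree
    y = x.copy()            # deep along AttrDict chains, shallow otherwise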

Buffers

rlpyt.utils.buffer.buffer_from_example(example, leading_dims, share_memory=False, use_NatSchema=None)

Allocates memory and returns it in a namedarraytuple with the same structure as example, which should be a namedtuple or namedarraytuple. Applies the same leading dimensions leading_dims to every entry, and otherwise matches their shapes and dtypes. The example should have no leading dimensions. None fields will stay None. Optionally allocate on OS shared memory. Uses build_array().

New: can use NamedArrayTuple types via the use_NatSchema flag, which may be easier for pickling/unpickling when using spawn instead of fork. If use_NatSchema is None (the default), the type of example is used to infer which type to return.
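
A minimal allocation sketch (the Samples fields are illustrative; namedarraytuple is from the Named Array Tuple page):

    import numpy as np
    from rlpyt.utils.buffer import buffer_from_example
    from rlpyt.utils.collections import namedarraytuple

    Samples = namedarraytuple("Samples", ["observation", "action"])
    example = Samples(observation=np.zeros(4, dtype=np.float32),
                      action=np.zeros((), dtype=np.int64))
    buf = buffer_from_example(example, leading_dims=(100, 8))  # [T=100, B=8]
    # buf.observation.shape == (100, 8, 4); buf.action.shape == (100, 8)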

rlpyt.utils.buffer.build_array(example, leading_dims, share_memory=False)

Allocate a numpy array matching the dtype and shape of example, possibly with additional leading dimensions. Optionally allocate on OS shared memory.

rlpyt.utils.buffer.np_mp_array(shape, dtype)

Allocate a numpy array on OS shared memory.

rlpyt.utils.buffer.torchify_buffer(buffer_)

Convert contents of buffer_ from numpy arrays to torch tensors. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None, and torch tensors are left alone.

rlpyt.utils.buffer.numpify_buffer(buffer_)

Convert contents of buffer_ from torch tensors to numpy arrays. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None, and numpy arrays are left alone.

rlpyt.utils.buffer.buffer_to(buffer_, device=None)

Send contents of buffer_ to the specified device (contents must be torch tensors). buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned.
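
A small sketch chaining these conversions (GPU availability is assumed for the buffer_to() call):

    import numpy as np
    from rlpyt.utils.buffer import torchify_buffer, numpify_buffer, buffer_to

    samples = (np.zeros((8, 4), dtype=np.float32), np.zeros(8, dtype=np.int64))
    th = torchify_buffer(samples)          # numpy -> torch, same tuple structure
    th_gpu = buffer_to(th, device="cuda")  # move every tensor to the GPU
    back = numpify_buffer(th)              # torch -> numpy, same structure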

rlpyt.utils.buffer.buffer_method(buffer_, method_name, *args, **kwargs)

Call method method_name(*args, **kwargs) on all contents of buffer_, and return the results. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None.

rlpyt.utils.buffer.buffer_func(buffer_, func, *args, **kwargs)

Call function func(buf, *args, **kwargs) on all contents of buffer_, and return the results. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None.
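
For example, with both buffer_method() and buffer_func():

    import torch
    from rlpyt.utils.buffer import buffer_method, buffer_func

    buf = (torch.zeros(3, 4), torch.ones(3, 4))
    tr = buffer_method(buf, "transpose", 0, 1)   # .transpose(0, 1) on each leaf
    doubled = buffer_func(buf, lambda t: 2 * t)  # func applied to each leaf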

rlpyt.utils.buffer.get_leading_dims(buffer_, n_dim=1)

Return the n_dim number of leading dimensions of the contents of buffer_. Checks to make sure the leading dimensions match for all tensors/arrays, except ignores None fields.

Algorithms

rlpyt.algos.utils.discount_return(reward, done, bootstrap_value, discount, return_dest=None)

Time-major inputs, optional other dimensions: [T], [T,B], etc. Computes the discounted sum of future rewards from each time-step to the end of the batch, including the bootstrapping value. The sum resets where done is 1. Optionally writes to the buffer return_dest, if provided. Operations are vectorized across all trailing dimensions after the first [T,].
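
For reference, the computation is equivalent to the following backward recursion (a plain-torch sketch of the described semantics, not rlpyt’s implementation):

    import torch

    def discount_return_sketch(reward, done, bootstrap_value, discount):
        # return[t] = reward[t] + discount * (1 - done[t]) * return[t + 1],
        # seeded with bootstrap_value past the end; inputs are [T] or [T, B].
        return_ = torch.zeros_like(reward)
        running = bootstrap_value
        for t in reversed(range(reward.shape[0])):
            running = reward[t] + discount * (1 - done[t].float()) * running
            return_[t] = running
        return return_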

rlpyt.algos.utils.generalized_advantage_estimation(reward, value, done, bootstrap_value, discount, gae_lambda, advantage_dest=None, return_dest=None)

Time-major inputs, optional other dimensions: [T], [T,B], etc. Similar to discount_return() but using Generalized Advantage Estimation to compute advantages and returns.
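
The standard GAE recursion, as a sketch under the same conventions (not rlpyt’s implementation):

    import torch

    def gae_sketch(reward, value, done, bootstrap_value, discount, gae_lambda):
        # delta[t] = r[t] + discount * (1 - done[t]) * V[t + 1] - V[t]
        # adv[t] = delta[t] + discount * lambda * (1 - done[t]) * adv[t + 1]
        advantage = torch.zeros_like(reward)
        next_value, next_adv = bootstrap_value, torch.zeros_like(bootstrap_value)
        for t in reversed(range(reward.shape[0])):
            nd = 1 - done[t].float()
            delta = reward[t] + discount * nd * next_value - value[t]
            next_adv = delta + discount * gae_lambda * nd * next_adv
            advantage[t] = next_adv
            next_value = value[t]
        return advantage, advantage + value  # (advantage, return)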

rlpyt.algos.utils.discount_return_n_step(reward, done, n_step, discount, return_dest=None, done_n_dest=None, do_truncated=False)

Time-major inputs, optional other dimensions: [T], [T,B], etc. Computes n-step discounted returns within the timeframe of the given rewards. If do_truncated==False, then returns are computed only at time-steps where the full n-step future rewards are available (i.e. not at the last n steps; the output shape will change!). Returns n-step returns as well as n-step done signals, which are True if done=True at any future time before the n-step target bootstrap would apply (bootstrapping happens in the algo, not here).

rlpyt.algos.utils.valid_from_done(done)

Returns a float mask which is zero for all time-steps after a done=True is signaled. This function operates on the leading dimension of done, assumed to correspond to time [T,…]; other dimensions are preserved.
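
A sketch matching the description (the time-step where done first becomes True is still valid; everything strictly after it is masked):

    import torch

    def valid_from_done_sketch(done):
        done = done.float()
        valid = torch.ones_like(done)
        # Zero out everything strictly after the first done=1 along [T].
        valid[1:] = 1 - torch.clamp(torch.cumsum(done[:-1], dim=0), max=1)
        return valid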

rlpyt.algos.utils.discount_return_tl(reward, done, bootstrap_value, discount, timeout, value, return_dest=None)

Like discount_return(), above, except uses bootstrapping where ‘done’ is due to the env horizon time-limit (tl=Time-Limit). (The algo should not train on samples where timeout=True.)

rlpyt.algos.utils.generalized_advantage_estimation_tl(reward, value, done, bootstrap_value, discount, gae_lambda, timeout, advantage_dest=None, return_dest=None)

Like generalized_advantage_estimation(), above, except uses bootstrapping where ‘done’ is due to the env horizon time-limit (tl=Time-Limit). (The algo should not train on samples where timeout=True.)

Synchronize

class rlpyt.utils.synchronize.RWLock

Multiple simultaneous readers, one writer.

rlpyt.utils.synchronize.drain_queue(queue_obj, n_sentinel=0, guard_sentinel=False)

Empty a multiprocessing queue object, with options to protect against the delay between queue.put() and queue.get(). Returns a list of the queue contents.

With n_sentinel=0, simply call queue.get(block=False) until queue.Empty exception (which can still happen slightly after another process called queue.put()).

With n_sentinel>1, call queue.get() until n_sentinel None objects have been returned (marking that each put() process has finished).

With guard_sentinel=True (requires n_sentinel=0), the drain stops if a None is retrieved and puts it back into the queue, so that a blocking drain can be done later with n_sentinel>1.
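
For example, a non-blocking drain with the default n_sentinel=0:

    import multiprocessing as mp
    from rlpyt.utils.synchronize import drain_queue

    q = mp.Queue()
    for item in (1, 2, 3):
        q.put(item)
    contents = drain_queue(q)  # usually [1, 2, 3]; see the put()/get() caveat above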

rlpyt.utils.synchronize.find_port(offset)

Find a unique open port, for initializing torch.distributed in multiple separate multi-GPU runs on one machine.

Quick Arguments

rlpyt.utils.quick_args.save__init__args(values, underscore=False, overwrite=False, subclass_only=False)

For use in __init__() only; assigns all args/kwargs to instance attributes. To maintain precedence of args provided to subclasses, call this in the subclass before super().__init__() if save__init__args() also appears in the base class, or use overwrite=True. With subclass_only==True, only args/kwargs listed in the current subclass apply.
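
Typical usage:

    from rlpyt.utils.quick_args import save__init__args

    class Algo:
        def __init__(self, learning_rate=1e-3, batch_size=32):
            save__init__args(locals())
            # Now self.learning_rate == 1e-3 and self.batch_size == 32.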

Progress Bar

class rlpyt.utils.prog_bar.ProgBarCounter(total_count)

Dynamic display of a progress bar in the terminal, for example to mark progress (and estimate time to completion) of RL iterations toward the next logging update. Credit: rllab.

Seed

rlpyt.utils.seed.set_seed(seed)

Sets random.seed, np.random.seed, torch.manual_seed, torch.cuda.manual_seed.

rlpyt.utils.seed.make_seed()

Returns a random number in [0, 10000], generated using timing jitter.

This has a white-noise spectrum and gives unique values across multiple simultaneous processes; some simpler attempts did not achieve that, but there’s probably a better way.