Utilities¶
Listed here are a number of miscellaneous utilities used in rlpyt.
Array¶
Miscellaneous functions for manipulating numpy arrays.
- rlpyt.utils.array.select_at_indexes(indexes, array)¶
  Returns the contents of array at the multi-dimensional integer array indexes. Leading dimensions of array must match the dimensions of indexes.
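The actual implementation lives in rlpyt; a minimal numpy sketch of the same semantics, flattening the leading dimensions, gathering one entry per position, then restoring the shape:

```python
import numpy as np

def select_at_indexes(indexes, array):
    """Illustrative numpy sketch: pick array[..., i] per leading position."""
    dim = indexes.ndim
    assert indexes.shape == array.shape[:dim]  # leading dims must match
    num = int(np.prod(indexes.shape))
    a_flat = array.reshape(num, *array.shape[dim:])
    selected = a_flat[np.arange(num), indexes.reshape(num)]
    return selected.reshape(*indexes.shape, *array.shape[dim + 1:])

q = np.arange(24).reshape(2, 3, 4)        # e.g. [T, B, A] Q-values
acts = np.array([[0, 1, 2], [3, 0, 1]])   # [T, B] action indexes
print(select_at_indexes(acts, q))          # [T, B] selected Q-values
```

A typical use is selecting the Q-value of the taken action at every (time, batch) position.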
- rlpyt.utils.array.to_onehot(indexes, dim, dtype=None)¶
  Converts integer values in multi-dimensional array indexes to one-hot values of size dim, expanded in an additional trailing dimension.
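A hedged numpy sketch of the described behavior, writing ones into a new trailing dimension of size dim:

```python
import numpy as np

def to_onehot(indexes, dim, dtype=None):
    """Illustrative sketch: one-hot encode into a new trailing dimension."""
    onehot = np.zeros(indexes.shape + (dim,), dtype=dtype or np.float32)
    # Place a 1 at each index position along the new trailing axis.
    np.put_along_axis(onehot, np.expand_dims(indexes, -1), 1, axis=-1)
    return onehot

print(to_onehot(np.array([2, 0]), dim=3))
```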
- rlpyt.utils.array.valid_mean(array, valid=None, axis=None)¶
  Mean of array, accounting for optional mask valid, optionally along an axis.
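A minimal sketch of a masked mean, as one natural reading of the entry (sum of masked values divided by the mask count):

```python
import numpy as np

def valid_mean(array, valid=None, axis=None):
    """Illustrative sketch: mean over entries where the valid mask is nonzero."""
    if valid is None:
        return array.mean(axis=axis)
    return (array * valid).sum(axis=axis) / valid.sum(axis=axis)

x = np.array([1.0, 2.0, 100.0])
mask = np.array([1.0, 1.0, 0.0])   # third entry is masked out
print(valid_mean(x, mask))
```

This pattern is common in RL for averaging losses only over valid (pre-done) time-steps.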
- rlpyt.utils.array.infer_leading_dims(array, dim)¶
  Determines any leading dimensions of array, which can have up to two leading dimensions beyond the number of data dimensions, dim. Used to check for [B] or [T,B] leading. Returns the sizes of the leading dimensions (or 1 if they don't exist), the data shape, and whether the leading dimensions were found.
Tensor¶
Miscellaneous functions for manipulating torch tensors.
- rlpyt.utils.tensor.select_at_indexes(indexes, tensor)¶
  Returns the contents of tensor at the multi-dimensional integer array indexes. Leading dimensions of tensor must match the dimensions of indexes.
- rlpyt.utils.tensor.to_onehot(indexes, num, dtype=None)¶
  Converts integer values in multi-dimensional tensor indexes to one-hot values of size num, expanded in an additional trailing dimension.
- rlpyt.utils.tensor.from_onehot(onehot, dim=-1, dtype=None)¶
  Argmax over the trailing dimension of tensor onehot, with optional return dtype specification.
- rlpyt.utils.tensor.valid_mean(tensor, valid=None, dim=None)¶
  Mean of tensor, accounting for optional mask valid, optionally along a dimension.
- rlpyt.utils.tensor.infer_leading_dims(tensor, dim)¶
  Looks for up to two leading dimensions in tensor, before the data dimensions, of which there are assumed to be dim number. For use at the beginning of a model's forward() method, which should finish with restore_leading_dims() (see that function for help). Returns:
  lead_dim: int – number of leading dims found.
  T: int – size of first leading dim, if two leading dims; otherwise 1.
  B: int – size of first leading dim if one, of second leading dim if two; otherwise 1.
  shape: tensor shape after leading dims.
- rlpyt.utils.tensor.restore_leading_dims(tensors, lead_dim, T=1, B=1)¶
  Reshapes tensors (one, or a tuple or list of them) to have lead_dim leading dimensions, which will become [], [B], or [T,B]. Assumes input tensors already have a leading Batch dimension, which might need to be removed. (Typically the last layer of the model will compute with a leading batch dimension.) For use in a model's forward() method, so that output dimensions match input dimensions, and the same model can be used for any such case. Use with outputs from infer_leading_dims().
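The rlpyt versions operate on torch tensors; the round-trip pattern they support can be sketched in numpy (a simplified illustration, not the library code):

```python
import numpy as np

def infer_leading_dims(tensor, dim):
    """Simplified sketch: detect 0, 1 ([B]), or 2 ([T,B]) leading dims."""
    lead_dim = tensor.ndim - dim
    assert lead_dim in (0, 1, 2)
    if lead_dim == 2:
        T, B = tensor.shape[:2]
    else:
        T, B = 1, (tensor.shape[0] if lead_dim == 1 else 1)
    return lead_dim, T, B, tensor.shape[lead_dim:]

def restore_leading_dims(tensor, lead_dim, T=1, B=1):
    """Simplified sketch: undo the flattening done for the model computation."""
    if lead_dim == 2:
        return tensor.reshape(T, B, *tensor.shape[1:])
    if lead_dim == 0:
        return tensor.reshape(*tensor.shape[1:])  # strip the batch dim
    return tensor  # lead_dim == 1: already [B, ...]

x = np.zeros((5, 4, 3, 2))                      # [T, B] + data dims (dim=2)
lead_dim, T, B, shape = infer_leading_dims(x, dim=2)
flat = x.reshape(T * B, *shape)                  # model computes on [T*B, ...]
out = restore_leading_dims(flat, lead_dim, T, B) # back to [T, B, ...]
print(lead_dim, T, B, out.shape)
```

This is the typical shape-handling bracket around the body of a model's forward() method, letting one model serve sampling ([B] input) and training ([T,B] input) alike.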
Miscellaneous Array / Tensor¶
- rlpyt.utils.misc.iterate_mb_idxs(data_length, minibatch_size, shuffle=False)¶
  Yields minibatches of indexes, to use as a for-loop iterator, with option to shuffle.
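A small sketch of such an iterator (an illustrative equivalent; details of remainder handling are an assumption here, with any trailing partial batch dropped):

```python
import numpy as np

def iterate_mb_idxs(data_length, minibatch_size, shuffle=False):
    """Sketch: yield index arrays covering [0, data_length) in minibatches."""
    idxs = np.arange(data_length)
    if shuffle:
        np.random.shuffle(idxs)
    for start in range(0, data_length - minibatch_size + 1, minibatch_size):
        yield idxs[start:start + minibatch_size]

batches = list(iterate_mb_idxs(10, 4))
print([b.tolist() for b in batches])   # partial final batch is dropped
```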
- rlpyt.utils.misc.zeros(shape, dtype)¶
  Attempts to return a torch tensor of zeros, or, if a numpy dtype is provided, a numpy array of zeros.
- rlpyt.utils.misc.empty(shape, dtype)¶
  Attempts to return an empty torch tensor, or, if a numpy dtype is provided, an empty numpy array.
- rlpyt.utils.misc.extract_sequences(array_or_tensor, T_idxs, B_idxs, T)¶
  Assumes array_or_tensor has [T,B] leading dims. Returns a new array/tensor which contains sequences of length [T] taken from the starting indexes [T_idxs, B_idxs], where T_idxs (and B_idxs) is a list or vector of integers. Handles wrapping automatically. (Return shape: [T, len(B_idxs),…]).
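The wrapping behavior can be sketched with modular time indexes (an illustrative numpy version, not the library implementation):

```python
import numpy as np

def extract_sequences(array, T_idxs, B_idxs, T):
    """Sketch: gather length-T sequences starting at (t, b), wrapping in time."""
    full_T = array.shape[0]
    t = (np.asarray(T_idxs)[:, None] + np.arange(T)) % full_T  # wrap around
    b = np.asarray(B_idxs)[:, None]
    # Gather [n_seqs, T, ...] then move time to front: [T, n_seqs, ...].
    return array[t, b].transpose(1, 0, *range(2, array.ndim))

buf = np.arange(12).reshape(6, 2)        # [T=6, B=2] circular buffer
seqs = extract_sequences(buf, T_idxs=[4], B_idxs=[1], T=3)
print(seqs.tolist())                     # steps t=4, 5 wrap back to t=0
```

This is the access pattern a circular replay buffer needs when sampled sequences cross the buffer's write boundary.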
Collections¶
(see Named Array Tuple page)
- class rlpyt.utils.collections.AttrDict(*args, **kwargs)¶
  Bases: dict
  Behaves like a dictionary but additionally has attribute-style access for both read and write; e.g. x["key"] and x.key are the same, and it can be iterated using: for k, v in x.items(). Can be subclassed for specific data classes; subclasses must call AttrDict's __init__().
- copy()¶
  Provides a "deep" copy of all unbroken chains of AttrDict types, but shallow copies otherwise (e.g. numpy arrays are NOT copied).
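A minimal sketch of such a class (rlpyt's implementation differs internally; this only illustrates the documented read/write behavior):

```python
class AttrDict(dict):
    """Sketch: dict with attribute-style access for both read and write."""
    def __getattr__(self, name):
        try:
            return self[name]          # x.key falls back to x["key"]
        except KeyError:
            raise AttributeError(name)
    def __setattr__(self, name, value):
        self[name] = value             # x.key = v lands in the dict

x = AttrDict(lr=1e-3)
x.batch_size = 32
print(x["batch_size"], x.lr, sorted(x.keys()))
```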
Buffers¶
- rlpyt.utils.buffer.buffer_from_example(example, leading_dims, share_memory=False, use_NatSchema=None)¶
  Allocates memory and returns it in a namedarraytuple with the same structure as example, which should be a namedtuple or namedarraytuple. Applies the same leading dimensions leading_dims to every entry, and otherwise matches their shapes and dtypes. The example should have no leading dimensions. None fields will stay None. Optionally allocates on OS shared memory. Uses build_array(). New: can use NamedArrayTuple types via the use_NatSchema flag, which may be easier for pickling/unpickling when using spawn instead of fork. If use_NatSchema is None, the type of example will be used to infer what type to return (this is the default).
- rlpyt.utils.buffer.build_array(example, leading_dims, share_memory=False)¶
  Allocates a numpy array matching the dtype and shape of example, possibly with additional leading dimensions. Optionally allocates on OS shared memory.
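The core allocation step can be sketched as follows (shared-memory handling omitted; an illustrative version only):

```python
import numpy as np

def build_array(example, leading_dims):
    """Sketch: allocate zeros matching example's dtype/shape plus leading dims."""
    a = np.asarray(example)
    if not isinstance(leading_dims, (list, tuple)):
        leading_dims = (leading_dims,)
    return np.zeros(tuple(leading_dims) + a.shape, dtype=a.dtype)

obs_example = np.zeros((3, 84, 84), dtype=np.uint8)   # one observation
buf = build_array(obs_example, leading_dims=(256, 8)) # [T, B] + obs shape
print(buf.shape, buf.dtype)
```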
- rlpyt.utils.buffer.np_mp_array(shape, dtype)¶
  Allocates a numpy array on OS shared memory.
- rlpyt.utils.buffer.torchify_buffer(buffer_)¶
  Converts contents of buffer_ from numpy arrays to torch tensors. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None, and torch tensors are left alone.
- rlpyt.utils.buffer.numpify_buffer(buffer_)¶
  Converts contents of buffer_ from torch tensors to numpy arrays. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None, and numpy arrays are left alone.
- rlpyt.utils.buffer.buffer_to(buffer_, device=None)¶
  Sends contents of buffer_ to the specified device (contents must be torch tensors). buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned.
- rlpyt.utils.buffer.buffer_method(buffer_, method_name, *args, **kwargs)¶
  Calls method method_name(*args, **kwargs) on all contents of buffer_, and returns the results. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None.
- rlpyt.utils.buffer.buffer_func(buffer_, func, *args, **kwargs)¶
  Calls function func(buf, *args, **kwargs) on all contents of buffer_, and returns the results. buffer_ can be an arbitrary structure of tuples, namedtuples, namedarraytuples, NamedTuples, and NamedArrayTuples, and a new, matching structure will be returned. None fields remain None.
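The recursive leaf-mapping pattern shared by these buffer functions can be sketched for the plain-namedtuple case (rlpyt additionally handles its namedarraytuple types):

```python
from collections import namedtuple

def buffer_func(buffer_, func, *args, **kwargs):
    """Sketch: apply func to every leaf of a (possibly nested) tuple structure."""
    if isinstance(buffer_, tuple):  # namedtuples are tuples too
        contents = tuple(buffer_func(b, func, *args, **kwargs) for b in buffer_)
        if hasattr(buffer_, "_fields"):        # rebuild the namedtuple type
            return type(buffer_)(*contents)
        return contents
    if buffer_ is None:                        # None fields stay None
        return None
    return func(buffer_, *args, **kwargs)

Samples = namedtuple("Samples", ["obs", "act"])
s = Samples(obs=[1, 2], act=None)
print(buffer_func(s, len))                     # len applied to each leaf
```

buffer_method(), torchify_buffer(), numpify_buffer(), and buffer_to() all follow this shape, differing only in the operation applied at the leaves.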
- rlpyt.utils.buffer.get_leading_dims(buffer_, n_dim=1)¶
  Returns the n_dim number of leading dimensions of the contents of buffer_. Checks to make sure the leading dimensions match for all tensors/arrays, except ignores None fields.
Algorithms¶
- rlpyt.algos.utils.discount_return(reward, done, bootstrap_value, discount, return_dest=None)¶
  Time-major inputs, optional other dimensions: [T], [T,B], etc. Computes the discounted sum of future rewards from each time-step to the end of the batch, including the bootstrapping value. The sum resets where done is 1. Optionally writes to the buffer return_dest, if provided. Operations are vectorized across all trailing dimensions after the first [T,].
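The backward recursion this describes can be sketched as follows (an illustrative numpy version without the return_dest option):

```python
import numpy as np

def discount_return(reward, done, bootstrap_value, discount):
    """Sketch: discounted return-to-go over [T, ...], resetting where done."""
    ret = np.zeros_like(reward)
    running = bootstrap_value                   # value beyond the batch end
    for t in reversed(range(len(reward))):
        # done=1 cuts the chain: no reward flows back across episode ends.
        running = reward[t] + discount * (1 - done[t]) * running
        ret[t] = running
    return ret

r = np.array([1.0, 1.0, 1.0])
d = np.array([0.0, 1.0, 0.0])                   # episode ends after step 1
print(discount_return(r, d, bootstrap_value=10.0, discount=0.5))
```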
- rlpyt.algos.utils.generalized_advantage_estimation(reward, value, done, bootstrap_value, discount, gae_lambda, advantage_dest=None, return_dest=None)¶
  Time-major inputs, optional other dimensions: [T], [T,B], etc. Similar to discount_return(), but uses Generalized Advantage Estimation to compute advantages and returns.
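A sketch of GAE for the 1-D [T] case (the optional dest buffers omitted; this illustrates the standard recursion, not the library's exact code):

```python
import numpy as np

def generalized_advantage_estimation(reward, value, done, bootstrap_value,
                                     discount, gae_lambda):
    """Sketch: advantage = lambda-weighted sum of TD errors; return = adv + value."""
    nd = 1.0 - done
    next_value = np.append(value[1:], bootstrap_value)
    delta = reward + discount * nd * next_value - value   # one-step TD errors
    advantage = np.zeros_like(reward)
    running = 0.0
    for t in reversed(range(len(reward))):
        running = delta[t] + discount * gae_lambda * nd[t] * running
        advantage[t] = running
    return advantage, advantage + value

# With discount=1, gae_lambda=1, zero values: advantage equals return-to-go.
r = np.array([1.0, 1.0])
adv, ret = generalized_advantage_estimation(
    r, np.zeros(2), np.zeros(2), bootstrap_value=0.0, discount=1.0, gae_lambda=1.0)
print(adv, ret)
```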
- rlpyt.algos.utils.discount_return_n_step(reward, done, n_step, discount, return_dest=None, done_n_dest=None, do_truncated=False)¶
  Time-major inputs, optional other dimensions: [T], [T,B], etc. Computes n-step discounted returns within the timeframe of the given rewards. If do_truncated==False, then computes only at time-steps where full n-step future rewards are provided (i.e. not at the last n steps – the output shape will change!). Returns n-step returns as well as n-step done signals, which are True if done=True at any future time before the n-step target bootstrap would apply (the bootstrap happens in the algorithm, not here).
- rlpyt.algos.utils.valid_from_done(done)¶
  Returns a float mask which is zero for all time-steps after a done=True is signaled. This function operates on the leading dimension of done, assumed to correspond to time [T,…]; other dimensions are preserved.
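One way to sketch this mask (the step where done fires remains valid; everything after it is zeroed):

```python
import numpy as np

def valid_from_done(done):
    """Sketch: 1.0 up to and including the first done=True step, 0.0 after."""
    valid = np.ones_like(done, dtype=float)
    # Steps strictly after a done are invalid: use a shifted cumulative done.
    valid[1:] = 1.0 - np.clip(np.cumsum(done[:-1], axis=0), 0, 1)
    return valid

d = np.array([0.0, 1.0, 0.0, 0.0])
print(valid_from_done(d))
```

Such a mask pairs naturally with valid_mean() to exclude post-episode time-steps from losses.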
- rlpyt.algos.utils.discount_return_tl(reward, done, bootstrap_value, discount, timeout, value, return_dest=None)¶
  Like discount_return(), above, except uses bootstrapping where 'done' is due to the env horizon time-limit (tl = Time-Limit). (The algorithm should not train on samples where timeout=True.)
- rlpyt.algos.utils.generalized_advantage_estimation_tl(reward, value, done, bootstrap_value, discount, gae_lambda, timeout, advantage_dest=None, return_dest=None)¶
  Like generalized_advantage_estimation(), above, except uses bootstrapping where 'done' is due to the env horizon time-limit (tl = Time-Limit). (The algorithm should not train on samples where timeout=True.)
Synchronize¶
- class rlpyt.utils.synchronize.RWLock¶
  Multiple simultaneous readers, one writer.
- rlpyt.utils.synchronize.drain_queue(queue_obj, n_sentinel=0, guard_sentinel=False)¶
  Empties a multiprocessing queue object, with options to protect against the delay between queue.put() and queue.get(). Returns a list of the queue contents.
  With n_sentinel=0, simply calls queue.get(block=False) until a queue.Empty exception (which can still happen slightly after another process called queue.put()).
  With n_sentinel>1, calls queue.get() until n_sentinel None objects have been returned (marking that each put() process has finished).
  With guard_sentinel=True (needs n_sentinel=0), stops if a None is retrieved and puts it back into the queue, so a blocking drain can be done later with n_sentinel>1.
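A simplified sketch of the two main modes, using a plain queue.Queue in place of a multiprocessing queue (guard_sentinel omitted; an illustration of the documented behavior, not the library code):

```python
import queue

def drain_queue(queue_obj, n_sentinel=0):
    """Sketch: empty a queue; with sentinels, block until n Nones are seen."""
    contents = []
    if n_sentinel > 0:
        sentinels = 0
        while sentinels < n_sentinel:
            obj = queue_obj.get()              # blocking get
            if obj is None:
                sentinels += 1                 # one producer finished
            else:
                contents.append(obj)
        return contents
    while True:                                # non-blocking drain
        try:
            contents.append(queue_obj.get(block=False))
        except queue.Empty:
            return contents

q = queue.Queue()
for item in (1, 2, None, 3):
    q.put(item)
print(drain_queue(q, n_sentinel=1))            # stops at the None sentinel
```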
- rlpyt.utils.synchronize.find_port(offset)¶
  Finds a unique open port, for initializing torch.distributed in multiple separate multi-GPU runs on one machine.
Quick Arguments¶
- rlpyt.utils.quick_args.save__init__args(values, underscore=False, overwrite=False, subclass_only=False)¶
  Use in __init__() only; assigns all args/kwargs to instance attributes. To maintain precedence of args provided to subclasses, call this in the subclass before super().__init__() if save__init__args() also appears in the base class, or use overwrite=True. With subclass_only==True, only args/kwargs listed in the current subclass apply.
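A sketch of the core idea, introspecting the caller's __init__ signature (the overwrite and subclass-precedence logic omitted; an illustrative version only):

```python
import inspect

def save__init__args(values, underscore=False):
    """Sketch: assign an __init__'s args/kwargs to self as attributes."""
    prefix = "_" if underscore else ""
    self = values["self"]
    # Read the arg names off the class's __init__ signature (skip self).
    args = inspect.getfullargspec(type(self).__init__).args[1:]
    for arg in args:
        setattr(self, prefix + arg, values[arg])

class Agent:
    def __init__(self, lr, batch_size=32):
        save__init__args(locals())   # no self.lr = lr boilerplate needed

a = Agent(lr=1e-3)
print(a.lr, a.batch_size)
```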
Progress Bar¶
- class rlpyt.utils.prog_bar.ProgBarCounter(total_count)¶
  Dynamic display of a progress bar in the terminal, for example to mark progress (and estimate time to completion) of RL iterations toward the next logging update. Credit: rllab.
Seed¶
- rlpyt.utils.seed.set_seed(seed)¶
  Sets random.seed, np.random.seed, torch.manual_seed, and torch.cuda.manual_seed.
- rlpyt.utils.seed.make_seed()¶
  Returns a random number in [0, 10000], using timing jitter. This has a white-noise spectrum and gives unique values for multiple simultaneous processes; some simpler attempts did not achieve that, but there is probably a better way.