Distributions
Distributions are used to select randomized actions during sampling and, for some algorithms, to compute likelihood and related values for training. Typically, the distribution is owned by the agent. This page documents the implemented distributions and some of their methods; see the code for details.
-
class rlpyt.distributions.base.Distribution
Base distribution class. Not all subclasses will implement all methods.
-
sample(dist_info)
Generate random sample(s) from the distribution information.
-
kl(old_dist_info, new_dist_info)
Compute the KL divergence of two distributions at each datum; should maintain leading dimensions (e.g. [T,B]).
-
mean_kl(old_dist_info, new_dist_info, valid)
Compute the mean KL divergence over a data batch, possibly ignoring data marked as invalid.
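To illustrate kl and mean_kl, here is a dependency-free sketch for the categorical case. It assumes KL(old || new), probabilities stored as plain Python lists with one row per datum (leading dimensions flattened to a single batch), and a small EPS guarding log(0). The helper names are hypothetical; rlpyt itself operates on torch tensors.

```python
import math

EPS = 1e-8  # guards log(0); the value is an assumption of this sketch


def categorical_kl(old_probs, new_probs):
    """KL(old || new) for each datum; inputs are [B][dim] nested lists."""
    return [
        sum(p * (math.log(p + EPS) - math.log(q + EPS)) for p, q in zip(old, new))
        for old, new in zip(old_probs, new_probs)
    ]


def valid_mean(values, valid=None):
    """Mean over the batch, counting only entries whose valid flag is truthy."""
    if valid is not None:
        values = [v for v, ok in zip(values, valid) if ok]
    return sum(values) / len(values)


def mean_kl(old_probs, new_probs, valid=None):
    """Batch-mean KL divergence, optionally ignoring invalid data."""
    return valid_mean(categorical_kl(old_probs, new_probs), valid)


old = [[0.5, 0.5], [0.9, 0.1]]
new = [[0.5, 0.5], [0.5, 0.5]]
print(categorical_kl(old, new))  # per-datum KL; the first entry is 0.0
print(mean_kl(old, new, valid=[1, 0]))  # only the first datum counts
```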
-
log_likelihood(x, dist_info)
Compute the log-likelihood of samples x at the distributions described in dist_info (i.e. x can have the same leading dimensions, e.g. [T, B]).
-
likelihood_ratio(x, old_dist_info, new_dist_info)
Compute the likelihood ratio of samples x at the new distributions over the old distributions (usually new_dist_info is the variable used for differentiation); should maintain leading dimensions.
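For log_likelihood and likelihood_ratio, a minimal categorical sketch computes the ratio in log space as exp(log pi_new(x) - log pi_old(x)). It assumes integer samples x with one entry per datum and probabilities as plain Python lists; the helper names and EPS are hypothetical, not rlpyt's API.

```python
import math

EPS = 1e-8  # guards log(0); an assumption of this sketch


def categorical_log_likelihood(x, probs):
    """Log-probability of integer sample x[b] under the distribution probs[b]."""
    return [math.log(p[i] + EPS) for i, p in zip(x, probs)]


def likelihood_ratio(x, old_probs, new_probs):
    """Per-datum ratio pi_new(x) / pi_old(x), computed via log-likelihoods."""
    old_ll = categorical_log_likelihood(x, old_probs)
    new_ll = categorical_log_likelihood(x, new_probs)
    return [math.exp(n - o) for n, o in zip(new_ll, old_ll)]


x = [0, 1]
old = [[0.5, 0.5], [0.8, 0.2]]
new = [[0.25, 0.75], [0.6, 0.4]]
print(likelihood_ratio(x, old, new))  # approximately [0.5, 2.0]
```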
-
entropy(dist_info)
Compute the entropy of the distributions contained in dist_info; should maintain any leading dimensions.
-
perplexity(dist_info)
Exponential of the entropy, maybe useful for logging.
-
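mean_entropy(dist_info, valid=None)
Compute the mean entropy over a data batch, possibly ignoring data marked as invalid. Subclasses can override this in case a more sophisticated mean is needed (e.g. internally ignoring select parts of the action space).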
-
mean_perplexity(dist_info, valid=None)
Mean perplexity (exponential of the entropy) over a data batch, possibly ignoring data marked as invalid; maybe useful for logging.
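The entropy-related methods can be sketched for a categorical distribution as follows. This assumes plain Python lists with one row of probabilities per datum and a truthy/falsy valid flag per datum; the functions are hypothetical stand-ins, not rlpyt's torch implementation.

```python
import math

EPS = 1e-8  # guards log(0); an assumption of this sketch


def categorical_entropy(probs):
    """Entropy -sum(p * log p) for each datum in a [B][dim] nested list."""
    return [-sum(p * math.log(p + EPS) for p in row) for row in probs]


def perplexity(probs):
    """Exponential of the per-datum entropy."""
    return [math.exp(h) for h in categorical_entropy(probs)]


def mean_entropy(probs, valid=None):
    """Batch-mean entropy, counting only data whose valid flag is truthy."""
    ent = categorical_entropy(probs)
    if valid is not None:
        ent = [h for h, ok in zip(ent, valid) if ok]
    return sum(ent) / len(ent)


probs = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]
print(perplexity(probs))  # close to [4.0, 1.0]: uniform over 4 vs. deterministic
print(mean_entropy(probs, valid=[1, 0]))  # close to log(4)
```

A perplexity of k reads as "about as uncertain as a uniform choice among k actions", which is why it can be handy for logging.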
-
-
class rlpyt.distributions.discrete.DiscreteMixin(dim, dtype=torch.long, onehot_dtype=torch.float)
Conversions to and from one-hot.
-
to_onehot(indexes, dtype=None)
Convert from integer indexes to one-hot, preserving leading dimensions.
-
from_onehot(onehot, dtype=None)
Convert from one-hot to integer indexes, preserving leading dimensions.
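The one-hot conversions can be sketched with nested Python lists, recursing over leading dimensions so that, for example, a [T][B] array of indexes becomes [T][B][dim]. These standalone functions take dim explicitly and ignore dtypes; they are hypothetical illustrations, not the mixin's methods.

```python
def to_onehot(indexes, dim):
    """Integer index, or nested lists of them, to one-hot lists; leading dims kept."""
    if isinstance(indexes, list):
        return [to_onehot(i, dim) for i in indexes]  # recurse over a leading dim
    onehot = [0.0] * dim
    onehot[indexes] = 1.0
    return onehot


def from_onehot(onehot):
    """Inverse of to_onehot: argmax over the trailing dimension; leading dims kept."""
    if isinstance(onehot[0], list):
        return [from_onehot(row) for row in onehot]  # recurse over a leading dim
    return max(range(len(onehot)), key=onehot.__getitem__)


x = [[0, 2], [1, 1]]  # e.g. [T=2, B=2] integer actions
onehot = to_onehot(x, dim=3)  # shape [2][2][3]
print(from_onehot(onehot) == x)  # True
```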
-
-
class rlpyt.distributions.categorical.Categorical(dim, dtype=torch.long, onehot_dtype=torch.float)
Bases: rlpyt.distributions.discrete.DiscreteMixin, rlpyt.distributions.base.Distribution
Multinomial distribution over a discrete domain.
-
sample(dist_info)
Sample using torch.multinomial over the trailing dimension of dist_info.prob.
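Sampling "over the trailing dimension" means one draw per distribution, with all leading dimensions kept. A dependency-free sketch, using Python's random.choices in place of torch.multinomial and nested lists in place of tensors (the function name is hypothetical):

```python
import random


def sample_categorical(probs, rng=random):
    """Draw one index per distribution over the trailing dimension of probs.

    probs is a nested list such as [B][dim] or [T][B][dim]; leading dims are kept.
    """
    if isinstance(probs[0], list):
        return [sample_categorical(row, rng) for row in probs]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
print(sample_categorical([[0.1, 0.9], [0.7, 0.3]], rng))  # one index per row
```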
-
-
class rlpyt.distributions.epsilon_greedy.EpsilonGreedy(epsilon=1, **kwargs)
Bases: rlpyt.distributions.discrete.DiscreteMixin, rlpyt.distributions.base.Distribution
For epsilon-greedy exploration from state-action Q-values.
-
sample(q)
Input q can be shaped [T,B,Q] or [B,Q], and a vector epsilon of length B will apply across the batch dimension (same epsilon for all T).
-
set_epsilon(epsilon)
Assign the value for epsilon (can be a vector).
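The shape rules above can be sketched in plain Python: with probability epsilon a datum takes a uniformly random action, otherwise the argmax of its Q-values, and a per-batch epsilon list is reused at every time step. This is a hypothetical standalone function under those assumptions, not rlpyt's torch code.

```python
import random


def epsilon_greedy(q, epsilon, rng=random):
    """Epsilon-greedy actions from Q-values shaped [B][Q] or [T][B][Q].

    epsilon is a float or a list of length B; a list applies one value per
    batch entry, reused at every time step T.
    """
    if isinstance(q[0][0], list):  # [T][B][Q]: recurse over the time dimension
        return [epsilon_greedy(q_t, epsilon, rng) for q_t in q]
    actions = []
    for b, q_b in enumerate(q):
        eps_b = epsilon[b] if isinstance(epsilon, list) else epsilon
        if rng.random() < eps_b:
            actions.append(rng.randrange(len(q_b)))  # explore: uniform random action
        else:
            actions.append(max(range(len(q_b)), key=q_b.__getitem__))  # greedy argmax
    return actions


q = [[1.0, 3.0, 2.0], [5.0, 0.0, 1.0]]
print(epsilon_greedy(q, epsilon=[0.0, 1.0], rng=random.Random(0)))  # first entry is always 1
```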
-
-
class rlpyt.distributions.epsilon_greedy.CategoricalEpsilonGreedy(z=None, **kwargs)
Bases: rlpyt.distributions.epsilon_greedy.EpsilonGreedy
For epsilon-greedy exploration from a distributional (categorical) representation of state-action Q-values.
-
sample(p, z=None)
Input p should be shaped [T,B,A,P] or [B,A,P], where A is the number of actions and P is the number of atoms. Optional input z is the domain of atom values, shaped [P]. A vector epsilon of length B will apply across the batch dimension.
-
set_z(z)
Assign the vector of bin locations, i.e. the distributional domain.
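With a distributional representation, a natural way to act greedily is to collapse each action's atom probabilities into an expected Q-value, Q(a) = sum_p p(a, p) * z[p], and then apply ordinary epsilon-greedy. The sketch below assumes that reduction and handles only the [B][A][P] shape for brevity; the function names are hypothetical.

```python
import random


def expected_q(p, z):
    """Collapse atom probabilities p[B][A][P] to Q-values q[B][A] = sum_p p * z."""
    return [[sum(pi * zi for pi, zi in zip(p_a, z)) for p_a in p_b] for p_b in p]


def categorical_epsilon_greedy(p, z, epsilon, rng=random):
    """Epsilon-greedy over expected Q-values; epsilon is a float or a length-B list."""
    q = expected_q(p, z)
    actions = []
    for b, q_b in enumerate(q):
        eps_b = epsilon[b] if isinstance(epsilon, list) else epsilon
        if rng.random() < eps_b:
            actions.append(rng.randrange(len(q_b)))  # explore: uniform random action
        else:
            actions.append(max(range(len(q_b)), key=q_b.__getitem__))  # greedy argmax
    return actions


z = [-1.0, 0.0, 1.0]  # atom values (the distributional domain)
p = [[[0.8, 0.2, 0.0], [0.0, 0.3, 0.7]]]  # B=1, A=2, P=3
print(expected_q(p, z))  # approximately [[-0.8, 0.7]]
print(categorical_epsilon_greedy(p, z, epsilon=0.0))  # [1]
```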
-
-
class rlpyt.distributions.gaussian.Gaussian(dim, std=None, clip=None, noise_clip=None, min_std=None, max_std=None, squash=None)
Multivariate Gaussian with independent variables (diagonal covariance). The standard deviation can be provided, as a scalar or a value per dimension; otherwise it is taken from dist_info (possibly learnable), where it is expected to have one value per dimension. Noise clipping or sample clipping is optional during sampling, but is not accounted for in formulas (e.g. entropy). Clipping of the standard deviation is optional and is accounted for in formulas. Squashing of samples to squash * tanh(sample) is optional and is accounted for in the log_likelihood formula but not in entropy.
-
entropy(dist_info)
Uses self.std unless that is None, in which case it gets log_std from dist_info. Not implemented for squashing.
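For a diagonal Gaussian without squashing, the entropy has a closed form: the sum over dimensions of log_std_i + 0.5 * log(2 * pi * e). A minimal sketch, assuming a per-dimension log_std list (the function name is hypothetical):

```python
import math


def gaussian_entropy(log_std):
    """Differential entropy of a diagonal Gaussian (no squashing):
    sum over dimensions of log_std_i + 0.5 * log(2 * pi * e)."""
    return sum(ls + 0.5 * math.log(2.0 * math.pi * math.e) for ls in log_std)


print(gaussian_entropy([0.0, 0.0]))  # two unit-variance dimensions
```

The formula depends only on the standard deviation, which is why clipping the std affects entropy while noise or sample clipping does not enter it.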
-
log_likelihood(x, dist_info)
Uses self.std unless that is None, in which case it uses log_std from dist_info. When squashing: instead of the numerically risky arctanh, assume the parameter x is the pre-squash action; see sample_loglikelihood() below.
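The unsquashed diagonal-Gaussian log-likelihood is a per-dimension sum. The sketch below mirrors the std precedence described above (a fixed std, scalar or per-dimension, wins over log_std). It also assumes min_std and max_std are scalar bounds clamping the std. The function and its keyword arguments are hypothetical, not rlpyt's signature.

```python
import math


def gaussian_log_likelihood(x, mean, log_std=None, std=None, min_std=None, max_std=None):
    """Diagonal-Gaussian log-likelihood of one sample x (lists of length dim)."""
    dim = len(mean)
    if std is not None:  # a fixed std takes precedence over log_std
        stds = std if isinstance(std, list) else [std] * dim
    else:
        stds = [math.exp(ls) for ls in log_std]
    if min_std is not None or max_std is not None:  # assumed scalar clamp on std
        lo = min_std if min_std is not None else 0.0
        hi = max_std if max_std is not None else math.inf
        stds = [min(max(s, lo), hi) for s in stds]
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2.0 * math.pi)
        for xi, mi, si in zip(x, mean, stds)
    )


print(gaussian_log_likelihood([0.5, -0.5], [0.0, 0.0], log_std=[0.0, 0.0]))
```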
-
sample_loglikelihood(dist_info)
Special method for use with the SAC algorithm, which returns a new sampled action and its log-likelihood for training use. It temporarily turns OFF squashing so that the log-likelihood can be computed on the non-squashed sample, then restores squashing and applies it to the sample before output.
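The net effect can be sketched as follows: draw a pre-squash sample u, score it under the Gaussian, subtract the standard change-of-variables term for a = squash * tanh(u), and only then squash. Scoring u directly is what avoids arctanh. This plain-Python function, its arguments, and the EPS inside the log are assumptions for illustration, not rlpyt's code.

```python
import math
import random

EPS = 1e-6  # keeps the log finite when tanh saturates; value is an assumption


def sample_loglikelihood(mean, log_std, squash=1.0, rng=random):
    """Return (squash * tanh(u), log-likelihood) for a pre-squash sample u.

    u is drawn from N(mean, exp(log_std)) per dimension. The Gaussian log-density
    of u is corrected by -sum_i log(squash * (1 - tanh(u_i)**2)), the
    change-of-variables term for a = squash * tanh(u).
    """
    u = [rng.gauss(m, math.exp(ls)) for m, ls in zip(mean, log_std)]
    logli = sum(
        -0.5 * ((ui - m) / math.exp(ls)) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
        for ui, m, ls in zip(u, mean, log_std)
    )
    logli -= sum(math.log(squash * (1.0 - math.tanh(ui) ** 2) + EPS) for ui in u)
    action = [squash * math.tanh(ui) for ui in u]  # squash only after scoring u
    return action, logli


action, logli = sample_loglikelihood([0.0, 0.3], [-1.0, -0.5], squash=1.0, rng=random.Random(0))
print(action, logli)  # each action component lies in [-1, 1]
```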
-
sample(dist_info)
Generate random samples using torch.normal, from dist_info.mean. Uses self.std unless it is None, in which case it uses dist_info.log_std.
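The optional clipping during sampling can be sketched in plain Python. The order is an assumption: scale unit noise by std, clip the noise to [-noise_clip, noise_clip], add the mean, then clip the sample to [-clip, clip]. As noted above, neither clip is reflected in the likelihood or entropy formulas. The function is hypothetical and omits squashing.

```python
import random


def gaussian_sample(mean, std, noise_clip=None, clip=None, rng=random):
    """One sample from a diagonal Gaussian with optional noise and sample clipping."""
    sample = []
    for m, s in zip(mean, std):
        noise = s * rng.gauss(0.0, 1.0)  # noise scaled by the standard deviation
        if noise_clip is not None:
            noise = max(-noise_clip, min(noise, noise_clip))  # clip the noise
        x = m + noise
        if clip is not None:
            x = max(-clip, min(x, clip))  # clip the final sample
        sample.append(x)
    return sample


print(gaussian_sample([0.0, 0.0], [1.0, 1.0], noise_clip=0.5, rng=random.Random(0)))
```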
-
set_clip(clip)
Input a sample-clipping value, or None to turn OFF.
-
set_squash(squash)
Input the multiplicative factor for squash * tanh(sample) (usually 1), or None to turn OFF.
-
set_noise_clip(noise_clip)
Input a noise-clipping value, or None to turn OFF.
-
set_std(std)
Input a value, which can be the same shape as the action space or else broadcastable up to that shape, or None to turn OFF and use dist_info.log_std in other methods.
-