PGPE Optimizer

#include <abstractions/pgpe.h>
using namespace abstractions;

class PgpeOptimizer

Optimize a function using Policy Gradients with Parameter-based Exploration (PGPE).

This implementation includes the ClipUp extension to PGPE. The full algorithm details are available at the ClipUp project site. It is a black-box optimizer that uses local sampling to estimate parameter updates. Each call to the optimizer gradually moves the solution towards a local optimum.

The user of this class is responsible for two things:

Maintaining the storage for the samples drawn by the optimizer.
Providing a way to determine the “fitness” of each drawn sample.

This will generally look something like

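RowVector solution = InitialGuess();
Matrix samples = Allocate();  // one row per sample, one column per parameter

// New() returns Expected<PgpeOptimizer>; unwrap it before calling the methods below.
auto optimizer = PgpeOptimizer::New(settings);
optimizer.Initialize(solution);

while (!converged) {
    // Sample() and Update() each return an Error that should be checked.
    optimizer.Sample(samples);
    ColumnVector costs = EstimateCosts(samples);
    optimizer.Update(samples, costs);
}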
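The EstimateCosts step in the loop above is the part the caller supplies. A minimal sketch of one way to write it follows, assuming a squared-distance objective and using std::vector as a hypothetical stand-in for the library's Matrix and ColumnVector types. The names SampleMatrix, CostVector and SquaredDistance are illustrative only and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the library's Matrix / ColumnVector types.
using SampleMatrix = std::vector<std::vector<double>>;  // one sample per row
using CostVector = std::vector<double>;                 // one cost per sample

// Example objective (an assumption): squared distance to a fixed target.
// Lower values mean a better sample.
double SquaredDistance(const std::vector<double> &x,
                       const std::vector<double> &target) {
    assert(x.size() == target.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - target[i];
        sum += d * d;
    }
    return sum;
}

// Compute one cost per row, in the same order as the rows of `samples`.
CostVector EstimateCosts(const SampleMatrix &samples,
                         const std::vector<double> &target) {
    CostVector costs;
    costs.reserve(samples.size());
    for (const auto &row : samples) {
        costs.push_back(SquaredDistance(row, target));
    }
    return costs;
}
```

The only property the optimizer relies on is that element i of the cost vector corresponds to row i of the sample matrix.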

Public Functions

PgpeOptimizer(const PgpeOptimizer &other) = default

Create an optimizer from another one.

Parameters: other – other optimizer

Expected<RowVector> GetEstimate() const

Get the current estimate of the best parameter vector from the optimizer.

Returns: A row vector with the current estimate.

Expected<RowVector> GetSolutionStdDev() const

Get the current estimate of the solution's standard deviation.

Returns: A row vector storing the per-parameter standard deviations.

Expected<RowVector> GetSolutionVelocity() const

Get the currently estimated optimizer velocity.

The velocity is a vector pointing along the currently estimated gradient, with a magnitude capped by the PgpeOptimizerSettings::max_speed option.

Returns: A row vector with the current solution velocity.
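The relationship between the velocity, the momentum and max_speed can be sketched with the update rule from the published ClipUp algorithm: normalize the gradient, blend it into the previous velocity, then clip the result to the maximum speed. This is a sketch of that published rule and not necessarily how this class implements it. The step_size parameter and the function names are assumptions, since no such setting appears in PgpeOptimizerSettings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of a vector.
double Norm(const std::vector<double> &v) {
    double sum = 0.0;
    for (double x : v) sum += x * x;
    return std::sqrt(sum);
}

// One ClipUp-style velocity update (sketch; `step_size` is hypothetical).
std::vector<double> ClipUpVelocity(const std::vector<double> &velocity,
                                   const std::vector<double> &gradient,
                                   double step_size, double momentum,
                                   double max_speed) {
    assert(velocity.size() == gradient.size());
    const double grad_norm = Norm(gradient);
    std::vector<double> next(velocity.size());
    for (std::size_t i = 0; i < next.size(); ++i) {
        // Only the direction of the gradient is used, not its magnitude.
        const double unit = grad_norm > 0.0 ? gradient[i] / grad_norm : 0.0;
        next[i] = momentum * velocity[i] + step_size * unit;
    }
    // Clip: rescale the velocity if it is faster than max_speed.
    const double speed = Norm(next);
    if (speed > max_speed) {
        for (double &x : next) x *= max_speed / speed;
    }
    return next;
}
```

Under this rule the velocity never exceeds max_speed in magnitude, but it can be shorter, for example on the first few updates or when successive gradients disagree.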

const PgpeOptimizerSettings &GetSettings() const

Get the settings used for this optimizer.

void SetPrngSeed(DefaultRngType::result_type seed)

Replace the internal PRNG with a new one that uses the provided seed.

This is mainly useful when the optimizer is part of a larger system, since it allows the internal PRNG to be reseeded after initialization. Reseeding also resets the PRNG.

Parameters: seed – new PRNG seed
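The reset behaviour can be illustrated with a standard-library generator. The sketch below assumes std::mt19937 as a stand-in for DefaultRngType, whose actual definition is not shown here, and uses a hypothetical SeededSourceSketch struct in place of the optimizer.

```cpp
#include <cassert>
#include <random>

// Assumption: std::mt19937 stands in for the library's DefaultRngType.
using RngSketch = std::mt19937;

// Hypothetical holder for an internal PRNG; not the library's class.
struct SeededSourceSketch {
    RngSketch rng;

    // Replace the generator with a freshly seeded one, discarding old state.
    void SetPrngSeed(RngSketch::result_type seed) { rng = RngSketch(seed); }

    RngSketch::result_type Next() { return rng(); }
};

// Reseeding with the same value restarts the same sequence.
void CheckReseedRestartsSequence() {
    SeededSourceSketch source;
    source.SetPrngSeed(42);
    const auto first = source.Next();
    source.Next();  // advance the state further
    source.SetPrngSeed(42);
    assert(source.Next() == first);
}
```

In a larger system this makes runs reproducible: reseeding with a known value before sampling yields the same draws each time, provided nothing else about the state differs.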

void Initialize(ConstRowVectorRef x_init, std::optional<double> init_stddev = {})

Initialize the optimizer to some starting state x_init.

Calling this has the effect of resetting the optimizer. All of the internal state variables will be randomly initialized, regardless of whether or not the optimizer has run at any point before.

The initial standard deviation will be automatically calculated from the dimensionality of the input vector. This can be overridden by providing a value to init_stddev.

Parameters:

x_init – The initial state (parameters) vector.
init_stddev – The initial solution standard deviation.

void Initialize(int num_dim, double init_stddev = 0.1)

Initialize the optimizer.

Calling this has the effect of resetting the optimizer. All of the internal state variables will be randomly initialized, regardless of whether or not the optimizer has run at any point before.

The solution will initially be set to zero, while the initial standard deviation will be set to init_stddev.

Parameters:

num_dim – The dimensionality of the solution space.
init_stddev – Initial standard deviation of the solution.

void RankLinearize(ColumnVectorRef costs) const

Linearizes the costs by rank so that they are evenly spaced on [-0.5, 0.5].

Note

The costs are modified in-place.

Parameters: costs – [inout] per-sample costs
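Rank linearization replaces each cost with a value that depends only on its position in the sorted order, which makes the update insensitive to the scale of the costs. The following is a minimal sketch on a std::vector, not the library's implementation. The orientation (lowest cost maps to -0.5) and the handling of ties and of a single sample are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// In-place rank linearization (sketch).
// Assumed orientation: lowest cost -> -0.5, highest cost -> +0.5.
void RankLinearizeSketch(std::vector<double> &costs) {
    assert(!costs.empty());
    const std::size_t n = costs.size();
    if (n == 1) {
        costs[0] = 0.0;  // a single sample has no meaningful rank
        return;
    }
    // order[k] is the index of the k-th smallest cost.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&costs](std::size_t a, std::size_t b) {
                         return costs[a] < costs[b];
                     });
    // Overwrite each cost with its rank mapped linearly onto [-0.5, 0.5].
    for (std::size_t rank = 0; rank < n; ++rank) {
        costs[order[rank]] =
            static_cast<double>(rank) / static_cast<double>(n - 1) - 0.5;
    }
}
```

For example, the costs {10, -3, 7} become {0.5, -0.5, 0.0}: only the ordering survives, so an outlier cost of 10 or 10,000 produces the same result.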

Error Sample(MatrixRef samples)

Sample parameters from the current optimizer state.

The optimizer stores parameters as row vectors, so the number of drawn samples will be equal to the number of rows in the provided matrix. The number of columns must match the length of the vector that was passed into PgpeOptimizer::Initialize().

Parameters: samples – A reference to the matrix that will store the drawn samples.
Returns: An error if the samples could not be drawn.
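The row-per-sample layout can be sketched as follows: every row of the caller-owned matrix is filled with an independent draw around the current estimate, using the per-parameter standard deviation. This is an assumption-laden illustration with std::vector and std::mt19937 as stand-ins for the library's types. The actual class may draw its samples differently, for instance in mirrored pairs as some PGPE variants do.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the library's Matrix type: one sample per row.
using SampleMatrix = std::vector<std::vector<double>>;

// Fill each row with mean + stddev * N(0, 1), independently per parameter.
void SampleSketch(const std::vector<double> &mean,
                  const std::vector<double> &stddev, std::mt19937 &rng,
                  SampleMatrix &samples) {
    assert(mean.size() == stddev.size());
    std::normal_distribution<double> unit_normal(0.0, 1.0);
    for (auto &row : samples) {
        // The column count must match the dimensionality of the solution.
        assert(row.size() == mean.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            row[j] = mean[j] + stddev[j] * unit_normal(rng);
        }
    }
}
```

Because the caller allocates the matrix, the number of samples per iteration is chosen simply by sizing the matrix, and the same storage can be reused on every iteration.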

Error Update(ConstMatrixRef samples, ConstColumnVectorRef costs)

Update the optimizer’s internal state based on the reported sample costs.

The optimizer knows nothing about the problem it is being asked to solve. Rather, it has a strategy for exploring a solution space and finding the best solution. The caller is responsible for calculating the cost of each sample.

Parameters:

samples – A set of state vector samples. This has the same format as the input to PgpeOptimizer::Sample().
costs – A column vector, where each element is the relative cost of that particular solution.

Returns:

An error if the update failed.

Public Static Functions

static Expected<PgpeOptimizer> New(const PgpeOptimizerSettings &settings)

Create a new optimizer with the given settings.

Parameters: settings – optimizer settings
Returns: The configured optimizer or an Error instance if the creation failed.

struct PgpeOptimizerSettings

Runtime settings for the PgpeOptimizer.

Public Functions

Error Validate() const

Validate the optimizer settings.

Returns: If the settings are invalid, the reason why they are invalid.

Public Members

double max_speed = std::numeric_limits<double>::signaling_NaN()

The largest possible magnitude of a parameter update vector.

This value must be set as it’s dependent on the problem that the optimizer is being asked to solve. It constrains the largest possible update that the optimizer can make.
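Defaulting max_speed to NaN gives validation a reliable way to detect that the value was never set. A minimal sketch of such a check follows, using a hypothetical SettingsSketch struct that mirrors only this one field. The exact rules applied by PgpeOptimizerSettings::Validate() are not documented here, so the positivity check and the messages are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>
#include <string>

// Hypothetical mirror of a single field; not the library's struct.
struct SettingsSketch {
    double max_speed = std::numeric_limits<double>::signaling_NaN();
};

// Returns the reason the settings are invalid, or std::nullopt if they pass.
std::optional<std::string> ValidateSketch(const SettingsSketch &settings) {
    // NaN never compares equal to anything, so std::isnan is the right test.
    if (std::isnan(settings.max_speed)) {
        return std::string("max_speed must be set");
    }
    // Assumed rule: a speed limit only makes sense when it is positive.
    if (settings.max_speed <= 0.0) {
        return std::string("max_speed must be positive");
    }
    return std::nullopt;
}

// A default-constructed struct fails until max_speed is assigned.
void CheckDefaultIsRejected() {
    SettingsSketch settings;
    assert(ValidateSketch(settings).has_value());
    settings.max_speed = 0.1;
    assert(!ValidateSketch(settings).has_value());
}
```

A comparison such as max_speed == NaN would always be false, which is why the check goes through std::isnan.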

double init_search_radius = 15

The initial distribution search radius.

double momentum = 0.9

Momentum used in gradient updates.

The ClipUp algorithm uses momentum to preserve the relative direction of the state update vector (i.e., velocity).

double stddev_learning_rate = 0.1

Learning rate used when estimating the solution standard deviation.

double stddev_max_change = 0.2

The maximum allowable change between standard deviation updates.
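One common reading of this limit, used in the ClipUp reference material, is a relative bound: with a value of 0.2, a standard deviation may grow or shrink by at most 20% in a single update. The sketch below assumes that interpretation. Whether this class applies the limit relatively or absolutely is not stated here, and ClampStdDevUpdate is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>

// Limit a proposed standard deviation to within +/- max_change (as a
// fraction) of the previous value. Assumes a relative bound.
double ClampStdDevUpdate(double old_stddev, double proposed_stddev,
                         double max_change) {
    assert(old_stddev > 0.0);
    assert(max_change >= 0.0 && max_change < 1.0);
    const double lowest = old_stddev * (1.0 - max_change);
    const double highest = old_stddev * (1.0 + max_change);
    return std::clamp(proposed_stddev, lowest, highest);
}
```

With an old value of 1.0 and the default of 0.2, a proposed value of 2.0 is limited to roughly 1.2, while a proposed value of 1.1 passes through unchanged.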

std::optional<uint32_t> seed = {}

The seed used by the optimizer’s internal RNG. Will be generated from a random source if not provided.