FPPOTrainerTrainingSettings | Unreal Engine 5.1 Documentation

References


Module	LearningTraining
Header	/Engine/Plugins/Experimental/LearningAgents/Source/LearningTraining/Public/LearningPPOTrainer.h
Include	#include "LearningPPOTrainer.h"

Syntax

struct FPPOTrainerTrainingSettings

Remarks

Settings used for training with PPO (Proximal Policy Optimization).
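As a rough illustration of how these settings fit together, the sketch below fills in a handful of the documented fields. It is not the engine header: `FExampleTrainingSettings` and `MakeExampleSettings` are hypothetical stand-ins so the snippet compiles outside Unreal, and every value is an illustrative assumption rather than an engine default. In a real project you would include `LearningPPOTrainer.h` and set the same-named members on `FPPOTrainerTrainingSettings` instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that mirrors a subset of the documented fields.
// It is NOT the engine's FPPOTrainerTrainingSettings.
struct FExampleTrainingSettings
{
    std::uint32_t IterationNum;
    std::uint32_t BatchSize;
    float LearningRatePolicy;
    float LearningRateCritic;
    float DiscountFactor;
    float GaeLambda;
    float EpsilonClip;
    std::uint32_t Seed;
    bool bUseTensorboard;
};

// Builds a settings object with illustrative values.
// All numbers are assumptions chosen for the example, not engine defaults.
inline FExampleTrainingSettings MakeExampleSettings()
{
    FExampleTrainingSettings Settings{};
    Settings.IterationNum = 100000;
    Settings.BatchSize = 128;
    Settings.LearningRatePolicy = 0.0005f; // within the documented typical range
    Settings.LearningRateCritic = 0.001f;
    Settings.DiscountFactor = 0.99f;
    Settings.GaeLambda = 0.95f;
    Settings.EpsilonClip = 0.2f;
    Settings.Seed = 1234;
    Settings.bUseTensorboard = false;

    // Basic sanity checks on the ranges these quantities normally live in.
    assert(Settings.DiscountFactor > 0.0f && Settings.DiscountFactor <= 1.0f);
    assert(Settings.GaeLambda >= 0.0f && Settings.GaeLambda <= 1.0f);
    assert(Settings.EpsilonClip > 0.0f);
    return Settings;
}
```

Keeping the values in one factory function makes it easy to compare experiments: changing a single field and re-running with the same `Seed` isolates the effect of that setting.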

Variables

Type	Name	Description
float	ActionRegularizationWeight	Weight used to regularize actions.
bool	bAdvantageNormalization	When true, advantages are normalized.
uint32	BatchSize	Batch size to use for training.
bool	bClipAdvantages	When true, very large or small advantages will be clipped.
bool	bUseTensorboard	When true, TensorBoard is used for logging and tracking the training progress.
ETrainerDevice	Device	Which device to use for training.
float	DiscountFactor	The discount factor causes future rewards to be scaled down so that the policy will favor near-term rewards over potentially uncertain long-term rewards.
float	EntropyWeight	Weighting used for the entropy bonus.
float	EpsilonClip	Clipping ratio to apply to policy updates.
float	GaeLambda	The lambda parameter of Generalized Advantage Estimation (GAE), which acts essentially as an exponential smoothing/decay factor.
float	InitialActionScale	Initial scale to apply to actions before noise is added to them.
uint32	IterationNum	Number of iterations to train the network for.
float	LearningRateCritic	Learning rate of the critic network.
float	LearningRateDecay	Ratio by which to decay the learning rate every 1000 iterations.
float	LearningRatePolicy	Learning rate of the policy network. Typical values are between 0.0001f and 0.001f.
uint32	Seed	Random seed to use for training.
int32	TrimEpisodeEndStepNum	Number of steps to trim from the end of each episode during training.
int32	TrimEpisodeStartStepNum	Number of steps to trim from the start of each episode during training.
float	WeightDecay	Amount of weight decay to apply to the network.
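
To make the roles of `DiscountFactor`, `GaeLambda`, `EpsilonClip`, and `LearningRateDecay` more concrete, the sketch below implements the textbook PPO and GAE formulas these settings usually correspond to. The function names are hypothetical and are not part of the LearningTraining module. The trainer's actual implementation is not shown on this page, so treat this as an assumption about standard PPO rather than a description of the plugin's internals. In particular, the smooth decay schedule is only one reading of "every 1000 iterations"; a stepwise schedule would match the description equally well.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Discounted return: sum over t of DiscountFactor^t * Rewards[t].
inline float DiscountedReturn(const std::vector<float>& Rewards, float DiscountFactor)
{
    float Return = 0.0f;
    // Walk backwards so each step applies R_t = r_t + gamma * R_{t+1}.
    for (std::size_t i = Rewards.size(); i-- > 0;)
    {
        Return = Rewards[i] + DiscountFactor * Return;
    }
    return Return;
}

// Generalized Advantage Estimation (standard formulation).
// Values holds one more entry than Rewards: the last entry is the value of
// the state reached after the final step (0 for a terminal state).
inline std::vector<float> ComputeGae(
    const std::vector<float>& Rewards,
    const std::vector<float>& Values,
    float DiscountFactor,
    float GaeLambda)
{
    assert(Values.size() == Rewards.size() + 1);
    std::vector<float> Advantages(Rewards.size(), 0.0f);
    float Running = 0.0f;
    for (std::size_t i = Rewards.size(); i-- > 0;)
    {
        // One-step temporal-difference error.
        const float Delta = Rewards[i] + DiscountFactor * Values[i + 1] - Values[i];
        // Exponentially decayed sum of future TD errors.
        Running = Delta + DiscountFactor * GaeLambda * Running;
        Advantages[i] = Running;
    }
    return Advantages;
}

// PPO clipped surrogate objective for a single sample.
// ProbabilityRatio is pi_new(a|s) / pi_old(a|s).
inline float ClippedSurrogate(float ProbabilityRatio, float Advantage, float EpsilonClip)
{
    const float Clipped = std::max(1.0f - EpsilonClip,
                                   std::min(ProbabilityRatio, 1.0f + EpsilonClip));
    return std::min(ProbabilityRatio * Advantage, Clipped * Advantage);
}

// Assumed smooth schedule: the rate is multiplied by LearningRateDecay
// once per 1000 iterations, interpolated exponentially in between.
inline float DecayedLearningRate(float BaseRate, float LearningRateDecay, std::uint32_t Iteration)
{
    return BaseRate * std::pow(LearningRateDecay, static_cast<float>(Iteration) / 1000.0f);
}
```

With `GaeLambda` set to 0 the advantage reduces to the one-step TD error (low variance, more bias); with `GaeLambda` set to 1 it becomes the full discounted sum of TD errors (less bias, higher variance). Values in between trade off the two.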
