References
| | |
| --- | --- |
| Module | LearningTraining |
| Header | /Engine/Plugins/Experimental/LearningAgents/Source/LearningTraining/Public/LearningPPOTrainer.h |
| Include | #include "LearningPPOTrainer.h" |
Syntax
struct FPPOTrainerTrainingSettings
Remarks
Settings used for training with PPO.
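For orientation, these settings feed the standard PPO clipped-surrogate objective with Generalized Advantage Estimation (Schulman et al.). The formulation below is the usual one from the literature, and the mapping of symbols to settings is an interpretation of the variable descriptions rather than text taken from the header.

$$
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l\,\delta_{t+l},
\quad
\delta_t = r_t + \gamma\,V(s_{t+1}) - V(s_t)
$$

Here $r_t(\theta)$ is the ratio between the new and old policy probabilities, $\epsilon$ corresponds to EpsilonClip, $\gamma$ to DiscountFactor, and $\lambda$ to GaeLambda; ActionSurrogateWeight scales the clipped surrogate term, while ActionEntropyWeight scales an additional entropy bonus.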
Variables
| Type | Name | Description |
| --- | --- | --- |
| float | ActionEntropyWeight | Weighting used for the entropy bonus. |
| float | ActionRegularizationWeight | Weight used to regularize actions. Larger values will encourage exploration and smaller actions, but too large will cause noisy actions centered around zero. |
| float | ActionSurrogateWeight | Weight for the loss used to train the policy via the PPO surrogate objective. |
| float | AdvantageMax | The maximum advantage to allow. |
| float | AdvantageMin | The minimum advantage to allow. |
| bool | bAdvantageNormalization | When true, advantages are normalized. |
| bool | bSaveSnapshots | Whether to save snapshots of the trained networks every 1000 iterations. |
| bool | bUseGradNormMaxClipping | If true, gradient norm max clipping is applied to updates. Set this to true if training is unstable; otherwise leave it false. |
| bool | bUseTensorboard | Whether to use TensorBoard for logging and tracking the training progress. |
| uint32 | CriticBatchSize | Batch size to use for training the critic. Large batch sizes are much more computationally efficient when training on the GPU. |
| uint32 | CriticWarmupIterations | Number of iterations of training to perform to warm up the critic. |
| ETrainerDevice | Device | Which device to use for training. |
| float | DiscountFactor | The discount factor causes future rewards to be scaled down so that the policy will favor near-term rewards over potentially uncertain long-term rewards. |
| float | EpsilonClip | Clipping ratio to apply to policy updates. |
| float | GaeLambda | Lambda used in Generalized Advantage Estimation; larger values will tend to assign more credit to recent actions. |
| float | GradNormMax | The maximum gradient norm to clip updates to. |
| uint32 | IterationNum | Number of iterations to train the network for. |
| uint32 | IterationsPerGather | Number of training iterations to perform per buffer of experience gathered. |
| float | LearningRateCritic | Learning rate of the critic network. |
| float | LearningRateDecay | Amount by which to multiply the learning rate every 1000 iterations. |
| float | LearningRatePolicy | Learning rate of the policy network. Typical values are between 0.001f and 0.0001f. |
| uint32 | PolicyBatchSize | Batch size to use for training the policy. Large batch sizes are much more computationally efficient when training on the GPU. |
| uint32 | PolicyWindow | The number of consecutive steps of observations and actions over which to train the policy. |
| float | ReturnRegularizationWeight | Weight used to regularize predicted returns. Encourages the critic not to over- or under-estimate returns. |
| uint32 | Seed | Random seed to use for training. |
| int32 | TrimEpisodeEndStepNum | Number of steps to trim from the end of each episode during training. |
| int32 | TrimEpisodeStartStepNum | Number of steps to trim from the start of each episode during training. |
| float | WeightDecay | Amount of weight decay to apply to the network. |
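A minimal configuration sketch follows, assuming the field names and types documented in the table above. The numeric values are illustrative hyper-parameters rather than engine defaults, and the namespace and ETrainerDevice enumerators may differ between engine versions.

```cpp
#include "LearningPPOTrainer.h"

// Illustrative only: values are example hyper-parameters, not engine defaults.
// Depending on engine version, the struct may live in the UE::Learning namespace.
FPPOTrainerTrainingSettings MakeExamplePPOSettings()
{
	FPPOTrainerTrainingSettings Settings;

	// Optimization
	Settings.IterationNum = 100000;         // total training iterations
	Settings.LearningRatePolicy = 0.0001f;  // within the 0.001f - 0.0001f range noted above
	Settings.LearningRateCritic = 0.001f;
	Settings.LearningRateDecay = 0.99f;     // applied every 1000 iterations
	Settings.WeightDecay = 0.001f;

	// PPO terms
	Settings.EpsilonClip = 0.2f;            // clipping ratio for policy updates
	Settings.DiscountFactor = 0.99f;        // favor near-term rewards
	Settings.GaeLambda = 0.95f;             // Generalized Advantage Estimation lambda
	Settings.ActionEntropyWeight = 0.01f;   // entropy bonus weighting

	// Batching
	Settings.PolicyBatchSize = 1024;        // large batches are efficient on the GPU
	Settings.CriticBatchSize = 4096;

	// Logging and device
	Settings.bUseTensorboard = false;
	Settings.Device = ETrainerDevice::GPU;  // assumes the enum exposes a GPU value

	return Settings;
}
```

Since the struct presumably default-initializes its members, only the fields you want to override need to be assigned; the populated settings would then be passed to whichever PPO trainer entry point the surrounding module exposes.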