Settings used for training with PPO.

| | |
|---|---|
| Name | FPPOTrainerTrainingSettings |
| Type | struct |
| Header File | /Engine/Plugins/Experimental/LearningAgents/Source/LearningTraining/Public/LearningPPOTrainer.h |
| Include Path | #include "LearningPPOTrainer.h" |
Syntax
```cpp
struct FPPOTrainerTrainingSettings
```
Variables
Public
| Name | Type | Remarks | Include Path | Unreal Specifiers |
|---|---|---|---|---|
| ActionEntropyWeight | float | Weighting used for the entropy bonus. | LearningPPOTrainer.h | |
| ActionRegularizationWeight | float | Weight used to regularize actions. Larger values will encourage exploration and smaller actions, but too large will cause noisy actions centered around zero. | LearningPPOTrainer.h | |
| ActionSurrogateWeight | float | Weight for the loss used to train the policy via the PPO surrogate objective. | LearningPPOTrainer.h | |
| AdvantageMax | float | The maximum advantage to allow. | LearningPPOTrainer.h | |
| AdvantageMin | float | The minimum advantage to allow. | LearningPPOTrainer.h | |
| bAdvantageNormalization | bool | When true, advantages are normalized. | LearningPPOTrainer.h | |
| bSaveSnapshots | bool | Whether to save snapshots of the trained networks every 1000 iterations. | LearningPPOTrainer.h | |
| bUseGradNormMaxClipping | bool | If true, applies gradient norm max clipping. Enable this if training is unstable; otherwise leave it false. | LearningPPOTrainer.h | |
| bUseTensorboard | bool | Whether to use TensorBoard to log and track training progress. | LearningPPOTrainer.h | |
| CriticBatchSize | uint32 | Batch size to use for training the critic. Large batch sizes are much more computationally efficient when training on the GPU. | LearningPPOTrainer.h | |
| CriticWarmupIterations | uint32 | Number of iterations of training to perform to warm-up the Critic. | LearningPPOTrainer.h | |
| Device | ETrainerDevice | Which device to use for training. | LearningPPOTrainer.h | |
| DiscountFactor | float | The discount factor causes future rewards to be scaled down so that the policy will favor near-term rewards over potentially uncertain long-term rewards. | LearningPPOTrainer.h | |
| EpsilonClip | float | Clipping ratio to apply to policy updates. | LearningPPOTrainer.h | |
| GaeLambda | float | Lambda used in Generalized Advantage Estimation; larger values tend to assign more credit to recent actions. | LearningPPOTrainer.h | |
| GradNormMax | float | The maximum gradient norm to clip updates to. | LearningPPOTrainer.h | |
| IterationNum | uint32 | Number of iterations to train the network for. | LearningPPOTrainer.h | |
| IterationsPerGather | uint32 | Number of training iterations to perform per buffer of experience gathered. | LearningPPOTrainer.h | |
| LearningRateCritic | float | Learning rate of the critic network. | LearningPPOTrainer.h | |
| LearningRateDecay | float | Amount by which to multiply the learning rate every 1000 iterations. | LearningPPOTrainer.h | |
| LearningRatePolicy | float | Learning rate of the policy network. Typical values are between 0.001f and 0.0001f. | LearningPPOTrainer.h | |
| PolicyBatchSize | uint32 | Batch size to use for training the policy. Large batch sizes are much more computationally efficient when training on the GPU. | LearningPPOTrainer.h | |
| PolicyWindow | uint32 | The number of consecutive steps of observations and actions over which to train the policy. | LearningPPOTrainer.h | |
| ReturnRegularizationWeight | float | Weight used to regularize predicted returns. Encourages the critic not to over- or under-estimate returns. | LearningPPOTrainer.h | |
| Seed | uint32 | Random seed to use for training. | LearningPPOTrainer.h | |
| TrimEpisodeEndStepNum | int32 | Number of steps to trim from the end of each episode during training. | LearningPPOTrainer.h | |
| TrimEpisodeStartStepNum | int32 | Number of steps to trim from the start of each episode during training. | LearningPPOTrainer.h | |
| WeightDecay | float | Amount of weight decay to apply to the network. | LearningPPOTrainer.h |