Settings used for training with PPO.

| | |
|---|---|
| Name | FPPOTrainerTrainingSettings |
| Type | struct |
| Header File | /Engine/Plugins/Experimental/LearningAgents/Source/LearningTraining/Public/LearningPPOTrainer.h |
| Include Path | #include "LearningPPOTrainer.h" |
Syntax
```cpp
struct FPPOTrainerTrainingSettings
```
Variables
Public
| Name | Type | Remarks | Include Path | Unreal Specifiers |
|---|---|---|---|---|
| ActionEntropyWeight | float | Weighting used for the entropy bonus. | LearningPPOTrainer.h | |
| ActionRegularizationWeight | float | Weight used to regularize actions. Larger values will encourage exploration and smaller actions, but too large will cause noisy actions centered around zero. | LearningPPOTrainer.h | |
| ActionSurrogateWeight | float | Weight for the loss used to train the policy via the PPO surrogate objective. | LearningPPOTrainer.h | |
| AdvantageMax | float | The maximum advantage to allow. | LearningPPOTrainer.h | |
| AdvantageMin | float | The minimum advantage to allow. | LearningPPOTrainer.h | |
| bAdvantageNormalization | bool | When true, advantages are normalized. | LearningPPOTrainer.h | |
| bSaveSnapshots | bool | Whether to save snapshots of the trained networks every 1000 iterations. | LearningPPOTrainer.h | |
| bUseGradNormMaxClipping | bool | If true, applies gradient norm max clipping. Enable this if training is unstable; otherwise leave it false. | LearningPPOTrainer.h | |
| bUseTensorboard | bool | Whether to use TensorBoard to log and track training progress. | LearningPPOTrainer.h | |
| CriticBatchSize | uint32 | Batch size to use for training the critic. Large batch sizes are much more computationally efficient when training on the GPU. | LearningPPOTrainer.h | |
| CriticWarmupIterations | uint32 | Number of iterations of training to perform to warm-up the Critic. | LearningPPOTrainer.h | |
| Device | ETrainerDevice | Which device to use for training. | LearningPPOTrainer.h | |
| DiscountFactor | float | The discount factor causes future rewards to be scaled down so that the policy will favor near-term rewards over potentially uncertain long-term rewards. | LearningPPOTrainer.h | |
| EpsilonClip | float | Clipping ratio to apply to policy updates. | LearningPPOTrainer.h | |
| GaeLambda | float | Lambda used in Generalized Advantage Estimation; larger values tend to assign more credit to recent actions. | LearningPPOTrainer.h | |
| GradNormMax | float | The maximum gradient norm to clip updates to. | LearningPPOTrainer.h | |
| IterationNum | uint32 | Number of iterations to train the network for. | LearningPPOTrainer.h | |
| IterationsPerGather | uint32 | Number of training iterations to perform per buffer of experience gathered. | LearningPPOTrainer.h | |
| LearningRateCritic | float | Learning rate of the critic network. | LearningPPOTrainer.h | |
| LearningRateDecay | float | Amount by which to multiply the learning rate every 1000 iterations. | LearningPPOTrainer.h | |
| LearningRatePolicy | float | Learning rate of the policy network. Typical values are between 0.001f and 0.0001f. | LearningPPOTrainer.h | |
| PolicyBatchSize | uint32 | Batch size to use for training the policy. Large batch sizes are much more computationally efficient when training on the GPU. | LearningPPOTrainer.h | |
| PolicyWindow | uint32 | The number of consecutive steps of observations and actions over which to train the policy. | LearningPPOTrainer.h | |
| ReturnRegularizationWeight | float | Weight used to regularize predicted returns. Encourages the critic not to over- or under-estimate returns. | LearningPPOTrainer.h | |
| Seed | uint32 | Random seed to use for training. | LearningPPOTrainer.h | |
| TrimEpisodeEndStepNum | int32 | Number of steps to trim from the end of each episode during training. | LearningPPOTrainer.h | |
| TrimEpisodeStartStepNum | int32 | Number of steps to trim from the start of each episode during training. | LearningPPOTrainer.h | |
| WeightDecay | float | Amount of weight decay to apply to the network. | LearningPPOTrainer.h |