References
| Module | LearningTraining |
| Header | /Engine/Plugins/Experimental/LearningAgents/Source/LearningTraining/Public/LearningPPOTrainer.h |
| Include | #include "LearningPPOTrainer.h" |
Syntax
struct FPPOTrainerTrainingSettings
Remarks
Settings used for training with PPO (Proximal Policy Optimization).
Variables
| Type | Name | Description |
|---|---|---|
| float | ActionRegularizationWeight | Weight used to regularize actions. |
| bool | bAdvantageNormalization | When true, advantages are normalized. |
| uint32 | BatchSize | Batch size to use for training. |
| bool | bClipAdvantages | When true, very large or very small advantages are clipped. |
| bool | bUseTensorboard | Whether to use TensorBoard for logging and tracking training progress. |
| ETrainerDevice | Device | Which device to use for training. |
| float | DiscountFactor | The discount factor causes future rewards to be scaled down so that the policy will favor near-term rewards over potentially uncertain long-term rewards. |
| float | EntropyWeight | Weighting used for the entropy bonus. |
| float | EpsilonClip | Clipping ratio to apply to policy updates. |
| float | GaeLambda | Lambda parameter used in Generalized Advantage Estimation (GAE); it acts essentially as an exponential smoothing/decay factor. |
| float | InitialActionScale | Initial scale to apply to actions before noise is added to them. |
| uint32 | IterationNum | Number of iterations to train the network for. |
| float | LearningRateCritic | Learning rate of the critic network. |
| float | LearningRateDecay | Ratio by which to decay the learning rate every 1000 iterations. |
| float | LearningRatePolicy | Learning rate of the policy network. Typical values are between 0.0001f and 0.001f. |
| uint32 | Seed | Random seed to use for training. |
| int32 | TrimEpisodeEndStepNum | Number of steps to trim from the end of each episode during training. |
| int32 | TrimEpisodeStartStepNum | Number of steps to trim from the start of each episode during training. |
| float | WeightDecay | Amount of weight decay to apply to the network. |
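To make the roles of DiscountFactor, GaeLambda, and EpsilonClip more concrete, the following is a minimal, self-contained sketch of the textbook Generalized Advantage Estimation recursion and the PPO clipped surrogate term. It is not the plugin's implementation, which may differ in detail. `FExampleSettings`, `ComputeGae`, and `ClippedSurrogate` are hypothetical names introduced only for this illustration, and the values shown are placeholders rather than the engine's defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for three of the settings documented above.
// This is NOT the engine struct; the values are illustrative, not the plugin's defaults.
struct FExampleSettings
{
    float DiscountFactor = 0.99f;
    float GaeLambda = 0.95f;
    float EpsilonClip = 0.2f;
};

// Textbook Generalized Advantage Estimation over a single episode.
// Values must hold one more entry than Rewards: the last entry is the value
// estimate of the state reached after the final step (0 if that state is terminal).
std::vector<float> ComputeGae(
    const std::vector<float>& Rewards,
    const std::vector<float>& Values,
    const FExampleSettings& Settings)
{
    assert(Values.size() == Rewards.size() + 1);

    std::vector<float> Advantages(Rewards.size(), 0.0f);
    float Running = 0.0f;

    // Walk backwards so each step can reuse the accumulated advantage of the next step.
    for (std::size_t Step = Rewards.size(); Step-- > 0;)
    {
        // One-step temporal-difference error; DiscountFactor scales down the future value.
        const float Delta =
            Rewards[Step] + Settings.DiscountFactor * Values[Step + 1] - Values[Step];

        // GaeLambda controls how quickly later TD errors decay in the sum.
        Running = Delta + Settings.DiscountFactor * Settings.GaeLambda * Running;
        Advantages[Step] = Running;
    }
    return Advantages;
}

// Textbook PPO clipped surrogate term for one sample.
// Ratio is the new policy probability divided by the old policy probability.
float ClippedSurrogate(float Ratio, float Advantage, const FExampleSettings& Settings)
{
    const float Low = 1.0f - Settings.EpsilonClip;
    const float High = 1.0f + Settings.EpsilonClip;
    const float ClippedRatio = std::max(Low, std::min(Ratio, High));

    // Taking the minimum removes any incentive to push the ratio outside [Low, High].
    return std::min(Ratio * Advantage, ClippedRatio * Advantage);
}
```

In this sketch, a GaeLambda of 0 reduces each advantage to the one-step temporal-difference error, while a value of 1 sums all discounted errors to the end of the episode. A smaller EpsilonClip narrows the range within which the probability ratio can affect the objective.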