The configurable settings for the PPO training process.
| Name | FLearningAgentsPPOTrainingSettings |
|---|---|
| Type | struct |
| Header File | /Engine/Plugins/Experimental/LearningAgents/Source/LearningAgentsTraining/Public/LearningAgentsPPOTrainer.h |
| Include Path | #include "LearningAgentsPPOTrainer.h" |
Syntax
USTRUCT(BlueprintType, Category="LearningAgents")
struct FLearningAgentsPPOTrainingSettings
Variables
Public
| Name | Type | Remarks | Include Path | Unreal Specifiers |
|---|---|---|---|---|
| ActionEntropyWeight | float | Weighting used for the entropy bonus. | LearningAgentsPPOTrainer.h | |
| ActionRegularizationWeight | float | Weight used to regularize actions. | LearningAgentsPPOTrainer.h | |
| ActionSurrogateWeight | float | Weight for the loss used to train the policy via the PPO surrogate objective. | LearningAgentsPPOTrainer.h | |
| bAdvantageNormalization | bool | When true, advantages are normalized. | LearningAgentsPPOTrainer.h | |
| bSaveSnapshots | bool | If true, snapshots of the trained networks will be emitted to the intermediate directory. | LearningAgentsPPOTrainer.h | |
| bUseGradNormMaxClipping | bool | When true, gradient norm max clipping will be used on the policy, critic, encoder, and decoder. | LearningAgentsPPOTrainer.h | |
| bUseMLflow | bool | If true, MLflow will be used for experiment tracking. | LearningAgentsPPOTrainer.h | |
| bUseTensorboard | bool | If true, TensorBoard logs will be emitted to the intermediate directory. | LearningAgentsPPOTrainer.h | |
| CriticBatchSize | int32 | Batch size to use for training the critic. | LearningAgentsPPOTrainer.h | |
| CriticWarmupIterations | int32 | Number of training iterations to perform to warm up the Critic. | LearningAgentsPPOTrainer.h | |
| Device | ELearningAgentsTrainingDevice | The device to train on. | LearningAgentsPPOTrainer.h | |
| DiscountFactor | float | The discount factor to use during training. | LearningAgentsPPOTrainer.h | |
| EpsilonClip | float | Clipping ratio to apply to policy updates. | LearningAgentsPPOTrainer.h | |
| GaeLambda | float | The lambda used in Generalized Advantage Estimation, where larger values tend to assign more credit to recent actions. | LearningAgentsPPOTrainer.h | |
| GradNormMax | float | The maximum gradient norm to clip updates to. | LearningAgentsPPOTrainer.h | |
| IterationsPerGather | int32 | Number of training iterations to perform per buffer of experience gathered. | LearningAgentsPPOTrainer.h | |
| IterationsPerSnapshot | int32 | The interval, in iterations, at which to save new network snapshots. | LearningAgentsPPOTrainer.h | |
| LearningRateCritic | float | Learning rate of the critic network. | LearningAgentsPPOTrainer.h | |
| LearningRateDecay | float | Amount by which to multiply the learning rate every 1000 iterations. | LearningAgentsPPOTrainer.h | |
| LearningRatePolicy | float | Learning rate of the policy network. Typical values are between 0.001 and 0.0001. | LearningAgentsPPOTrainer.h | |
| MaximumAdvantage | float | The maximum advantage to allow. | LearningAgentsPPOTrainer.h | |
| MinimumAdvantage | float | The minimum advantage to allow. | LearningAgentsPPOTrainer.h | |
| MLflowTrackingUri | FString | The URI of the MLflow Tracking Server to log to. | LearningAgentsPPOTrainer.h | |
| NumberOfIterations | int32 | The number of iterations to run before ending training. | LearningAgentsPPOTrainer.h | |
| NumberOfStepsToTrimAtEndOfEpisode | int32 | The number of steps to trim from the end of the episode. | LearningAgentsPPOTrainer.h | |
| NumberOfStepsToTrimAtStartOfEpisode | int32 | The number of steps to trim from the start of the episode, e.g. this can be useful if some things are still getting set up at the start of the episode and you don't want them used for training. | LearningAgentsPPOTrainer.h | |
| PolicyBatchSize | int32 | Batch size to use for training the policy. | LearningAgentsPPOTrainer.h | |
| PolicyWindowSize | int32 | The number of consecutive steps of observations and actions over which to train the policy. | LearningAgentsPPOTrainer.h | |
| RandomSeed | int32 | The seed used for any random sampling the trainer will perform, e.g. for weight initialization. | LearningAgentsPPOTrainer.h | |
| ReturnRegularizationWeight | float | Weight used to regularize returns. Encourages the critic not to over- or under-estimate returns. | LearningAgentsPPOTrainer.h | |
| WeightDecay | float | Amount of weight decay to apply to the network. | LearningAgentsPPOTrainer.h | |
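These settings can be filled in from C++ before training starts. The sketch below populates a few of the fields listed above as an illustration; the particular values chosen, and how the finished struct is then handed to the PPO trainer, are assumptions for the example rather than recommendations from this page.

```cpp
#include "CoreMinimal.h"
#include "LearningAgentsPPOTrainer.h"

// A minimal sketch: populate a handful of the PPO training settings listed
// above. The values shown here are illustrative assumptions, not defaults
// or recommendations taken from this page.
FLearningAgentsPPOTrainingSettings MakeExamplePPOSettings()
{
    FLearningAgentsPPOTrainingSettings Settings;

    Settings.NumberOfIterations = 10000;   // Stop training after this many iterations.
    Settings.LearningRatePolicy = 0.0001f; // Within the 0.001-0.0001 range noted above.
    Settings.LearningRateCritic = 0.001f;
    Settings.DiscountFactor = 0.99f;
    Settings.GaeLambda = 0.95f;
    Settings.EpsilonClip = 0.2f;
    Settings.ActionEntropyWeight = 0.01f;
    Settings.PolicyBatchSize = 1024;
    Settings.CriticBatchSize = 4096;
    Settings.bUseTensorboard = true;       // Emit TensorBoard logs to the intermediate directory.

    return Settings;
}
```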
Functions
Public
| Name | Remarks | Include Path | Unreal Specifiers |
|---|---|---|---|
| TSharedRef< FJsonObject > AsJsonConfig() |  | LearningAgentsPPOTrainer.h |  |
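AsJsonConfig() returns the settings as a JSON object, which is handy for logging or inspecting the exact configuration used for a run. The sketch below serializes that object to a string using the engine's standard Json module; apart from the AsJsonConfig() call itself, the helper name and logging usage are illustrative assumptions.

```cpp
#include "CoreMinimal.h"
#include "LearningAgentsPPOTrainer.h"
#include "Dom/JsonObject.h"
#include "Serialization/JsonSerializer.h"
#include "Serialization/JsonWriter.h"

// A hedged sketch: dump the PPO training settings to a JSON string for logging.
// Everything other than FLearningAgentsPPOTrainingSettings::AsJsonConfig() is
// standard Unreal Json module usage, shown here for illustration only.
void LogPPOSettingsAsJson(FLearningAgentsPPOTrainingSettings& Settings)
{
    const TSharedRef<FJsonObject> JsonConfig = Settings.AsJsonConfig();

    // Serialize the JSON object into a string.
    FString JsonString;
    const TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&JsonString);
    FJsonSerializer::Serialize(JsonConfig, Writer);

    UE_LOG(LogTemp, Log, TEXT("PPO training settings: %s"), *JsonString);
}
```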