unreal.LearningAgentsPPOTrainer¶
- class unreal.LearningAgentsPPOTrainer(outer: Object | None = None, name: Name | str = 'None')¶
Bases:
LearningAgentsManagerListener
Learning Agents PPOTrainer
C++ Source:
Plugin: LearningAgents
Module: LearningAgentsTraining
File: LearningAgentsPPOTrainer.h
Editor Properties: (see get_editor_property/set_editor_property)
critic (LearningAgentsCritic): [Read-Only] The current critic.
has_training_failed (bool): [Read-Only] True if the trainer encountered an unrecoverable error during training (e.g. the trainer process timed out). Otherwise, false. This exists mainly to keep the editor from locking up if something goes wrong during training.
interactor (LearningAgentsInteractor): [Read-Only] The agent interactor associated with this component.
is_setup (bool): [Read-Only] True if this object has been setup. Otherwise, false.
is_training (bool): [Read-Only] True if training is currently in-progress. Otherwise, false.
manager (LearningAgentsManager): [Read-Only] The manager this object is associated with.
policy (LearningAgentsPolicy): [Read-Only] The current policy for experience gathering.
training_environment (LearningAgentsTrainingEnvironment): [Read-Only] The training environment associated with this component.
visual_logger_objects (Map[Name, LearningAgentsVisualLoggerObject]): [Read-Only] The visual logger objects associated with this listener.
- begin_training(trainer_training_settings=[], training_game_settings=[], reset_agents_on_begin=True) None¶
Begins the training process with the provided settings.
- Parameters:
trainer_training_settings (LearningAgentsPPOTrainingSettings) – The settings for this training run.
training_game_settings (LearningAgentsTrainingGameSettings) – The settings that will affect the game’s simulation.
reset_agents_on_begin (bool) – If true, reset all agents at the beginning of training.
- get_episode_step_num(agent_id=-1) int32¶
Gets the number of steps recorded in an episode for the given agent.
- Parameters:
agent_id (int32) – The AgentId to look up the number of recorded episode steps for.
- Returns:
The number of recorded episode steps
- Return type:
int32
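A small sketch of how this might be used, e.g. to log episode lengths for a set of agents. `trainer` stands in for a constructed LearningAgentsPPOTrainer; only the documented `get_episode_step_num` call is assumed, so any object with that method works outside the editor:

```python
def episode_lengths(trainer, agent_ids):
    """Map each agent id to its number of recorded episode steps."""
    return {aid: trainer.get_episode_step_num(aid) for aid in agent_ids}
```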
- has_training_failed() bool¶
Returns true if the trainer has failed to communicate with the external training process. This can be used in combination with RunTraining to avoid filling the logs with errors.
- Returns:
True if the training has failed. Otherwise, false.
- Return type:
bool
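As the note above suggests, this check pairs naturally with RunTraining. A minimal sketch of that guard; `trainer` stands in for any object exposing the two documented methods (inside the editor this would be the LearningAgentsPPOTrainer itself):

```python
def tick_training(trainer):
    """Run one training tick; return False once training has failed."""
    if trainer.has_training_failed():
        return False  # stop calling run_training so the log isn't flooded
    trainer.run_training()
    return True
```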
- is_training() bool¶
Returns true if the trainer is currently training. Otherwise, false.
- Return type:
bool
- classmethod make_ppo_trainer(manager, interactor, training_environment, policy, critic, communicator, class_=None, name="PPOTrainer", trainer_settings=[512, 1000, 10000]) -> (LearningAgentsPPOTrainer, manager=LearningAgentsManager, interactor=LearningAgentsInteractor, training_environment=LearningAgentsTrainingEnvironment, policy=LearningAgentsPolicy, critic=LearningAgentsCritic)¶
Constructs the trainer.
- Parameters:
manager (LearningAgentsManager) – The agent manager we are using.
interactor (LearningAgentsInteractor) – The agent interactor we are training with.
training_environment (LearningAgentsTrainingEnvironment) – The training environment.
policy (LearningAgentsPolicy) – The policy to be trained.
critic (LearningAgentsCritic) – The critic to be trained.
communicator (LearningAgentsCommunicator) – The communicator.
name (Name) – The trainer name.
trainer_settings (LearningAgentsPPOTrainerSettings) – The trainer settings to use.
- Returns:
manager (LearningAgentsManager): The agent manager we are using.
interactor (LearningAgentsInteractor): The agent interactor we are training with.
training_environment (LearningAgentsTrainingEnvironment): The training environment.
policy (LearningAgentsPolicy): The policy to be trained.
critic (LearningAgentsCritic): The critic to be trained.
- Return type:
tuple
- process_experience(reset_agents_on_update=True) None¶
Call this function at the end of each step of your training loop. This takes the current observations/actions/rewards and moves them into the episode experience buffer. All agents with full episode buffers, or those which have been signaled complete, will be reset. If enough experience has been gathered, it will be sent to the training process, an iteration of training will be run, and the updated policy will be synced back.
- Parameters:
reset_agents_on_update (bool) – If true, reset all agents whenever an updated policy is received.
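A sketch of one manual training step, mirroring the per-step ordering that run_training automates (GatherRewards, GatherCompletions, ProcessExperience, then RunInference). Which component hosts each gather/inference call is an assumption here, and the objects are stand-ins for the real components:

```python
def training_step(environment, interactor, trainer):
    """One step of a manual training loop, mirroring run_training's ordering."""
    environment.gather_rewards()      # collect this step's rewards
    environment.gather_completions()  # flag episodes that have ended
    trainer.process_experience()      # buffer experience; may run a PPO update
    interactor.run_inference()        # act with the (possibly updated) policy
```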
- run_training(trainer_training_settings=[], training_game_settings=[], reset_agents_on_begin=True, reset_agents_on_update=True) None¶
Convenience function that runs a basic training loop. On the first call it starts training and then calls RunInference. On each subsequent call it calls GatherRewards, GatherCompletions, and ProcessExperience, followed by RunInference.
- Parameters:
trainer_training_settings (LearningAgentsPPOTrainingSettings) – The settings for this training run.
training_game_settings (LearningAgentsTrainingGameSettings) – The settings that will affect the game’s simulation.
reset_agents_on_begin (bool) – If true, reset all agents at the beginning of training.
reset_agents_on_update (bool) – If true, reset all agents whenever an updated policy is received.
- setup_ppo_trainer(manager, interactor, training_environment, policy, critic, communicator, trainer_settings=[512, 1000, 10000]) -> (manager=LearningAgentsManager, interactor=LearningAgentsInteractor, training_environment=LearningAgentsTrainingEnvironment, policy=LearningAgentsPolicy, critic=LearningAgentsCritic)¶
Initializes the trainer.
- Parameters:
manager (LearningAgentsManager) – The agent manager we are using.
interactor (LearningAgentsInteractor) – The agent interactor we are training with.
training_environment (LearningAgentsTrainingEnvironment) – The training environment.
policy (LearningAgentsPolicy) – The policy to be trained.
critic (LearningAgentsCritic) – The critic to be trained.
communicator (LearningAgentsCommunicator) – The communicator.
trainer_settings (LearningAgentsPPOTrainerSettings) – The trainer settings to use.
- Returns:
manager (LearningAgentsManager): The agent manager we are using.
interactor (LearningAgentsInteractor): The agent interactor we are training with.
training_environment (LearningAgentsTrainingEnvironment): The training environment.
policy (LearningAgentsPolicy): The policy to be trained.
critic (LearningAgentsCritic): The critic to be trained.
- Return type:
tuple