The Audio Driven Animation feature in MetaHuman Animator gives you the ability to process audio into realistic facial animation. This can be done in real time or offline. This page describes how to process audio offline, where additional solve parameters allow you to influence the head movement, blinks, and mood.
You can learn more about capturing your facial performance by reading the MetaHuman Facial Performance Capture Guidelines, including the Audio Capture Recommendations. MetaHuman Animator can use any audio format supported by Unreal Engine as a SoundWave asset. Read the Importing Audio Files documentation to learn more.
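If you prefer to create the SoundWave asset with the editor's Python scripting rather than dragging a file into the Content Browser, a minimal sketch looks like the following. The source file path and destination folder are placeholders for illustration.

```python
import unreal

# Import a WAV file as a SoundWave asset using an automated import task.
# The file path and destination folder below are illustrative placeholders.
task = unreal.AssetImportTask()
task.filename = "C:/Audio/my_take.wav"   # audio file on disk (placeholder)
task.destination_path = "/Game/Audio"    # Content Browser folder to import into
task.automated = True                    # suppress interactive import dialogs
task.save = True                         # save the new asset after import

unreal.AssetToolsHelpers.get_asset_tools().import_asset_tasks([task])

# The imported SoundWave can then be loaded for use elsewhere.
sound_wave = unreal.load_asset("/Game/Audio/my_take")
```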
To configure your project for offline animation from audio, make sure you have enabled the MetaHuman Animator plugin.
Although the MetaHuman Performance asset can be configured to generate animation from audio using the Realtime Audio Solver, this is still an offline process and is not the same as generating animation in real time from a MetaHuman Audio Live Link Source. Refer to the Realtime Animation section for information on generating animation in real time using Live Link.
This feature is also available in UEFN, with the exception of exporting a level sequence from a performance, which is currently not supported.
Prerequisites
The offline workflow to generate animation from audio requires the following:
An Unreal Engine 5.6 (or later) project.
The MetaHuman Animator plugin enabled.
One or more SoundWave assets, either from an existing audio file or created using Take Recorder and a connected audio device.
Processing in the Performance Asset Editor
Create a new MetaHuman Performance asset by right-clicking in the Content Browser, then select MetaHuman Animator > MetaHuman Performance.
Double-click the MetaHuman Performance asset to open it, then configure the following settings in its Details panel.
Select Audio as the Input Type (1).
In the Audio field, select the SoundWave asset that will be used to generate the animation (2).
In the Visualization Mesh field, select your MetaHuman's face mesh (3).
Click Process to generate the facial animations.
This will process the audio and create the Facial Rig animation tracks in Sequencer.
Once the process is complete, you can click Export Animation or Export Level Sequence to export the facial animation. If you choose Level Sequence, remember to assign your MetaHuman character blueprint.
The following options in the Details panel control how the audio is handled during processing:

| Option | Description |
|---|---|
| Downmix Channels | If enabled, multi-channel audio (most commonly stereo) is downmixed to a single mono channel for processing. If disabled, the channel used for processing is specified by the Audio Channel Index option. |
| Generate Blinks | Enables or disables the generation of blinks as part of the audio solve. |
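The same workflow can also be driven from the editor's Python API, which is useful for batch processing. The sketch below is a hedged illustration only: the MetaHuman-specific class, enum, and property names (input_type, audio, downmix_channels, start_pipeline) are assumptions inferred from the editor UI and the plugin's batch-processing sample, and may differ in your plugin version.

```python
import unreal

# Hedged sketch: configure and process a MetaHuman Performance asset from
# Python. All MetaHuman-specific names below are assumptions inferred from
# the editor UI, not confirmed API; check the plugin's Python stubs.
performance = unreal.load_asset("/Game/Performances/MyPerformance")

# Use audio as the input instead of footage (enum name is an assumption).
performance.set_editor_property("input_type", unreal.DataInputType.AUDIO)

# The SoundWave asset to solve from.
sound_wave = unreal.load_asset("/Game/Audio/my_take")
performance.set_editor_property("audio", sound_wave)

# Mirror the Downmix Channels option from the table above.
performance.set_editor_property("downmix_channels", True)

# Start the offline solve (method name taken from the plugin's sample
# scripts; treat it as an assumption).
performance.start_pipeline()
```

In practice, inspecting the asset's properties in the Python console or the sample scripts bundled with the plugin is the safest way to confirm the exact names for your engine version.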
Additional Solve Parameters
As of Unreal Engine 5.6, additional options are available for tweaking your audio solve. Here is a breakdown of what you can change:
| | Parameter | Description |
|---|---|---|
| 1 | Head Movement | As with Depth mode or Real-Time (Video) processing, Audio Driven Animation can generate head movement directly on your MetaHuman. |
| 2 | Frames to Process | Choose the Start and End frames of the audio to process into animation. |
| 3 | Realtime Audio | Use the realtime audio solver. This applies the real-time algorithm, which does not produce head motion. Enabling this option disables or hides the other options described in this section. |
| 4 | Generate Blinks | Choose whether to generate blinks as part of the audio-to-animation solve. |
| 5 | Process Mask | Generate full-face animation, or only the curves relevant to the mouth region, which you can use to swap or layer alternative lip animations for lip sync. |
| 6 | Solve Overrides | Override the mood of the solve. Choose Auto Detect or one of the listed moods, and specify the mood's intensity. Neutral has no intensity. |
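In a scripted workflow, these parameters would map to editor properties on the same Performance asset. As before, this is a hedged sketch: every property and enum name below (the frame-range properties, realtime_audio, mood, mood_intensity) is an assumption mapped from the UI labels above, not confirmed API.

```python
import unreal

# Hedged sketch: all property and enum names are assumptions mapped from
# the UI labels in the table above; verify against your plugin version.
performance = unreal.load_asset("/Game/Performances/MyPerformance")

# (2) Frames to Process: limit the solve to part of the audio.
performance.set_editor_property("start_frame_to_process", 0)
performance.set_editor_property("end_frame_to_process", 240)

# (3) Realtime Audio: faster solver, but produces no head motion.
performance.set_editor_property("realtime_audio", False)

# (4) Generate Blinks.
performance.set_editor_property("generate_blinks", True)

# (6) Solve Overrides: hypothetical enum for the mood override plus its
# intensity (Neutral ignores intensity, per the table above).
performance.set_editor_property("mood", unreal.PerformanceMood.AUTO_DETECT)
performance.set_editor_property("mood_intensity", 1.0)
```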
Once the animation has been generated, it can be exported as an Animation Sequence or a Level Sequence asset.
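For completeness, the export step can also be scripted. The plugin's batch-processing sample suggests export helpers along the following lines; treat the class and function names as assumptions that may differ in your version.

```python
import unreal

# Hedged sketch of a scripted export; MetaHumanPerformanceExportUtils and
# the settings class are assumptions based on the plugin's sample scripts.
performance = unreal.load_asset("/Game/Performances/MyPerformance")

settings = unreal.MetaHumanPerformanceExportAnimationSettings()
settings.set_editor_property("enable_head_movement", True)

anim = unreal.MetaHumanPerformanceExportUtils.export_animation_sequence(
    performance, settings
)
unreal.log(f"Exported: {anim.get_path_name()}")
```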