Whereas manual PSO caching requires playing a build of your game to collect PSO (Pipeline State Object) information, PSO Precaching performs automatic PSO collection and async compilation for all PSOs that could be used during MeshDrawCommand rendering of a component.
Overview
Primitive Components (UPrimitiveComponent) precache all the PSOs needed for rendering immediately after loading (during PostLoad). The precache collects all the pipeline state information needed to compile the PSOs, including:
- Materials.
- Vertex factories.
- Vertex element information.
- Specific precache parameters.
UE uses this information to iterate over all possible mesh pass processors where the component can be rendered. Each mesh pass processor adds the PSO initializers that it could need during rendering. Background tasks then check a shared PSO cache to make sure a requested PSO isn't already being precached, and compile the remaining requests asynchronously.
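The deduplication step can be pictured with a small standalone sketch. It is not engine code: `PsoKey` and `SharedPsoRequestCache` are hypothetical names, and a real PSO initializer is reduced here to a single hash value. The sketch only illustrates how a shared, thread-safe set lets each PSO be queued for compilation at most once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for a hashed PSO initializer; not an engine type.
using PsoKey = std::uint64_t;

// Remembers which PSO keys were already requested so that each one is
// scheduled for asynchronous compilation at most once.
class SharedPsoRequestCache
{
public:
    // Returns true if the caller should schedule a compile for this key,
    // false if an earlier request already covers it.
    bool TryRegister(PsoKey Key)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Requested.insert(Key).second;
    }

    // Number of distinct keys requested so far.
    std::size_t NumRequested() const
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Requested.size();
    }

private:
    mutable std::mutex Mutex;
    std::unordered_set<PsoKey> Requested;
};
```

A background task in this sketch would call `TryRegister` for every collected initializer and compile only those for which it returns true.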
When UE creates a Primitive Proxy for a Primitive Component and its required PSOs are still compiling, several options are available:
- Delay the proxy creation until the PSO compilation is finished (default). This will effectively skip the draw until the PSO is ready.
- Replace the material with the engine's default material.
- Continue and have a possible hitch. The draw will block on the PSO compilation.
Configure PSO Precaching
The following CVars control PSO precaching:
CVar | Description | Default State |
---|---|---|
r.PSOPrecaching | Global CVar to enable PSO precaching. Relies on RHI flag GRHISupportsPSOPrecaching. | Enabled |
r.PSOPrecache.Components | Precache PSOs used by components. | Enabled |
r.PSOPrecache.Resources | Precache PSOs used by all resources (UStaticMesh, USkinnedMesh, etc.). These PSOs might not have the correct render states because certain states can only be derived from components. However, they should give the driver the correct shaders to compile. | Disabled |
r.PSOPrecache.ProxyCreationWhenPSOReady | Wait for component proxy creation until all required PSOs are compiled. If they are still compiling when creating the proxy, those PSOs are marked as high-priority. | Enabled |
r.PSOPrecache.ProxyCreationDelayStrategy | When PSOs are still compiling during proxy creation of the component, this adds an option to replace the material with the default material. This relies on r.PSOPrecache.ProxyCreationWhenPSOReady. See Proxy Creation Delay Strategy below for more information. | 0 (see below) |
r.PSOPrecaching.WaitForHighPriorityRequestsOnly | Only wait for high-priority PSOs during loading. All non-essential PSOs will still compile during gameplay. A PSO is marked as high-priority when it is needed by a proxy and is not done compiling yet. | Disabled |
Proxy Creation Delay Strategy
The r.PSOPrecache.ProxyCreationDelayStrategy CVar relies on the r.PSOPrecache.ProxyCreationWhenPSOReady CVar. If ProxyCreationWhenPSOReady is set to 1 (enabled), ProxyCreationDelayStrategy selects one of the following behaviors depending on its value:
Value | Behavior |
---|---|
0 | Skip the draw until the PSO is ready. |
1 | Fall back to the engine's default material until the PSO is ready. |
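The interaction between the two CVars can be summarized as a small decision function. This is a standalone sketch rather than engine code: `EProxyAction` and `ChooseProxyAction` are hypothetical names, and treating any strategy value other than 1 as 0 is an assumption made for the sketch.

```cpp
#include <cassert>

// Hypothetical outcome of proxy creation while PSOs may still be compiling.
enum class EProxyAction
{
    CreateNow,          // create the proxy; the draw may block on PSO compilation
    DelayUntilReady,    // skip the draw until the PSO is ready
    UseDefaultMaterial  // render with the engine's default material for now
};

// bWaitWhenPsoReady mirrors r.PSOPrecache.ProxyCreationWhenPSOReady and
// DelayStrategy mirrors r.PSOPrecache.ProxyCreationDelayStrategy.
// Assumption: any strategy value other than 1 behaves like 0.
inline EProxyAction ChooseProxyAction(bool bPsosStillCompiling, bool bWaitWhenPsoReady, int DelayStrategy)
{
    if (!bPsosStillCompiling || !bWaitWhenPsoReady)
    {
        return EProxyAction::CreateNow;
    }
    return DelayStrategy == 1 ? EProxyAction::UseDefaultMaterial
                              : EProxyAction::DelayUntilReady;
}
```

With waiting disabled, the strategy value has no effect, which matches the dependency between the two CVars described above.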
Manage System Resources
PSO precaching relies on asynchronous compilation using background threads and has an impact on system memory. This section explains the options available for adjusting and optimizing the use of these resources as fits your project.
Memory
To save runtime system memory, UE deletes PSOs compiled for precaching after compilation. If your application precaches a very large number of PSOs, keeping them all in memory would dramatically increase its memory footprint (by hundreds of MBs).
PSO precaching relies on the existence of an underlying compressed driver cache. If a PSO is needed at runtime, the graphics driver will load it from its compressed driver cache. However, this can also be resource-intensive, and the first retrievals from these caches can take a few milliseconds. You can disable deletion of the precached PSOs in D3D12 with D3D12.PSOPrecache.KeepLowLevel.
Processing
By default, UE's background threads are used to compile PSOs asynchronously. However, you can use a separate PSO precaching thread pool to reduce contention with the foreground threads. You can set the size of this pool with either of the following CVars:
CVar | Description |
---|---|
r.pso.PrecompileThreadPoolSize | Sets the exact number of threads to use in the pool. |
r.pso.PrecompileThreadPoolPercentOfHardwareThreads | Sets the thread pool size to be a percentage of available hardware threads, and creates a thread pool with that size. |
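As a rough illustration of how the two settings could translate into a thread count, consider the standalone sketch below. `ComputePrecompilePoolSize` is a hypothetical helper, not an engine function. The precedence of the percentage over the exact size, the clamping to 100 percent, and the minimum of one thread are all assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Returns the thread count for a hypothetical PSO precompile pool.
// Assumption: a positive percentage takes precedence over the exact size.
// A result of 0 means "no dedicated pool, use the regular background threads".
inline int ComputePrecompilePoolSize(int ExactSize, int PercentOfHardwareThreads, int NumHardwareThreads)
{
    assert(NumHardwareThreads > 0);
    if (PercentOfHardwareThreads > 0)
    {
        // Clamp to 100% and always keep at least one thread in the pool.
        const int Percent = std::min(PercentOfHardwareThreads, 100);
        return std::max(1, NumHardwareThreads * Percent / 100);
    }
    return std::max(0, ExactSize);
}
```

For example, under these assumptions a setting of 50 percent on a machine with 8 hardware threads yields a pool of 4 threads.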
You can use the command line argument -clearPSODriverCache to force-clear the driver cache, which we recommend for testing the first-time startup experience of your game.
When testing on PCs with a large number of cores, we also recommend limiting the core count to 8, or another typical core count for a consumer-grade PC, using the command line argument -corelimit=n, where n is the number of cores. This ensures that you will more accurately replicate the final user experience.
Use the -clearPSODriverCache switch consistently for all test runs that assess the smoothness of your game. Without it, hitches may be masked by the PSO cache built by the graphics driver and left over from the previous runs.
Validation and Tracking
There are several options to validate and track the performance of the PSO precaching system.
You can enable validation with r.PSOPrecache.Validation using the following values:
Value | Description |
---|---|
0 | Disabled. |
1 | Lightweight tracking with high-level numbers only (minimal performance impact). |
2 | Detailed tracking. |
When PSO precache validation is active, you can inspect the stats collected using the stat PSOPrecache console command.

The statistics collected by the PSO precaching validation system. Use the stat PSOPrecache console command to view them.
Stats are split into 3 groups:
Group | Description |
---|---|
Shader-only PSOs | These stats only track the used RHI shaders and ignore all the other state information in the PSOs. This is useful to see if at least all the shaders are precached and whether something is missing/wrong with the other render states. Requires r.PSOPrecache.Validation.TrackMinimalPSOs . |
Minimal PSOs | Contains the shaders, all the render states, and the vertex element information, but not the render target information. Render target information is only available for validation at draw time, but the minimal PSO stats can be updated and checked during MeshDrawCommand building. Requires r.PSOPrecache.Validation.TrackMinimalPSOs. |
Full PSOs | The complete runtime-required PSO state used by the graphics API. This is the same as Minimal PSOs but with extra render target information. |
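The relationship between the three groups can be sketched with comparison keys of increasing detail. This is a standalone illustration, not the engine's data layout: `PsoDescription` and the three key functions are hypothetical, and each part of the PSO state is reduced to a single integer.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical, heavily simplified description of a PSO.
struct PsoDescription
{
    std::uint32_t Shaders = 0;         // identifies the RHI shaders
    std::uint32_t RenderState = 0;     // blend, depth-stencil, rasterizer state
    std::uint32_t VertexElements = 0;  // vertex declaration
    std::uint32_t RenderTargets = 0;   // render target formats, known at draw time
};

// Shader-only group: ignores every state except the shaders.
inline auto ShadersOnlyKey(const PsoDescription& D)
{
    return std::make_tuple(D.Shaders);
}

// Minimal group: everything except the render target information.
inline auto MinimalKey(const PsoDescription& D)
{
    return std::make_tuple(D.Shaders, D.RenderState, D.VertexElements);
}

// Full group: the minimal state plus the render target information.
inline auto FullKey(const PsoDescription& D)
{
    return std::make_tuple(D.Shaders, D.RenderState, D.VertexElements, D.RenderTargets);
}
```

Two descriptions that differ only in `RenderTargets` compare equal on the shader-only and minimal keys but not on the full key. In terms of the stats, such a PSO would count as a hit in the first two groups and a miss in the third.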
For each group, the following parameters are tracked:
Parameter | Description |
---|---|
Missed | The number of PSOs that were not precached, but should have been because they were needed at draw or dispatch time. Possible reasons: wrong shader, render target state, vertex attributes, render target information. |
Untracked | The number of PSOs for which precaching is not enabled. Possible reasons: validation disabled, global material, unsupported vertex factory, unsupported mesh pass processor type. In shipping builds, where certain debug information is not available, untracked PSOs will show up as missed instead. |
Hit | The number of PSOs used at runtime that were successfully precached. |
Too late | The number of PSOs that were queued for precaching, but were not compiled in time for when they were needed. |
Used | The number of PSOs used at runtime (sum of all the above). |
Precached | The number of PSOs that were precached (but not necessarily used). |
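The bookkeeping behind these parameters can be sketched as a set of counters. The names below (`EPsoUseResult`, `PsoPrecacheCounters`) are hypothetical and do not correspond to engine types. The sketch only shows that Used is derived as the sum of the four per-use outcomes, while Precached is counted independently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a PSO at the moment it is used at runtime.
enum class EPsoUseResult { Missed, Untracked, Hit, TooLate };

// Hypothetical counters for one stats group (shader-only, minimal, or full).
struct PsoPrecacheCounters
{
    std::uint64_t Missed = 0;
    std::uint64_t Untracked = 0;
    std::uint64_t Hit = 0;
    std::uint64_t TooLate = 0;
    std::uint64_t Precached = 0;

    // Called when a PSO is precached, whether or not it is ever used.
    void RecordPrecached() { ++Precached; }

    // Called when a PSO is needed at draw or dispatch time.
    void RecordUse(EPsoUseResult Result)
    {
        switch (Result)
        {
        case EPsoUseResult::Missed:    ++Missed;    break;
        case EPsoUseResult::Untracked: ++Untracked; break;
        case EPsoUseResult::Hit:       ++Hit;       break;
        case EPsoUseResult::TooLate:   ++TooLate;   break;
        }
    }

    // Used is the sum of all per-use outcomes.
    std::uint64_t Used() const { return Missed + Untracked + Hit + TooLate; }
};
```

A large gap between `Precached` and `Hit` in such a tally would indicate that many precached PSOs are never used.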
The Shader Pipeline Cache also gives information about how many actual runtime hitches were detected due to PSO compilation itself. A PSO compilation is marked as a hitch if compiling the runtime PSO took longer than a threshold, which defaults to 20 milliseconds. You can modify this threshold with r.PSO.RuntimeCreationHitchThreshold, but you should keep it as small as possible.
The default value of 20 milliseconds is high because the first hits on the driver cache can take a long time.
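The hitch classification itself is a simple threshold comparison, sketched below in standalone C++. `IsCompilationHitch` and `DefaultHitchThreshold` are hypothetical names. The strict "longer than" comparison follows the wording above.

```cpp
#include <cassert>
#include <chrono>

// Default threshold taken from the text above: 20 milliseconds.
constexpr std::chrono::milliseconds DefaultHitchThreshold{20};

// Returns true if a runtime PSO compilation took longer than the threshold.
inline bool IsCompilationHitch(std::chrono::duration<double, std::milli> CompileTime,
                               std::chrono::milliseconds Threshold = DefaultHitchThreshold)
{
    return CompileTime > Threshold;
}
```

Lowering the threshold in this sketch, as the text recommends for the real CVar, makes shorter compilations count as hitches as well.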
Collect information on PSO precaching
You can use the Visual Studio debugger and Unreal Insights to get more information about PSO precaching and investigate why certain PSOs may still cause hitches at runtime. Correct PSO precache states will only show up in Insights when PSO validation is enabled (see Validation and Tracking above).
The screenshot below shows a PSO precache miss causing a runtime hitch:
The following screenshot instead shows a hitch which is coming from untracked PSOs. These are likely Global Shaders used for the first time right after level loading:
Unreal Insights gives you high-level information about hitches coming from PSO compilation. To debug where the above PSO precache miss comes from, you need to use manual debugging in Visual Studio.
You can find more detailed information in global PSO validation helper objects. When validation is set to full tracking (r.PSOPrecache.Validation=2), it groups numbers by mesh pass processor and vertex factory type, which can help track down where certain misses are coming from. It can also help provide a clearer idea of where all the precached PSOs are coming from, and it can help find outliers which shouldn't precache that many shaders.
While these per-pass and per-vertex factory statistics are not exposed directly, they can be inspected during debugging by navigating the data structures that collect them. They are in PSOPrecache.cpp:
- FullPSOPrecacheStatsCollector
- ShadersOnlyPSOPrecacheStatsCollector
- MinimalPSOPrecacheStatsCollector
The screenshot below shows an example.
Extend PSO precaching with new engine features
This section provides information about how to extend the supporting objects for PSO precaching.
New UPrimitiveComponent
UPrimitiveComponent collects all the information needed to set up the PSO initializer. It needs the material instance, vertex factory (with possible vertex element set), and the set of parameters which could influence the final shader or render state used in the FMeshPassProcessor.
The parameters are stored in FPSOPrecacheParams and the correct default values are set up in UPrimitiveComponent::SetupPrecachePSOParams.
The base entry function for PSO precaching is:
/** Precache all PSOs which can be used by the primitive component */
ENGINE_API virtual void PrecachePSOs();
In most cases, the derived component doesn't need to implement this function, and it can simply override the precache parameter collection function:
/**
 * Collect all the data required for PSO precaching
 */
struct FComponentPSOPrecacheParams
{
    EPSOPrecachePriority Priority = EPSOPrecachePriority::Medium;
    UMaterialInterface* MaterialInterface = nullptr;
    FPSOPrecacheVertexFactoryDataList VertexFactoryDataList;
    FPSOPrecacheParams PSOPrecacheParams;
};
typedef TArray<FComponentPSOPrecacheParams, TInlineAllocator<2> > FComponentPSOPrecacheParamsList;
virtual void CollectPSOPrecacheData(const FPSOPrecacheParams& BasePrecachePSOParams, FComponentPSOPrecacheParamsList& OutParams) {}
A fully-featured example can be found in UStaticMeshComponent::CollectPSOPrecacheData, and a simpler use case can be found in UWaterMeshComponent::CollectPSOPrecacheData.
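To make the shape of such an override concrete, here is a minimal sketch that compiles outside the engine. Every type in it is a simplified stand-in that only borrows the names from the declarations above. `bCastShadow` is an illustrative field, the vertex factory data list is omitted, and `FPrimitiveComponentSketch` and `FMyMeshComponentSketch` are hypothetical classes. A real override would derive from the engine's component class and use the engine headers instead.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins so the sketch compiles outside the engine.
// The real engine types carry far more state than these placeholders.
struct UMaterialInterface {};
struct FPSOPrecacheParams { bool bCastShadow = false; /* illustrative field only */ };
enum class EPSOPrecachePriority { Medium };

struct FComponentPSOPrecacheParams
{
    EPSOPrecachePriority Priority = EPSOPrecachePriority::Medium;
    UMaterialInterface* MaterialInterface = nullptr;
    FPSOPrecacheParams PSOPrecacheParams;
};
using FComponentPSOPrecacheParamsList = std::vector<FComponentPSOPrecacheParams>;

// Hypothetical base class exposing the collection hook shown above.
struct FPrimitiveComponentSketch
{
    virtual ~FPrimitiveComponentSketch() = default;
    virtual void CollectPSOPrecacheData(const FPSOPrecacheParams& BasePrecachePSOParams,
                                        FComponentPSOPrecacheParamsList& OutParams) {}
};

// Hypothetical derived component that renders with a list of materials.
struct FMyMeshComponentSketch : FPrimitiveComponentSketch
{
    std::vector<UMaterialInterface*> Materials;

    void CollectPSOPrecacheData(const FPSOPrecacheParams& BasePrecachePSOParams,
                                FComponentPSOPrecacheParamsList& OutParams) override
    {
        // Emit one entry per valid material, reusing the base parameters.
        for (UMaterialInterface* Material : Materials)
        {
            if (Material == nullptr)
            {
                continue;
            }
            FComponentPSOPrecacheParams Entry;
            Entry.MaterialInterface = Material;
            Entry.PSOPrecacheParams = BasePrecachePSOParams;
            OutParams.push_back(Entry);
        }
    }
};
```

The design point carried over from the text is that the derived component only describes what it renders with, and leaves the actual precaching to the base entry function.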
New FVertexFactory
A new vertex factory needs to declare that it supports PSO precaching using the flag EVertexFactoryFlags::SupportsPSOPrecaching, which can be provided with the vertex factory declaration macro IMPLEMENT_VERTEX_FACTORY_TYPE.
Then, the vertex factory must implement the following function:
static void GetPSOPrecacheVertexFetchElements(EVertexInputStreamType VertexInputStreamType, FVertexDeclarationElementList& Elements);
FVertexFactory::GetPSOPrecacheVertexFetchElements is used during PSO precaching if no explicit vertex element set is provided.
The fixed vertex element set will be valid if the EVertexFactoryFlags::SupportsManualVertexFetch flag is set on the vertex factory or if a fixed vertex element set is used in the shader.
If the vertex element list is dependent on the vertex buffer data of the mesh, then the correct set will need to be provided in FPSOPrecacheVertexFactoryData. This should happen during UPrimitiveComponent::CollectPSOPrecacheData. See UStaticMeshComponent::CollectPSOPrecacheData and FLocalVertexFactory::GetVertexElements for examples.
New FMeshPassProcessor
The mesh pass processor has to implement the following function to collect all the PSOs which can be used when drawing a certain material with the given FPSOPrecacheParams:
virtual void CollectPSOInitializers(const FSceneTexturesConfig& SceneTexturesConfig, const FMaterial& Material, const FPSOPrecacheVertexFactoryData& VertexFactoryData, const FPSOPrecacheParams& PreCacheParams, TArray<FPSOPrecacheData>& PSOInitializers) override {}
The logic is mostly the same as AddMeshBatch (and can ideally be partially shared), but while AddMeshBatch is called at MeshDrawCommand building time, the PSO precaching system tries to collect the information a lot sooner (PostLoad of the component).
For a simple example, see FDistortionMeshProcessor::CollectPSOInitializers.
For a more comprehensive example, see FBasePassMeshProcessor::CollectPSOInitializers.
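The idea of sharing logic between the two paths can be illustrated with a standalone sketch. All names here (`PassInputs`, `PassPsoState`, `DerivePassState`, `MeshPassProcessorSketch`) are hypothetical, and the signatures are deliberately much simpler than the engine's. The point is only that both the draw path and the precache path derive their state from one function, so they cannot drift apart.

```cpp
#include <cassert>
#include <vector>

// Hypothetical material properties that influence the render state.
struct PassInputs
{
    bool bTwoSided = false;
    bool bTranslucent = false;
};

// Hypothetical, heavily reduced PSO state for one pass.
struct PassPsoState
{
    int CullMode = 1;         // 0 = no culling, 1 = cull back faces
    bool bDepthWrite = true;

    bool operator==(const PassPsoState& Other) const
    {
        return CullMode == Other.CullMode && bDepthWrite == Other.bDepthWrite;
    }
};

// Single source of truth used by both the draw path and the precache path.
inline PassPsoState DerivePassState(const PassInputs& In)
{
    PassPsoState State;
    State.CullMode = In.bTwoSided ? 0 : 1;
    State.bDepthWrite = !In.bTranslucent;
    return State;
}

struct MeshPassProcessorSketch
{
    std::vector<PassPsoState> DrawCommands;

    // Draw path: runs when draw commands are built.
    void AddMeshBatch(const PassInputs& In)
    {
        DrawCommands.push_back(DerivePassState(In));
    }

    // Precache path: runs much earlier, with the same state derivation.
    void CollectPSOInitializers(const PassInputs& In, std::vector<PassPsoState>& OutInitializers) const
    {
        OutInitializers.push_back(DerivePassState(In));
    }
};
```

If the two paths computed the cull mode or depth state separately, any divergence between them would show up as a precache miss at draw time.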
Debug a PSO Precache Miss
Debugging misses on minimal PSO state is straightforward, because these can be triggered during MeshDrawCommand building and not at draw time. The final render target information (needed to compute the full PSO) is only available during drawing, which makes it harder to debug.
The function PSOCollectorStats::FPrecacheStatsCollector::UpdatePrecacheStats updates the tracked internal state and is a convenient place to break with the debugger when a miss occurs at runtime. The call stack and watch window can give more information about the used material, render pass, vertex factory, and the FPrimitiveSceneProxy. You can also get information about the UPrimitiveComponent using the ComponentForDebuggingOnly member.
However, by the time UpdatePrecacheStats runs, PSO precaching has usually already happened on that component. If you are trying to find out why an incorrect shader or render state was used during precaching for the PSOs of that component, you need to add a breakpoint during PSO precaching for that component and/or material for the given pass.
The easiest way to set up a breakpoint during precaching is to find the FMaterial asset name during MeshDrawCommand building and use that name to add a breakpoint during PSO collection of the same MeshPassProcessor. Then, compare the PSO state used during MeshDrawCommand building and the state set up during PSO precaching.
You might also need to check the values of FPSOPrecacheParams, because these could also influence the shader and render state used in the PSO.
UE_DISABLE_OPTIMIZATION
void FBasePassMeshProcessor::CollectPSOInitializers(const FSceneTexturesConfig& SceneTexturesConfig, const FMaterial& Material, const FPSOPrecacheVertexFactoryData& VertexFactoryData, const FPSOPrecacheParams& PreCacheParams, TArray<FPSOPrecacheData>& PSOInitializers)
{
    // Temporary debugging aid: break when the material under investigation is precached.
    FString MaterialName = Material.GetAssetName();
    if (MaterialName == TEXT("TEST_MATERIAL_NAME"))
    {
        UE_DEBUG_BREAK();
    }
    // … rest of function…
}
// Restore optimizations for the rest of the file.
UE_ENABLE_OPTIMIZATION