Intel® Resource Director Technology (Intel® RDT)¶
The Cache Allocation Technology (CAT) feature is part of the Intel® Resource Director Technology (Intel® RDT) feature set, which provides a number of monitoring and control technologies to help software understand and control the usage of shared resources within the platform, such as last-level cache (LLC) and memory bandwidth.
Shared last-level caches are common on modern processors. For example, on an Intel® Core™ processor or an Intel® Xeon® processor, the cores share an L3 cache, whereas on an Intel Atom® processor, cores 0 and 1 share one L2 cache and cores 2 and 3 share another. As a result, workloads on adjacent cores can cause cache misses for each other: a miss occurs when one workload evicts a cache line that another workload is still using. When a cache miss occurs, the workload must wait while the data is fetched from memory. This introduces undesired jitter into the workload execution time, which reduces determinism. To mitigate this issue, Intel Cache Allocation Technology (CAT) provides a method to partition processor caches and assign these partitions to a Class-of-Service (COS). Associating workloads with different COS isolates the parts of the cache available to each workload, which prevents the workloads from evicting each other's cache lines.
A set of technical articles and other resources on Intel® RDT:
See also
When used correctly, CAT can dramatically reduce the CPU jitter experienced by real-time applications. See the following section which compares actual collected data of benchmarks executed with and without CAT: 48-Hour Benchmark Results
Intel® Cache Allocation Technology Terminology¶
CAT - Cache Allocation Technology. The umbrella term for all uses of cache allocation.
PQoS - Platform Quality of Service. A Linux tool for controlling cache assignment.
COS - Class of service. A bitmask which determines which ways of a cache are exposed.
L3 - Level 3 Cache. Typically the last cache on Intel® Core™ processors and Intel® Xeon® processors.
L2 - Level 2 Cache. Typically the last cache on Intel Atom® processors.
LLC - Last-level cache. Typically the L3 cache for Intel® Core™ processors and Intel® Xeon® processors, and L2 for Intel Atom® processors.
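Since a COS is expressed as a bitmask over cache ways, it can help to see how such a mask is built. The following is a minimal sketch, assuming that bit 0 corresponds to the first cache way; `way_mask` is a hypothetical helper written for illustration and is not part of the pqos tool.

```shell
#!/bin/sh
# way_mask COUNT OFFSET
# Print a hexadecimal bitmask with COUNT contiguous bits set, starting at
# bit OFFSET (assumption: bit 0 is the first cache way).
# Hypothetical helper for illustration; not part of pqos.
way_mask() {
    count=$1
    offset=$2
    printf '0x%x\n' $(( ((1 << count) - 1) << offset ))
}

way_mask 4 0    # first 4 ways  -> 0xf
way_mask 8 4    # next 8 ways   -> 0xff0
```

The two values printed correspond to the masks `0x000f` and `0x0ff0` used in the CAT examples later in this section.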
Attention
Intel® TCC 2022.2 does not coexist well with Intel® Resource Director Technology (Intel® RDT). When Intel® TCC Software-SSRAM is configured, Cache Allocation Technology (CAT) must be disabled in the Intel® RDT Linux driver at boot time through the kernel command-line option rdt=!l2cat,!l3cat.
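A quick way to see whether a system was booted with this option is to inspect the kernel command line. The sketch below is an illustration only: `cat_disabled_by_cmdline` is a hypothetical helper, and it only looks for the `!l2cat` and `!l3cat` entries of an `rdt=` option; it does not query the driver itself.

```shell
#!/bin/sh
# cat_disabled_by_cmdline CMDLINE
# Succeed if CMDLINE contains an rdt= option listing !l2cat or !l3cat.
# Hypothetical helper for illustration; it only inspects the text.
cat_disabled_by_cmdline() {
    for arg in $1; do
        case "$arg" in
            rdt=*)
                # Wrap the option list in commas so every entry is delimited.
                case ",${arg#rdt=}," in
                    *,!l2cat,* | *,!l3cat,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# /proc/cmdline holds the command line of the running Linux kernel.
if cat_disabled_by_cmdline "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "CAT is disabled by the kernel command line"
else
    echo "no rdt= option disabling CAT was found"
fi
```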
The following section is applicable to:

Install Platform Quality of Service (PQoS) Tool¶
You can access the CAT features using the pqos or pqos-msr command. This command allows you to partition the cache and then associate a Class of Service (COS) with specific cores.
You can install the pqos tool from the ECI APT repository. Set up the ECI APT repository, then run the following command to install this component:
$ sudo apt install intel-cmt-cat
Install PQoS Helper¶
ECI provides a convenient script called pqos-helper, which assists you in leveraging CAT by using the PQoS tool.
You can install the pqos-helper script from the ECI APT repository. Set up the ECI APT repository, then run the following command to install this component:
$ sudo apt install pqos-helper
- The script will perform the following actions in order:
Determine if the L3 (preferred) or L2 cache should be configured.
Determine the optimal cache COS mask for the selected cache.
Save the current cache Class of Service (COS) masks and CPU association.
Reset the cache COS masks and CPU association, if the --pqos_rst argument is present.
Assign the cache COS masks according to the --cos# arguments, or use the optimal cache COS mask if no --cos# argument is provided.
Assign the cache CPU association according to the --assign_cos argument, or use the default CPU association if no --assign_cos argument is provided.
Execute any user-provided command passed with the --command argument.
Restore the initial cache COS masks and CPU association, unless the --keep argument is present.
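The order of these actions can be sketched as a small dry-run script. This is not the pqos-helper implementation: it skips the cache detection and the save step, and it simplifies the restore step to a plain `pqos -R` reset, which is an assumption made to keep the sketch short. The `PQOS` variable and the `run_with_cat` function are hypothetical names; by default the script only prints the pqos commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the configure / run / restore sequence.
# PQOS defaults to "echo pqos", so commands are printed, not executed.
# Set PQOS=pqos (or PQOS=pqos-msr) to actually run them.
PQOS=${PQOS:-echo pqos}

# run_with_cat MASKS ASSOC COMMAND [ARGS...]
run_with_cat() {
    masks=$1      # for example 'llc:0=0xff0;llc:1=0x00f'
    assoc=$2      # for example 'llc:0=0-2;llc:1=3'
    shift 2       # the remaining arguments form the user command

    $PQOS -R              # reset COS masks and core association
    $PQOS -e "$masks"     # assign the COS masks
    $PQOS -a "$assoc"     # associate cores with classes of service
    "$@"                  # run the user command
    status=$?
    $PQOS -R              # simplified restore: reset to defaults (assumption)
    return $status
}

run_with_cat 'llc:0=0xff0;llc:1=0x00f' 'llc:0=0-2;llc:1=3' echo "workload runs here"
```

The function returns the exit status of the user command, so a caller can still detect a failed workload after the cache settings have been reset.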
- How To Use The Script
$ /opt/pqos/pqos-helper.py <args>
These are the possible arguments that the PQoS helper script takes.
optional arguments:
  -h, --help            show this help message and exit
  --cos0 COS0           used to set the class of service for the script. ex: 0xf0
  --cos1 COS1           used to set the class of service for the script. ex: 0xf0
  --cos2 COS2           used to set the class of service for the script. ex: 0xf0
  --cos3 COS3           used to set the class of service for the script. ex: 0xf0
  --assign_cos ASSIGN_COS
                        Provide string as 'COS=CORE', separated by spaces. E.g. '0=0 1=1', etc. Use pqos style assignment, e.g. '1=0,2,6-10'
  --pqos_msr            Use pqos-msr command (required on some systems) instead of regular pqos.
  --pqos_rst            Calls pqos -R (reset). Resets CAT settings.
  --keep                Keep CAT settings on exit (no restore).
  --command COMMAND     Execute command after configuring CAT. ex: "cyclictest -l 100000"
- Example Command
$ /opt/pqos/pqos-helper.py --pqos_msr --pqos_rst --cos0 0xff0 --cos1 0x00f --assign_cos "1=3 0=0-2" --command "cyclictest -a 3 -p 99 -l 100000"
In the above command, the --pqos_msr flag forces the use of pqos-msr instead of pqos. This might be required on some systems. Additionally, the --pqos_rst flag resets CAT before the new CAT assignment is applied. The cache ways are given as a bitmask in hexadecimal notation; for example, 0x00f means ‘use the first 4 cache ways’. The --cos0 and --cos1 arguments define classes of service 0 and 1. As only cos0 and cos1 are defined, only these classes of service are assigned to the cores. As the device in this example has four cores (0-3), core 3 is assigned (--assign_cos) to cos1, and cores 0-2 to cos0. The --command argument instructs the script to execute the cyclictest benchmark. The cyclictest benchmark is configured to assign thread affinity to core 3 (-a 3), since core 3 has an isolated cache partition due to the class of service in use (cos1). Lastly, the cyclictest benchmark will execute the thread with a real-time priority of 99 (-p 99) and will exit after 100000 loops (-l 100000).
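To check a pair of masks such as the ones in the example command before applying them, the bit arithmetic can be done directly in the shell. The following is a small sketch; `count_ways` and `masks_overlap` are hypothetical helpers and are not provided by pqos or pqos-helper.

```shell
#!/bin/sh
# count_ways MASK: print the number of set bits (cache ways) in MASK.
count_ways() {
    mask=$(( $1 ))
    n=0
    while [ "$mask" -ne 0 ]; do
        n=$(( n + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$n"
}

# masks_overlap A B: succeed if the two masks share at least one way.
masks_overlap() {
    [ $(( $1 & $2 )) -ne 0 ]
}

echo "cos0 0xff0: $(count_ways 0xff0) ways"
echo "cos1 0x00f: $(count_ways 0x00f) ways"
if masks_overlap 0xff0 0x00f; then
    echo "cos0 and cos1 share cache ways"
else
    echo "cos0 and cos1 are disjoint"
fi
```

For the masks in the example, this reports 8 ways for cos0, 4 ways for cos1, and no shared ways, which is what gives core 3 its own cache partition.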
Cache Monitoring Technology (CMT) and Memory B/W Monitoring (MBM)¶
The following examples demonstrate basic usage of the pqos tool to monitor cache and memory bandwidth usage and to modify the CPU cache allocation.
Note: Some systems require the use of pqos-msr (Model Specific Register) as opposed to pqos to fully access all levels of CAT-enabled cache. Replace pqos with pqos-msr in any of the following examples, if needed.
The examples assume that the system has 12 cores. Adjust the examples as necessary for actual system core count.
Monitor all events on cores 0 to 11:
$ pqos -m all:0-11
$ pqos -m :0-11
Monitor LLC on cores 0, 2 and 6:
$ pqos -m llc:0,2,6
Monitor local memory B/W on cores 0-2 and remote memory B/W on cores 3, 4 and 5:
$ pqos -m "mbl:0-2;mbr:3,4,5"
Monitor events on groups of cores (aggregate statistics):
$ pqos -m "all:[0-11];llc:[12,13,14];mbl:[15-17,20]"
Reset Monitoring: Reclaims the in-use RMIDs.
$ pqos -r
Example CMT/MBM Usage Scenario¶
Consider a scenario where you have a host machine running three guest VMs with three cores assigned to each guest.
VM0 - cores 0-2
VM1 - cores 3-5
VM2 - cores 6-8
To monitor all events (LLC occupancy, local and remote memory B/W) run:
$ pqos -m "all:[0-2],[3-5],[6-8];"
Console output:
CORE   IPC    MISSES    LLC[KB]   MBL[MB/s]   MBR[MB/s]
0-2    0.28   7893k     383.2     901.2       430.8
3-5    0.28   45k       25.3      361282.6    22.4
6-8    0.26   89468k    6778.8    43904.3     4.3
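Output in this shape is easy to post-process. As a sketch, the snippet below picks the core group with the largest LLC occupancy from rows laid out like the console output above. It assumes one header row followed by whitespace-separated columns with LLC[KB] in the fourth column; real pqos output may contain additional lines that would need filtering first. `busiest_llc` is a hypothetical helper name.

```shell
#!/bin/sh
# busiest_llc: read monitoring rows on stdin and print the core group with
# the largest LLC[KB] value (column 4). The first row is treated as a header.
busiest_llc() {
    awk 'NR > 1 && $4 + 0 > max { max = $4 + 0; core = $1; llc = $4 }
         END { print core, llc }'
}

busiest_llc <<'EOF'
CORE IPC MISSES LLC[KB] MBL[MB/s] MBR[MB/s]
0-2 0.28 7893k 383.2 901.2 430.8
3-5 0.28 45k 25.3 361282.6 22.4
6-8 0.26 89468k 6778.8 43904.3 4.3
EOF
```

For the sample rows this prints `6-8 6778.8`, identifying the third VM as the largest consumer of LLC.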
Cache Allocation Technology (CAT) Usage¶
Set COS 1 to the first 4 cache ways and COS 2 to the next 8 cache ways:
$ pqos -e "llc:1=0x000f;llc:2=0x0ff0;"
Set COS 1 on all sockets, COS 2 on sockets 0 and 1, and COS 3 on sockets 2 to 3:
$ pqos -e "llc:1=0x000f;llc@0,1:2=0x0ff0;llc@2-3:3=0x3c"
Console output for pqos -s to show current configuration:
L3CA COS definitions for Socket 0:
    L3CA COS0 => MASK 0xfffff
    L3CA COS1 => MASK 0xf
    L3CA COS2 => MASK 0xff0
    L3CA COS3 => MASK 0xfffff
    ...
L3CA COS definitions for Socket 1:
    L3CA COS0 => MASK 0xfffff
    L3CA COS1 => MASK 0xf
    L3CA COS2 => MASK 0xff0
    L3CA COS3 => MASK 0xfffff
    ...
L3CA COS definitions for Socket 2:
    L3CA COS0 => MASK 0xfffff
    L3CA COS1 => MASK 0xf
    L3CA COS2 => MASK 0xfffff
    L3CA COS3 => MASK 0x3c
    ...
L3CA COS definitions for Socket 3:
    L3CA COS0 => MASK 0xfffff
    L3CA COS1 => MASK 0xf
    L3CA COS2 => MASK 0xfffff
    L3CA COS3 => MASK 0x3c
    ...
Associate cores 0, 2, and 6 to 10 with COS 1 and core 1 to COS 2:
$ pqos -a "llc:1=0,2,6-10;llc:2=1;"
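The core list in the association string combines single cores and ranges. The sketch below expands such a list into individual core numbers, which can be useful when the same list also has to drive other tooling. `expand_cores` is a hypothetical helper; it assumes only comma-separated numbers and `first-last` ranges, as in the example above.

```shell
#!/bin/sh
# expand_cores LIST: expand a list such as "0,2,6-10" into single cores.
# Hypothetical helper; handles only numbers and first-last ranges.
expand_cores() {
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case "$part" in
            *-*)
                first=${part%-*}
                last=${part#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$(( i + 1 ))
                done
                ;;
            *)
                out="$out $part"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

expand_cores 0,2,6-10    # cores associated with COS 1 in the example
```

For the list in the example this prints `0 2 6 7 8 9 10`.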
Enable, disable L2 CDP:
$ pqos -R l2cdp-on
$ pqos -R l2cdp-off
Enable, disable L3 CDP:
$ pqos -R l3cdp-on
$ pqos -R l3cdp-off
Use current L3 CDP settings and set COS 1 code and data bitmasks:
$ pqos -e "llc:1d=0xfff;llc:1c=0xfff00;"
Show current CAT settings:
$ pqos -s
Reset CAT: Sets all COS to default (fill into all ways) and associates all cores with COS 0.
$ pqos -R
Example CAT Usage Scenario¶
Consider a scenario where you have a host machine running 3 guest VMs. Each guest is assigned 3 cores and a priority.
VM0 - cores 0-2 (P5)
VM1 - cores 3-5 (P2)
VM2 - cores 6-8 (P1)
As VM0 has the highest priority, it is assigned 8 exclusive LLC ways. VM1 and VM2 have relatively low priority, so VM1 is assigned 6 ways and VM2 is assigned 4 ways, 2 of which are shared between VM1 and VM2.
First, set the three COS bitmasks, one for each VM:
$ pqos -e "llc:1=0x00ff;llc:2=0x3f00;llc:3=0xf000;"
Next, associate each COS with the cores where each VM is running:
$ pqos -a "llc:1=0-2;llc:2=3-5;llc:3=6-8;"
VM0 now has exclusive access to 8 LLC ways, VM1 has exclusive access to 4 ways and shared access to 2 ways, and VM2 has exclusive access to 2 ways and shared access to another 2 ways. All other cores have access to all other ways.
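The exclusive and shared way counts follow directly from the three bitmasks. As a worked check, the sketch below computes which ways two masks have in common and which belong to only one of them; `shared_ways` and `exclusive_ways` are hypothetical helper names used for illustration.

```shell
#!/bin/sh
# shared_ways A B: print the ways present in both masks.
shared_ways() {
    printf '0x%x\n' $(( $1 & $2 ))
}

# exclusive_ways A B: print the ways present in A but not in B.
exclusive_ways() {
    printf '0x%x\n' $(( $1 & ~$2 ))
}

vm0=0x00ff
vm1=0x3f00
vm2=0xf000

echo "VM0/VM1 shared: $(shared_ways "$vm0" "$vm1")"      # 0x0
echo "VM1/VM2 shared: $(shared_ways "$vm1" "$vm2")"      # 0x3000 (2 ways)
echo "VM1 only:       $(exclusive_ways "$vm1" "$vm2")"   # 0xf00  (4 ways)
echo "VM2 only:       $(exclusive_ways "$vm2" "$vm1")"   # 0xc000 (2 ways)
```

The result matches the description: VM1 and VM2 overlap in ways 12 and 13, VM1 keeps ways 8 to 11 to itself, and VM2 keeps ways 14 and 15.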
Example CAT Usage with ECI¶
After installing the ECI Deb package customizations-grub (see Installing ECI Deb packages), the kernel boot parameters will pin all Linux kernel tasks to core 0, and isolate cores 1 and 3. Under these conditions, it is advantageous to allocate the CPU cache such that the Linux kernel tasks never share cache with any tasks running on the isolated cores. To achieve this result, perform the following steps:
Recommended CAT configuration for Intel® Core™ processors or Intel® Xeon® processors
The following example sets the L3 cache mask of core 0 to 0x0f, and the L3 cache mask of cores 1 & 3 to 0xf0.
Attention
This example is best suited for Intel® Core™ processors or Intel® Xeon® processors, which share last-level L3 cache. For Intel Atom® processors, see the following example.
Reset cache allocation to default state.
$ pqos -R
Define the allocation classes for the last-level cache (LLC). Class 0 is allocated exclusive access to the first half of the LLC. Class 1 is allocated exclusive access to the second half of the LLC.
$ pqos -e 'llc:0=0x0f;llc:1=0xf0'
Associate core 0 with class 0, and cores 1 & 3 with class 1.
$ pqos -a 'llc:0=0;llc:1=1,3'
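The masks 0x0f and 0xf0 split a cache with 8 ways into two halves. On a cache with a different number of ways, the two halves need different masks. The following sketch derives them from the way count; `half_masks` is a hypothetical helper, and the way count itself is an input you would have to determine for your system (for instance from the default mask reported by `pqos -s`).

```shell
#!/bin/sh
# half_masks WAYS: print two masks that split a cache with WAYS ways into a
# lower and an upper half. If WAYS is odd, the upper half gets the extra way.
# Hypothetical helper for illustration; not part of pqos.
half_masks() {
    ways=$1
    low=$(( (1 << (ways / 2)) - 1 ))
    full=$(( (1 << ways) - 1 ))
    printf '0x%x 0x%x\n' "$low" $(( full ^ low ))
}

half_masks 8     # -> 0xf 0xf0, the split used in the example
half_masks 20    # -> 0x3ff 0xffc00, for a cache whose full mask is 0xfffff
```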
Recommended CAT configuration for Intel Atom® processors
The following example sets the L2 cache mask of cores 0 & 2 to 0x0f, and the L2 cache mask of cores 1 & 3 to 0xf0.
Attention
This example is best suited for Intel Atom® processors which share last-level L2 cache. For Intel® Core™ processors or Intel® Xeon® processors, see the previous example.
$ pqos -R
$ pqos -e 'l2:0=0x0f;l2:1=0xf0'
$ pqos -a 'llc:0=0,2;llc:1=1,3'
Memory Bandwidth Allocation (MBA) Usage¶
Set COS 1 to 50% available and COS 2 to 70% available:
$ pqos -e "mba:1=50;mba:2=70;"
Set COS 1 on all sockets, COS 2 on sockets 0 and 1, and COS 3 on sockets 2 to 3 (note that MBA rounds the values given to it):
$ pqos -e "mba:1=80;mba@0,1:2=64;mba@2-3:3=85"
Console output for pqos -s to show current configuration:
L3CA/MBA COS definitions for Socket 0:
    MBA COS0 => 100% available
    MBA COS1 => 80% available
    MBA COS2 => 60% available
    MBA COS3 => 100% available
    ...
L3CA/MBA COS definitions for Socket 1:
    MBA COS0 => 100% available
    MBA COS1 => 80% available
    MBA COS2 => 60% available
    MBA COS3 => 100% available
    ...
L3CA/MBA COS definitions for Socket 2:
    MBA COS0 => 100% available
    MBA COS1 => 80% available
    MBA COS2 => 100% available
    MBA COS3 => 90% available
    ...
L3CA/MBA COS definitions for Socket 3:
    MBA COS0 => 100% available
    MBA COS1 => 80% available
    MBA COS2 => 100% available
    MBA COS3 => 90% available
    ...
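In the output above, the requested values 64 and 85 appear as 60 and 90. The sketch below reproduces that result under the assumption that values are rounded to the nearest multiple of 10, with halves rounded up. This is only a model of the sample output; the actual step size and rounding behavior are determined by the platform, so treat `mba_round` as a hypothetical helper for estimating the applied value.

```shell
#!/bin/sh
# mba_round PERCENT [STEP]: round PERCENT to the nearest multiple of STEP,
# rounding halves up and capping the result at 100.
# STEP defaults to 10, which is an assumption based on the sample output.
mba_round() {
    pct=$1
    step=${2:-10}
    rounded=$(( (pct + step / 2) / step * step ))
    if [ "$rounded" -gt 100 ]; then
        rounded=100
    fi
    echo "$rounded"
}

for requested in 80 64 85; do
    echo "requested $requested% -> $(mba_round "$requested")% available"
done
```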
Show current MBA settings:
$ pqos -s
Reset MBA: Sets all COS to default and associates all cores with COS 0.
$ pqos -R