Linux OS Runtime Optimizations¶
To achieve real-time performance on a target system, certain runtime configurations and optimizations are recommended. This section establishes a baseline for enabling real-time capable workloads.
CPU Isolation¶
When using the ECI Linux* Intel® LTS PREEMPT_RT kernel, all Linux kernel processes are scheduled to run on CPU 0, and CPUs 1 & 3 (13th generation processors and older) or 2 & 4 (14th generation processors and newer) are configured to be isolated for real-time usage (see ECI Kernel Boot Optimizations).
This creates a side effect: workloads utilizing CPU 0 will experience degraded performance. Therefore, it is recommended to move all critical processes to a CPU other than CPU 0.
For reference, the following code snippet shows the default kernel boot parameters that affect CPUs:
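A representative snippet is shown below. The parameter names are standard Linux kernel boot parameters referenced by the ECI Kernel Boot Optimizations documentation; the exact CPU list depends on the processor generation, so treat this as an illustrative excerpt rather than the verbatim default:

```shell
# Excerpt of CPU-related kernel boot parameters (CPUs 1 & 3 shown,
# as used for 13th generation processors and older)
isolcpus=1,3 rcu_nocbs=1,3 nohz_full=1,3 irqaffinity=0
```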
Network Interrupts Affinity to CPU¶
When using the ECI Linux Intel LTS PREEMPT_RT kernel, the MSI interrupts of all Ethernet network devices are scheduled to run on CPU 0.
This creates a side effect: CPU 0 handles all Ethernet device interrupt processing (for example, the top-half and bottom-half handlers), which can degrade the performance of workloads running on CPU 0.
Therefore, it is recommended to selectively move critical Ethernet device interrupts for prioritized traffic classes onto a CPU other than CPU 0.
Install Network-irq-affinity Tool¶
You can view the default mapping of device interrupts to CPUs via procfs at /proc/interrupts:
root@eci-intel-0474:~# cat /proc/interrupts | grep -e CPU. -e enp.s.
CPU0 CPU1 CPU2 CPU3
127: 1 0 0 0 PCI-MSI 524288-edge enp1s0
128: 362502 0 0 0 PCI-MSI 524289-edge enp1s0-TxRx-0
129: 197962 0 0 0 PCI-MSI 524290-edge enp1s0-TxRx-1
130: 176611 0 0 0 PCI-MSI 524291-edge enp1s0-TxRx-2
131: 183731 0 0 0 PCI-MSI 524292-edge enp1s0-TxRx-3
132: 7 0 0 0 PCI-MSI 2097152-edge enp4s0
133: 487832 0 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
134: 174277 0 0 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
135: 164195 0 0 0 PCI-MSI 2097155-edge enp4s0-TxRx-2
136: 164147 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
137: 53 0 0 0 PCI-MSI 2621440-edge enp5s0
138: 166660 0 0 0 PCI-MSI 2621441-edge enp5s0-TxRx-0
139: 174901 0 0 0 PCI-MSI 2621442-edge enp5s0-TxRx-1
140: 164256 0 0 0 PCI-MSI 2621443-edge enp5s0-TxRx-2
141: 164384 0 0 0 PCI-MSI 2621444-edge enp5s0-TxRx-3
You can install the network-irq-affinity helper script from the ECI repository to remap CPU and Ethernet device interrupts for a specific use.
Tools | Version | Source
---|---|---
Ethernet Device MSI interrupt CPU mapping helper script | 1.1+patchset |
Setup the ECI repository, then perform the following command to install this component:
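Assuming the ECI apt repository is already configured, the install command presumably takes the following form (the package name is an assumption based on the script name):

```shell
# Install the helper script from the ECI repository (package name assumed)
sudo apt install network-irq-affinity
```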
The following are the possible arguments of the network-irq-affinity helper script:
Usage: network-irq-affinity [-hnv] -i <irqname>/<cpunum> -i <irqname>/<cpunum> ...
Options:
-h Show this usage message.
-i tuple ethernet interface irq <irqname> to cpu core <cpunum>
-n Do not change anything, just show what would be done.
-v Verbose output of changes made.
**Note**: In the tuple list, the order of ``-i <irqname>/<cpunum>`` arguments matters: earlier tuples take precedence when affining ``<irqname>`` to CPU core ``<cpunum>``.
Example Command
In this example, the -i <irqname>/<cpunum> tuple list is used to prioritize mapping both enp4s0-TxRx-1 and enp5s0-TxRx-1 MSI interrupts to CPU1, before mapping the remaining MSI interrupts to CPU2 for enp4s0 and CPU3 for enp5s0.
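A command consistent with the output below might look like the following. The flags follow the usage text above, but the exact tuple syntax for mapping an interface's remaining interrupts is an assumption:

```shell
# Hypothetical invocation: pin both TxRx-1 queues to CPU 1 first,
# then map the remaining enp4s0/enp5s0 interrupts to CPU 2 and CPU 3
network-irq-affinity -v \
    -i enp4s0-TxRx-1/1 -i enp5s0-TxRx-1/1 \
    -i enp4s0/2 -i enp5s0/3
```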
network-irq-affinity: Assigning enp5s0-TxRx-1 on IRQ 139 to CPU 1
network-irq-affinity: Assigning enp4s0-TxRx-1 on IRQ 134 to CPU 1
network-irq-affinity: Assigning enp4s0-TxRx-0 on IRQ 133 to CPU 2
network-irq-affinity: Assigning enp4s0-TxRx-2 on IRQ 135 to CPU 2
network-irq-affinity: Assigning enp4s0-TxRx-3 on IRQ 136 to CPU 2
network-irq-affinity: Assigning enp5s0-TxRx-0 on IRQ 138 to CPU 3
network-irq-affinity: Assigning enp5s0-TxRx-2 on IRQ 140 to CPU 3
network-irq-affinity: Assigning enp5s0-TxRx-3 on IRQ 141 to CPU 3
Verify whether the Ethernet device interrupt remap is effective using the following code:
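The same procfs query used earlier serves as the check:

```shell
# List CPU columns and per-queue interrupt counts for the Ethernet devices
cat /proc/interrupts | grep -e CPU. -e enp.s.
```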
CPU0 CPU1 CPU2 CPU3
127: 1 0 0 0 PCI-MSI 524288-edge enp1s0
128: 362759 0 0 0 PCI-MSI 524289-edge enp1s0-TxRx-0
129: 198104 0 0 0 PCI-MSI 524290-edge enp1s0-TxRx-1
130: 176736 0 0 0 PCI-MSI 524291-edge enp1s0-TxRx-2
131: 183855 0 0 0 PCI-MSI 524292-edge enp1s0-TxRx-3
132: 7 0 0 0 PCI-MSI 2097152-edge enp4s0
133: 488175 0 27 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
134: 174399 8 0 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
135: 164310 0 8 0 PCI-MSI 2097155-edge enp4s0-TxRx-2
136: 164262 0 8 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
137: 53 0 0 0 PCI-MSI 2621440-edge enp5s0
138: 166775 0 6 2 PCI-MSI 2621441-edge enp5s0-TxRx-0
139: 175020 8 0 0 PCI-MSI 2621442-edge enp5s0-TxRx-1
140: 164371 0 6 2 PCI-MSI 2621443-edge enp5s0-TxRx-2
141: 164499 0 6 2 PCI-MSI 2621444-edge enp5s0-TxRx-3
Best Practices for Achieving Real-time Performance¶
Eliminate Sources of CPU Contention¶
To achieve real-time performance, it is imperative to isolate real-time workloads from other tasks. This can be achieved by using a real-time kernel and modifying the kernel boot parameters. ECI provides a Deb package named customizations-grub, which modifies the kernel boot parameters upon installation (see ECI Kernel Boot Optimizations). ECI targets core-bookworm and core-jammy have the customizations-grub package installed by default. Refer to Install ECI Packages to learn how to install ECI Deb packages.
The ECI Deb package customizations-grub modifies the kernel boot parameters such that CPUs 1 & 3 (13th generation processors and older) or 2 & 4 (14th generation processors and newer) are isolated, and CPU 0 is reserved to handle Linux kernel interrupts (see ECI Kernel Boot Optimizations). This configuration allows the isolated CPUs to be used for real-time workloads.
Important
This creates a side effect: any workloads that utilize CPU 0 will experience degraded performance. Therefore, it is recommended to move all critical processes to a CPU other than CPU 0.
For reference, the following snippet shows the default kernel boot parameters which affect CPUs:
See also
For a list of ECI kernel boot optimizations, refer to ECI Kernel Boot Optimizations.
For best performance, assign only a single isolated CPU per real-time workload. The following example executes the workload and then assigns the affinity of the workload to CPU 3 (where <workload> is replaced with the application to run):
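A minimal sketch of this step, assuming a POSIX shell and the util-linux `taskset` utility:

```shell
# Start the workload in the background, then pin the new process
# (its PID is $!) to CPU 3. Replace <workload> with your application.
<workload> &
taskset -cp 3 $!
```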
To assign affinity of all the child tasks of a parent workload, run the following command (where <workload> is the name of your workload):
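One way to sketch this, assuming `ps` and `taskset` from procps and util-linux, is to iterate over every thread ID belonging to the workload:

```shell
# Pin every thread (second column of `ps -e -T` is the thread ID)
# of the workload to CPU 3. Replace <workload> with your application.
for tid in $(ps -e -T | grep <workload> | awk '{print $2}'); do
    taskset -cp 3 "$tid"
done
```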
Prioritize Workloads¶
A simple and effective method to boost the performance of a real-time workload is to increase its runtime priority. Use the following command to run a workload with increased runtime priority (where <workload> is the name of your application):
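For example, using the util-linux `chrt` utility (the priority value 90 is illustrative; valid SCHED_FIFO priorities range from 1 to 99):

```shell
# Launch the workload under the SCHED_FIFO real-time policy
chrt -f 90 <workload>
```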
To assign priority of all the child tasks of a parent workload, run the following (where <workload> is the name of your workload):
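A sketch mirroring the affinity loop above, again assuming `ps` and `chrt` are available:

```shell
# Raise every thread of the workload to SCHED_FIFO priority 90
# (priority value is illustrative). Replace <workload> accordingly.
for tid in $(ps -e -T | grep <workload> | awk '{print $2}'); do
    chrt -f -p 90 "$tid"
done
```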
Use Cache Allocation Technology¶
Shared last-level caches are common on modern processors. For example, on an Intel® Core™ processor or Intel® Xeon® processor, the cores share an L3 cache, whereas on an Intel Atom® processor, cores 0 and 1 share an L2 cache, as do cores 2 and 3. As a result, workloads on adjacent cores can potentially cause cache misses. This occurs when a workload evicts a cache line in use by another workload. When a cache miss occurs, the workload must wait while the memory is fetched. This introduces undesired jitter into the workload execution time, subsequently impacting determinism. To mitigate this issue, Intel Cache Allocation Technology provides a method to partition processor caches and assign these partitions to a Class-of-Service (COS). Associating workloads with different COS can effectively isolate the parts of the cache available to each workload, thus preventing cache contention altogether. See Cache Allocation Technology for more information on using CAT.
The ECI Deb package customizations-grub modifies the kernel boot parameters such that CPUs 1 & 3 (13th generation processors and older) or 2 & 4 (14th generation processors and newer) are isolated, and CPU 0 is reserved to handle Linux kernel interrupts (see ECI Kernel Boot Optimizations). Under these conditions, it is advantageous to allocate the CPU cache such that the Linux kernel tasks never share cache with any tasks running on the isolated cores. To achieve this result, perform the following steps:
Recommended CAT configuration for Intel Core™ or Xeon® processors
The following example sets the core 0 L3 cache mask to 0x0f, and the cores 1 and 3 L3 cache mask to 0xf0.
Attention
This example is best suited for Intel® Core™ or Intel® Xeon® processors, which share last-level L3 cache. For Intel Atom® processors, see the subsequent example.
Reset cache allocation to default state.
Define the allocation classes for the last-level cache (LLC). Class 0 is allocated exclusive access to the first half of the LLC. Class 1 is allocated exclusive access to the second half of the LLC.
Associate core 0 with class 0, and cores 1 and 3 with class 1.
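The three steps above can be sketched with the `pqos` utility from the intel-cmt-cat package (the masks match the example; verify the mask width against your processor's CAT capabilities):

```shell
pqos -R                          # 1. reset cache allocation to default
pqos -e "llc:0=0x0f;llc:1=0xf0"  # 2. COS 0 = first half, COS 1 = second half
pqos -a "llc:0=0;llc:1=1,3"      # 3. core 0 -> COS 0, cores 1 & 3 -> COS 1
```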
Recommended CAT configuration for Intel Atom® processors
The following example sets the core 0 and 2 L2 cache mask to 0x0f, and the core 1 and 3 L2 cache mask to 0xf0.
Attention
This example is best suited for Intel Atom® processors, which share last-level L2 caches. For Intel® Core™ processors or Intel® Xeon® processors, see the previous example.
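A corresponding `pqos` sketch for the L2 case (L2 CAT support and the association syntax vary by platform and intel-cmt-cat version, so treat this as illustrative):

```shell
pqos -R                          # reset cache allocation to default
pqos -e "l2:0=0x0f;l2:1=0xf0"    # define L2 masks for COS 0 and COS 1
pqos -a "llc:0=0,2;llc:1=1,3"    # cores 0 & 2 -> COS 0, cores 1 & 3 -> COS 1
```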
Stop Unnecessary Services¶
Many services run in the background by default on Linux. Stopping services may reduce spurious interrupts, depending on the workload type. To list the loaded services, run the following command:
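On a systemd-based distribution, a likely form of the listing command is:

```shell
# Show all currently running services
systemctl list-units --type=service --state=running
```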
To stop a service, run the following command (where <service> is the name of a service):
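Again assuming systemd, stopping (and optionally disabling) a service would look like:

```shell
systemctl stop <service>      # stop the service now
systemctl disable <service>   # optionally prevent it from starting at boot
```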
Disable Machine Checks¶
By default, the Linux kernel periodically scans hardware for reported errors. While this feature can be useful for tracking down troublesome bugs, it also presents a source of workload preemption. Disabling this check improves real-time performance of workloads by preventing the Linux kernel from interrupting the running tasks. Run the following command to disable machine checks:
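A common way to do this on Linux, assuming the machinecheck sysfs interface is present, is to set the periodic check interval to zero (requires root; the setting on machinecheck0 applies to all CPUs):

```shell
# Disable periodic machine check scanning
echo 0 > /sys/devices/system/machinecheck/machinecheck0/check_interval
```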
Increase Thread Runtime Limit¶
The default values for the real-time throttling mechanism allow real-time tasks to use 95% of the CPU time. The remaining 5% is devoted to non-realtime tasks (tasks running under SCHED_OTHER and similar scheduling policies). Note that if a single real-time task occupies that 95% CPU time slot, the remaining real-time tasks on that CPU will not run; the remaining 5% of CPU time is used only by non-realtime tasks.
The impact of the default values is two-fold: rogue real-time tasks cannot lock up the system by starving non-realtime tasks, but real-time tasks have at most 95% of CPU time available to them, which may affect their performance.
If it is known that a particular workload is stable while consuming 100% of a CPU, it is possible to configure the runtime limit to infinity.
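This corresponds to the standard kernel tunable `sched_rt_runtime_us`, where -1 removes the limit (requires root):

```shell
# Allow real-time tasks to consume 100% of CPU time
sysctl -w kernel.sched_rt_runtime_us=-1
# equivalently:
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
```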
Typical Workload Optimization Flow¶
When executing a workload, complete the following steps to increase real-time performance of the workload:
1. Stop unnecessary services. In this example, wireless communication related services are stopped:
2. Stop the Docker daemon (if containers are not used):
3. Disable the kernel machine check interrupt:
4. Disable the thread runtime limit:
5. Set up cache partitioning. See Cache Allocation Technology for information on using CAT. This example sets the core 0 and 2 L2 cache mask to 0x0f, and the core 1 and 3 L2 cache mask to 0xf0.
6. Assign all non-realtime task affinity to core 0. The following script iterates through all interrupts and attempts to assign affinity to core 0.
7. Offload RCU tasks:
8. Start the real-time workload (where <workload> is the name of your workload):
9. Change the affinity of workload tasks (where <workload> is the name of your workload). This example assigns the affinity of all workload tasks to core 3.
10. Change the priority of workload tasks to be real-time (where <workload> is the name of your workload):
11. Minimize integrated GPU utilization on the processor. Rather than connecting a display monitor to the target system, use SSH to access the system. This minimizes the interrupts generated by the integrated GPU, thus improving system determinism.
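The flow above can be sketched as a single script. This is a hedged illustration, not the official ECI tooling: the service names, core numbers, priorities, and cache masks are examples to adapt to your platform, and most steps require root:

```shell
#!/bin/sh
# Illustrative optimization flow; replace <workload> with your application.

# 1. Stop unnecessary services (wireless-related services shown as examples)
systemctl stop wpa_supplicant bluetooth

# 2. Stop the Docker daemon if containers are not used
systemctl stop docker

# 3. Disable the kernel machine check interrupt
echo 0 > /sys/devices/system/machinecheck/machinecheck0/check_interval

# 4. Disable the real-time thread runtime limit
sysctl -w kernel.sched_rt_runtime_us=-1

# 5. Partition the cache (Intel Atom example: L2 masks 0x0f / 0xf0)
pqos -R
pqos -e "l2:0=0x0f;l2:1=0xf0"
pqos -a "llc:0=0,2;llc:1=1,3"

# 6. Steer all interrupts to core 0 (best effort; some IRQs cannot move)
for irq in /proc/irq/*/smp_affinity_list; do
    echo 0 > "$irq" 2>/dev/null
done

# 7. Offload RCU kernel threads to core 0
for pid in $(pgrep rcu); do
    taskset -pc 0 "$pid" >/dev/null 2>&1
done

# 8. Start the real-time workload
<workload> &

# 9. & 10. Pin all workload tasks to core 3 and raise them to RT priority
for tid in $(ps -e -T | grep <workload> | awk '{print $2}'); do
    taskset -cp 3 "$tid"
    chrt -f -p 90 "$tid"
done
```

Step 11 (minimizing integrated GPU utilization) is operational rather than scriptable: access the system over SSH instead of attaching a display.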