Attention

You are viewing an older version of the documentation. The latest version is v3.3.

Linux OS Runtime Optimizations

To achieve real-time performance on a target system, certain runtime configurations and optimizations are recommended. This section establishes a baseline for enabling real-time capable workloads.

CPU Isolation

When using the ECI Linux* Intel® LTS PREEMPT_RT kernel, all Linux kernel processes are scheduled to run on CPU 0, and CPUs 1 & 3 (13th generation processors and older) or 2 & 4 (14th generation processors and newer) are configured to be isolated for real-time usage (see ECI Kernel Boot Optimizations).

As a side effect, workloads that run on CPU 0 experience degraded performance. Therefore, it is recommended to move all critical processes to a CPU other than CPU 0.

For reference, the following code snippet shows the default kernel boot parameters that affect CPUs:

nmi_watchdog=0
irqaffinity=0
isolcpus=1,3 or 2,4
rcu_nocbs=1,3 or 2,4
nohz_full=1,3 or 2,4
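To confirm which of these parameters are active on a running target, the kernel command line can be inspected. The following is a minimal sketch, not part of ECI: the helper name `get_boot_param` is hypothetical, and it only parses `name=value` words from a command-line string (by default `/proc/cmdline`).

```shell
#!/bin/sh
# get_boot_param NAME [CMDLINE]
# Hypothetical helper: print the value of NAME=value from a kernel command line.
# CMDLINE defaults to the running kernel's /proc/cmdline.
# The function body is a subshell, so its variables and "set -f" stay local.
get_boot_param() (
    name=$1
    cmdline=${2-$(cat /proc/cmdline)}
    set -f                      # no globbing while splitting the command line
    for word in $cmdline; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; exit 0 ;;
        esac
    done
    exit 1                      # parameter not present
)

# Example (run on the target):
#   get_boot_param isolcpus
#   cat /sys/devices/system/cpu/isolated   # CPUs the kernel reports as isolated
```

If the value printed for `isolcpus` does not match the contents of `/sys/devices/system/cpu/isolated`, the boot parameters were probably not applied as expected.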

Network Interrupts Affinity to CPU

When using the ECI Linux Intel LTS PREEMPT_RT kernel, all Ethernet device MSI interrupts of Linux network devices are scheduled to run on CPU 0.

As a side effect, CPU 0 performs the interrupt handling for all Ethernet devices (for example, the top-half and bottom-half handlers), which can degrade performance.

Therefore, it is recommended to selectively move critical Ethernet device interrupts for prioritized traffic classes onto a CPU other than CPU 0.

The following section is applicable to: (image: target_generic3.png)

Install Network-irq-affinity Tool

You can view the table listing the default mapping of device interrupts to CPUs via procfs at /proc/interrupts.

root@eci-intel-0474:~# cat /proc/interrupts | grep -e CPU. -e enp.s.
            CPU0       CPU1       CPU2       CPU3
127:          1          0          0          0   PCI-MSI 524288-edge      enp1s0
128:     362502          0          0          0   PCI-MSI 524289-edge      enp1s0-TxRx-0
129:     197962          0          0          0   PCI-MSI 524290-edge      enp1s0-TxRx-1
130:     176611          0          0          0   PCI-MSI 524291-edge      enp1s0-TxRx-2
131:     183731          0          0          0   PCI-MSI 524292-edge      enp1s0-TxRx-3
132:          7          0          0          0   PCI-MSI 2097152-edge      enp4s0
133:     487832          0          0          0   PCI-MSI 2097153-edge      enp4s0-TxRx-0
134:     174277          0          0          0   PCI-MSI 2097154-edge      enp4s0-TxRx-1
135:     164195          0          0          0   PCI-MSI 2097155-edge      enp4s0-TxRx-2
136:     164147          0          0          0   PCI-MSI 2097156-edge      enp4s0-TxRx-3
137:         53          0          0          0   PCI-MSI 2621440-edge      enp5s0
138:     166660          0          0          0   PCI-MSI 2621441-edge      enp5s0-TxRx-0
139:     174901          0          0          0   PCI-MSI 2621442-edge      enp5s0-TxRx-1
140:     164256          0          0          0   PCI-MSI 2621443-edge      enp5s0-TxRx-2
141:     164384          0          0          0   PCI-MSI 2621444-edge      enp5s0-TxRx-3
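Reading the table by eye becomes tedious with many queues. The sketch below sums the per-CPU counts for every interrupt whose name matches a pattern. The function name `irq_counts_per_cpu` is hypothetical, and the code assumes the `/proc/interrupts` layout shown above (a CPU header row, then one row per interrupt with the name in the last column).

```shell
#!/bin/sh
# irq_counts_per_cpu PATTERN [FILE]
# Hypothetical helper: sum the per-CPU interrupt counts of all rows whose last
# column matches the regular expression PATTERN.
# FILE defaults to /proc/interrupts.
irq_counts_per_cpu() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }          # header row: CPU0 CPU1 ...
        $NF ~ pat {                          # interrupt name is the last field
            for (c = 0; c < ncpu; c++) total[c] += $(c + 2)
        }
        END { for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, total[c] }
    ' "${2:-/proc/interrupts}"
}

# Example (run on the target):
#   irq_counts_per_cpu 'enp4s0'
```

With the default mapping shown above, all counts for an Ethernet device would be expected in the CPU0 row of this summary.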

You can install the network-irq-affinity helper script from the ECI repository to remap Ethernet device interrupts to specific CPUs.

Tool: Ethernet Device MSI interrupt CPU mapping helper script
Version: 1.1+patchset
Source: https://github.com/suominen/network-irq-affinity

Set up the ECI repository, then run the following command to install this component:

$ sudo apt install network-irq-affinity

The following are the possible arguments of the network-irq-affinity helper script:

   Usage:  network-irq-affinity [-hnv] -i <irqname>/<cpunum> -i <irqname>/<cpunum> ...

   Options:
   -h      Show this usage message.
   -i      tuple ethernet interface irq <irqname> to cpu core <cpunum>
   -n      Do not change anything, just show what would be done.
   -v      Verbose output of changes made.

Note: In the tuple list, the order of the -i <irqname>/<cpunum> entries determines precedence when assigning <irqname> affinity to CPU core <cpunum>.

Example Command

$ network-irq-affinity -i enp5s0-TxRx-1/1 -i enp4s0-TxRx-1/1 -i enp4s0-TxRx-*/2 -i enp5s0-TxRx-*/3 -v

In the above example, the -i <irqname>/<cpunum> tuple list is used to prioritize mapping both enp4s0-TxRx-1 and enp5s0-TxRx-1 MSI interrupts to CPU1, before mapping the remaining MSI interrupts respectively to CPU2 for enp4s0 and CPU3 for enp5s0.

network-irq-affinity: Assigning enp5s0-TxRx-1 on IRQ 139 to CPU 1
network-irq-affinity: Assigning enp4s0-TxRx-1 on IRQ 134 to CPU 1
network-irq-affinity: Assigning enp4s0-TxRx-0 on IRQ 133 to CPU 2
network-irq-affinity: Assigning enp4s0-TxRx-2 on IRQ 135 to CPU 2
network-irq-affinity: Assigning enp4s0-TxRx-3 on IRQ 136 to CPU 2
network-irq-affinity: Assigning enp5s0-TxRx-0 on IRQ 138 to CPU 3
network-irq-affinity: Assigning enp5s0-TxRx-2 on IRQ 140 to CPU 3
network-irq-affinity: Assigning enp5s0-TxRx-3 on IRQ 141 to CPU 3
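The precedence behavior in this example can be illustrated with a small sketch. It is not the tool's actual implementation: the function `resolve_cpu` is hypothetical, and it assumes that the first tuple whose pattern matches an interrupt name decides the CPU, which is consistent with the output above.

```shell
#!/bin/sh
# resolve_cpu IRQNAME TUPLE...
# Hypothetical illustration of tuple precedence: print the CPU of the first
# <pattern>/<cpunum> tuple whose shell glob pattern matches IRQNAME.
# ASSUMPTION: first match wins (not taken from the tool's source code).
resolve_cpu() (
    irq=$1
    shift
    for tuple in "$@"; do
        pattern=${tuple%/*}     # text before the last "/"
        cpu=${tuple##*/}        # text after the last "/"
        case $irq in
            $pattern) printf '%s\n' "$cpu"; exit 0 ;;   # unquoted: glob match
        esac
    done
    exit 1                      # no tuple matched
)

# Example:
#   resolve_cpu enp4s0-TxRx-1 'enp4s0-TxRx-1/1' 'enp4s0-TxRx-*/2'
```

Under this assumption, listing the wildcard tuple before the specific one would send enp4s0-TxRx-1 to CPU 2 as well, which is why the specific tuples come first in the example command.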

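Verify that the Ethernet device interrupt remapping took effect using the following command. The counters in /proc/interrupts are cumulative, so look for new counts in the columns of the target CPUs: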

root@eci-intel-0474:~# cat /proc/interrupts | grep -e CPU. -e enp.s.
            CPU0       CPU1       CPU2       CPU3
127:          1          0          0          0   PCI-MSI 524288-edge      enp1s0
128:     362759          0          0          0   PCI-MSI 524289-edge      enp1s0-TxRx-0
129:     198104          0          0          0   PCI-MSI 524290-edge      enp1s0-TxRx-1
130:     176736          0          0          0   PCI-MSI 524291-edge      enp1s0-TxRx-2
131:     183855          0          0          0   PCI-MSI 524292-edge      enp1s0-TxRx-3
132:          7          0          0          0   PCI-MSI 2097152-edge      enp4s0
133:     488175          0         27          0   PCI-MSI 2097153-edge      enp4s0-TxRx-0
134:     174399          8          0          0   PCI-MSI 2097154-edge      enp4s0-TxRx-1
135:     164310          0          8          0   PCI-MSI 2097155-edge      enp4s0-TxRx-2
136:     164262          0          8          0   PCI-MSI 2097156-edge      enp4s0-TxRx-3
137:         53          0          0          0   PCI-MSI 2621440-edge      enp5s0
138:     166775          0          6          2   PCI-MSI 2621441-edge      enp5s0-TxRx-0
139:     175020          8          0          0   PCI-MSI 2621442-edge      enp5s0-TxRx-1
140:     164371          0          6          2   PCI-MSI 2621443-edge      enp5s0-TxRx-2
141:     164499          0          6          2   PCI-MSI 2621444-edge      enp5s0-TxRx-3
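Because the counters are cumulative since boot, the CPU0 column stays large even after a successful remap. One way to see only the interrupts that arrived after the change is to compare two snapshots. The following sketch assumes two saved copies of `/proc/interrupts`; the function name `irq_delta` and the snapshot file names are hypothetical.

```shell
#!/bin/sh
# irq_delta BEFORE AFTER
# Hypothetical helper: print, for every interrupt whose counts changed between
# two saved copies of /proc/interrupts, the per-CPU difference and the name.
irq_delta() {
    awk '
        FNR == 1 { next }                        # skip each CPU header row
        NR == FNR {                              # first file: remember fields
            for (i = 2; i <= NF; i++) old[$1, i] = $i
            next
        }
        {                                        # second file: subtract
            line = $1
            changed = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) {
                d = $i - old[$1, i]
                line = line " " d
                if (d != 0) changed = 1
            }
            if (changed) print line, $NF
        }
    ' "$1" "$2"
}

# Example (run on the target):
#   cat /proc/interrupts > /tmp/irq.before
#   sleep 10
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after | grep enp
```

If the remap is effective, the differences for the remapped interrupts should appear only in the columns of their assigned CPUs.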

Best Practices for Achieving Real-time Performance

The following section is applicable to: (image: target_generic3.png)

Eliminate Sources of CPU Contention

To achieve real-time performance, it is imperative to isolate real-time workloads from other tasks. This can be achieved by using a real-time kernel and modifying the kernel boot parameters. ECI provides a Deb package named customizations-grub, which modifies the kernel boot parameters upon installation (see ECI Kernel Boot Optimizations). The ECI targets core-bookworm and core-jammy have the customizations-grub package installed by default. Refer to Install ECI Packages to learn how to install ECI Deb packages.

The ECI Deb package customizations-grub modifies the kernel boot parameters such that CPUs 1 & 3 (13th generation processors and older) or 2 & 4 (14th generation processors and newer) are isolated, and CPU 0 is reserved to handle Linux kernel interrupts (see ECI Kernel Boot Optimizations). This configuration allows the use of the isolated CPUs (1 and 3, or 2 and 4) for real-time workloads.

Important

This creates a side effect that any workloads which utilize CPU 0 will experience degraded performance. Therefore, it is recommended to move all critical processes to a CPU other than CPU 0.

For reference, the following snippet shows the default kernel boot parameters which affect CPUs:

nmi_watchdog=0
irqaffinity=0
isolcpus=1,3
rcu_nocbs=1,3
nohz_full=1,3

See also

For a list of ECI kernel boot optimizations, refer to ECI Kernel Boot Optimizations.

For best performance, assign only a single isolated CPU to each real-time workload. The following example executes the workload and assigns its affinity to CPU 3 (where <workload> is replaced with the application to run):

$ taskset -c 3 <workload>

To assign affinity of all the child tasks of a parent workload, run the following command (where <workload> is the name of your workload):

$ ps ww -eLo tid,comm,cmd | grep -i <workload> | awk '{print $1}' | xargs -n 1 taskset -pac 3 > /dev/null
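The `grep` in the pipeline above can also match unrelated processes whose command line happens to contain the workload name, including the `grep` process itself. As an alternative sketch, the hypothetical helper `pin_workload` below selects processes by exact name with `pgrep -x` and relies on `taskset -a` to cover all threads of each process. It assumes the workload's process name is known and at most 15 characters long.

```shell
#!/bin/sh
# pin_workload NAME CPULIST
# Hypothetical helper: pin all threads of every process named exactly NAME
# to CPULIST (for example "3").
# ASSUMPTION: NAME is the process name as reported by pgrep.
pin_workload() (
    for pid in $(pgrep -x "$1"); do
        # -a applies the affinity to all tasks (threads) of the PID
        taskset -a -p -c "$2" "$pid" > /dev/null
    done
)

# Example (run on the target after starting the workload):
#   pin_workload my_workload 3
```

When no process matches, the loop body never runs and the function simply returns success.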

Prioritize Workloads

A simple and effective method to boost the performance of a real-time workload is to increase its runtime priority. Use the following command to run a workload with increased runtime priority (where <workload> is the name of your application):

$ chrt -f 1 <workload>

To assign priority of all the child tasks of a parent workload, run the following (where <workload> is the name of your workload):

$ ps ww -eLo tid,comm,cmd | grep -i <workload> | awk '{print $1}' | xargs -n 1 chrt -p -f 1
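Affinity and priority can also be applied together at launch time by chaining `taskset` and `chrt`. The wrapper below is a sketch under assumptions: the name `rt_launch` is hypothetical, it only validates that the SCHED_FIFO priority lies in the Linux range 1 to 99, and setting a real-time policy typically requires root privileges or an equivalent capability.

```shell
#!/bin/sh
# rt_launch CPULIST PRIORITY COMMAND [ARGS...]
# Hypothetical wrapper: start COMMAND pinned to CPULIST with SCHED_FIFO PRIORITY.
rt_launch() (
    case $2 in
        ''|*[!0-9]*)
            echo "rt_launch: priority must be a number from 1 to 99" >&2
            exit 2 ;;
    esac
    if [ "$2" -lt 1 ] || [ "$2" -gt 99 ]; then
        echo "rt_launch: priority must be a number from 1 to 99" >&2
        exit 2
    fi
    cpu=$1
    prio=$2
    shift 2
    # taskset sets the affinity, then chrt sets the policy and runs the command
    taskset -c "$cpu" chrt -f "$prio" "$@"
)

# Example (run on the target):
#   rt_launch 3 1 ./my_workload
```

Starting the workload this way avoids the short window in which it would otherwise run with default affinity and priority before being adjusted.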

Use Cache Allocation Technology

Shared last-level caches are common on modern processors. For example, on an Intel® Core™ processor or Intel® Xeon® processor, the cores share an L3 cache, whereas on an Intel Atom® processor, cores 0 and 1 share one L2 cache and cores 2 and 3 share another. Because of this sharing, workloads on adjacent cores can cause cache misses for each other: one workload evicts a cache line that another workload is using. When a cache miss occurs, the workload must wait while the data is fetched from memory. This introduces undesired jitter into the workload execution time and reduces determinism. To mitigate this issue, Intel Cache Allocation Technology (CAT) provides a method to partition processor caches and assign these partitions to a Class-of-Service (COS). Associating workloads with different COS isolates the parts of the cache available to each workload, which prevents the workloads from evicting each other's cache lines. See Cache Allocation Technology for more information on using CAT.

The ECI Deb package customizations-grub modifies the kernel boot parameters such that CPUs 1 & 3 (13th generation processors and older) or 2 & 4 (14th generation processors and newer) are isolated, and CPU 0 is reserved to handle Linux kernel interrupts (see ECI Kernel Boot Optimizations). Under these conditions, it is advantageous to allocate the CPU cache such that the Linux kernel tasks never share cache with any tasks running on the isolated cores. To achieve this result, perform the following steps:

Recommended CAT configuration for Intel Core™ or Xeon® processors

The following example sets core 0 L3 cache mask to 0x0f, and cores 1 and 3 L3 cache mask to 0xf0.

Attention

This example is best suited for Intel® Core™ or Intel® Xeon® processors, which share last-level L3 cache. For Intel Atom® processors, see the subsequent example.

  1. Reset cache allocation to default state.

    pqos -R
    
  2. Define the allocation classes for the last-level cache (LLC). Class 0 is allocated exclusive access to the first half of the LLC. Class 1 is allocated exclusive access to the second half of the LLC.

    pqos -e 'llc:0=0x0f;llc:1=0xf0'
    
  3. Associate core 0 with class 0, and cores 1 and 3 with class 1.

    pqos -a 'llc:0=0;llc:1=1,3'
    
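The masks 0x0f and 0xf0 in the steps above are contiguous bit ranges that do not overlap, which is what makes the two classes exclusive. The sketch below shows the arithmetic. The helper names `cat_mask` and `masks_overlap` are hypothetical, and the 8-bit mask width is an assumption taken from the example; the actual number of cache ways depends on the processor.

```shell
#!/bin/sh
# cat_mask FIRST_WAY NUM_WAYS
# Hypothetical helper: print a contiguous capacity bitmask in hexadecimal that
# covers NUM_WAYS cache ways starting at bit FIRST_WAY.
cat_mask() {
    printf '0x%x\n' $(( ((1 << $2) - 1) << $1 ))
}

# masks_overlap MASK_A MASK_B
# Succeeds (exit status 0) if the two masks share at least one bit.
masks_overlap() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Example, assuming an 8-bit mask as in the steps above:
#   cat_mask 0 4     # lower half, prints 0xf (same value as 0x0f)
#   cat_mask 4 4     # upper half, prints 0xf0
#   masks_overlap 0x0f 0xf0 || echo "classes are exclusive"
```

A different split, such as giving the real-time class more ways, only changes the two arguments, as long as the resulting masks stay contiguous and do not overlap.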

Recommended CAT configuration for Intel Atom® processors

The following example sets core 0 and 2 L2 cache mask to 0x0f, and core 1 and 3 L2 cache mask to 0xf0.

Attention

This example is best suited for Intel Atom® processors, which share last-level L2 cache. For Intel® Core™ processors or Intel® Xeon® processors, see the previous example.

$ pqos -R
$ pqos -e 'l2:0=0x0f;l2:1=0xf0'
$ pqos -a 'llc:0=0,2;llc:1=1,3'

Stop Unnecessary Services

Many services run in the background, by default, on Linux. Stopping services may reduce spurious interrupts depending on the workload type. To list the loaded services, run the following command:

$ systemctl -t service

To stop a service, run the following command (where <service> is the name of a service):

Warning

Stopping system services can be detrimental to the stability of the Linux system. Be sure you understand the implications before stopping a service.

$ systemctl stop <service>
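When several services are stopped in a script, it can help to skip those that are not running. The following sketch uses `systemctl is-active --quiet` for that check. The function name `stop_if_active` and the example service names are hypothetical, and stopping a service usually requires root privileges.

```shell
#!/bin/sh
# stop_if_active SERVICE...
# Hypothetical helper: stop each named service only if systemd reports it as
# active, and print what was done.
stop_if_active() (
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc" 2>/dev/null; then
            echo "$svc: stopping"
            systemctl stop "$svc"
        else
            echo "$svc: not active, skipping"
        fi
    done
)

# Example (run on the target):
#   stop_if_active bluetooth wpa_supplicant
```

The warning above still applies: the check only avoids redundant stop commands, it does not judge whether stopping a service is safe.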

Disable Machine Checks

By default, the Linux kernel periodically scans hardware for reported errors. While this feature can be useful for tracking down troublesome bugs, it also presents a source of workload preemption. Disabling this check improves real-time performance of workloads by preventing the Linux kernel from interrupting the running tasks. Run the following command as root to disable machine checks:

echo 0 > /sys/devices/system/machinecheck/machinecheck0/check_interval

Increase Thread Runtime Limit

The default values for the real-time throttling mechanism define that 95% of the CPU time can be used by real-time tasks. The remaining 5% will be devoted to non-realtime tasks (tasks running under SCHED_OTHER and similar scheduling policies). It is important to note that if a single real-time task occupies that 95% CPU time slot, the remaining real-time tasks on that CPU will not run. The remaining 5% of CPU time is used only by non-realtime tasks.

The impact of the default values is two-fold: rogue real-time tasks cannot lock up the system by starving non-realtime tasks and, on the other hand, real-time tasks have at most 95% of CPU time available to them, which may affect their performance.

If a particular workload is known to run stably while consuming 100% of a CPU, it is possible to remove the runtime limit by setting it to -1.

Warning

Configuring the runtime limit in this way can potentially lock a system if the workload in question contains unbounded polling loops. Use this configuration with caution.

echo -1 > /proc/sys/kernel/sched_rt_runtime_us
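The 95% figure follows from two kernel settings: `sched_rt_runtime_us` (default 950000) divided by `sched_rt_period_us` (default 1000000). The sketch below computes the share from these values. The function name `rt_budget_percent` is hypothetical; when called without arguments it reads the current values from `/proc/sys/kernel`.

```shell
#!/bin/sh
# rt_budget_percent [RUNTIME_US PERIOD_US]
# Hypothetical helper: print the share of CPU time available to real-time
# tasks, or "unlimited" when the runtime limit is disabled (negative value).
# Without arguments, the current kernel settings are read from procfs.
rt_budget_percent() (
    runtime=${1-$(cat /proc/sys/kernel/sched_rt_runtime_us)}
    period=${2-$(cat /proc/sys/kernel/sched_rt_period_us)}
    if [ "$runtime" -lt 0 ]; then
        echo "unlimited"
    else
        echo "$(( runtime * 100 / period ))%"
    fi
)

# Example (run on the target, before and after changing the limit):
#   rt_budget_percent
```

Checking the value before and after writing -1 is a simple way to confirm that the change took effect.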

Typical Workload Optimization Flow

When executing a workload, complete the following steps to increase real-time performance of the workload:

  1. Stop unnecessary services. In this example we stop wireless communication related services:

    $ systemctl stop ofono
    $ systemctl stop wpa_supplicant
    $ systemctl stop bluetooth
    
  2. Stop the Docker daemon (if containers are not used):

    $ systemctl stop docker
    
  3. Disable kernel machine check interrupt:

    $ echo 0 > /sys/devices/system/machinecheck/machinecheck0/check_interval
    
  4. Disable thread runtime limit:

    $ echo -1 > /proc/sys/kernel/sched_rt_runtime_us
    
  5. Set up cache partitioning. See Cache Allocation Technology for information on using CAT. This example sets the L2 cache mask of cores 0 and 2 to 0x0f, and the L2 cache mask of cores 1 and 3 to 0xf0.

    $ pqos -R
    $ pqos -e 'l2:0=0x0f;l2:1=0xf0'
    $ pqos -a 'llc:0=0,2;llc:1=1,3'
    
  6. Assign all non-realtime interrupt handling to core 0. The following script iterates through all numbered interrupts and attempts to assign their affinity to core 0.

    #!/bin/bash
    # Pin every numbered IRQ to core 0 (hexadecimal affinity mask 1 = CPU 0).
    for i in $(grep '^ *[0-9]*[0-9]:' /proc/interrupts | awk '{print $1}' | sed 's/:$//');
    do
     # Skip the timer interrupt
     if [ "$i" = "0" ]; then
         continue
     fi
     # Skip the cascade interrupt
     if [ "$i" = "2" ]; then
         continue
     fi
     echo "setting $i to affine for core 0"
     echo 1 > "/proc/irq/$i/smp_affinity"
    done
    
  7. Offload RCU tasks:

    $ for i in $(pgrep rcu); do taskset -pc 0 $i > /dev/null ; done
    
  8. Start the real-time workload (where <workload> is the name of your workload):

    $ ./<workload>
    
  9. Change affinity of workload tasks (where <workload> is the name of your workload). This example assigns affinity of all workload tasks to core 3.

    $ ps ww -eLo tid,comm,cmd | grep -i <workload> | awk '{print $1}' | xargs -n 1 taskset -pac 3 > /dev/null
    
  10. Change priority of workload tasks to be real-time (where <workload> is the name of your workload):

    $ ps ww -eLo tid,comm,cmd | grep -i <workload> | awk '{print $1}' | xargs -n 1 chrt -p -f 1
    
  11. Minimize integrated GPU utilization on the processor. Rather than connecting a display monitor to the target system, use SSH to access the system. This minimizes the interrupts generated by the integrated GPU, thus improving system determinism.
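The value written to smp_affinity in step 6 is a hexadecimal bitmask in which bit N selects CPU N, so 1 selects CPU 0. The sketch below shows how such a mask could be computed for other CPU sets. The helper name `cpus_to_affinity_mask` is hypothetical, and the plain hexadecimal output is only assumed to be suitable for systems with fewer than 32 CPUs.

```shell
#!/bin/sh
# cpus_to_affinity_mask CPU...
# Hypothetical helper: print the hexadecimal bitmask with one bit set for each
# listed CPU number, in the form written to /proc/irq/<n>/smp_affinity.
# ASSUMPTION: all CPU numbers are below 32.
cpus_to_affinity_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
)

# Example:
#   cpus_to_affinity_mask 0      # prints 1, the value used in step 6
#   cpus_to_affinity_mask 1 3    # prints a (binary 1010)
```

The same interrupt directories also expose smp_affinity_list, which accepts a CPU list such as 0 or 1,3 instead of a bitmask.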