Real-Time Linux
The Embodied Intelligence SDK adds real-time capabilities to the kernel through the PREEMPT_RT patch and boot parameters tuned for real-time operation, with the aim of increasing predictability and reducing scheduler latency.
Installation
Install GRUB customizations
$ sudo apt install -y customizations-grub
Install linux-firmware
$ sudo apt install -y linux-firmware
Note: Linux 6.12 requires specific i915 GuC/DMC/GSC firmware, which is installed to a separate location, /lib/firmware/i915/experimental/. After the next reboot, confirm that the following boot parameters appear in the output of cat /proc/cmdline:
i915.guc_firmware_path=i915/experimental/mtl_guc_70.bin i915.dmc_firmware_path=i915/experimental/mtl_dmc.bin i915.gsc_firmware_path=i915/experimental/mtl_gsc_1.bin
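The check against /proc/cmdline can be scripted. The sketch below prints every expected parameter that does not appear as a whole word in a given command line. The function name missing_cmdline_params is hypothetical and not part of the SDK.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): print each expected parameter
# that is absent from the given kernel command line.
# Usage: missing_cmdline_params "<cmdline string>" param1 [param2 ...]
missing_cmdline_params() {
    cmdline=" $1 "                          # pad so every word is space-delimited
    shift
    for param in "$@"; do
        case "$cmdline" in
            *" $param "*) ;;                # present as a whole word
            *) printf '%s\n' "$param" ;;    # missing
        esac
    done
}

# Example call on the target after reboot (no output means nothing is missing):
#   missing_cmdline_params "$(cat /proc/cmdline)" \
#       i915.guc_firmware_path=i915/experimental/mtl_guc_70.bin \
#       i915.dmc_firmware_path=i915/experimental/mtl_dmc.bin \
#       i915.gsc_firmware_path=i915/experimental/mtl_gsc_1.bin
```

The comparison is an exact string match, so a parameter that is present with a different firmware path is reported as missing.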
If the i915 GuC/DMC/GSC firmware is missing from /lib/firmware/i915/experimental/, install the latest linux-firmware package with the following command:
$ sudo apt install -y linux-firmware=20220329.git681281e4-0ubuntu3.36-intel-iotg.eci8
To confirm that the correct linux-firmware version is in use, run:
$ sudo apt-cache policy linux-firmware
The expected result is:
linux-firmware:
Installed: 20220329.git681281e4-0ubuntu3.36-intel-iotg.eci8
Install the real-time Linux kernel. For more details, refer to LinuxBSP.
$ sudo apt install -y linux-intel-rt-experimental
Note: If you do not need the RT kernel, install the generic kernel instead with the following command:
$ sudo apt install -y linux-intel-experimental
To modify the default boot parameters, edit /etc/grub.d/10_eci_experimental.
Note: For better real-time performance and power consumption, modify eci_cmdline_exp in /etc/grub.d/10_eci_experimental with the following steps.
# Modify the default cmdline parameters to enable C-states/P-states
$ sudo sed -i 's/intel_pstate=disable intel.max_cstate=0 intel_idle.max_cstate=0 processor.max_cstate=0 processor_idle.max_cstate=0/intel_pstate=enable/g' /etc/grub.d/10_eci_experimental
# Modify the default cmdline parameter to set IRQ affinity to cores 0-9
$ sudo sed -i 's/irqaffinity=0 /irqaffinity=0-9 /g' /etc/grub.d/10_eci_experimental
# Modify the default cmdline parameter to isolate CPU cores 10-13
$ sudo sed -i 's/isolcpus=${isolcpus} rcu_nocbs=${isolcpus} nohz_full=${isolcpus}/isolcpus=10-13 rcu_nocbs=10-13 nohz_full=10-13/g' /etc/grub.d/10_eci_experimental
$ sudo update-grub
The following command line parameters are used for real-time optimization. You can modify them according to your requirements:
isolcpus: Isolates the specified CPU cores from the general scheduler, dedicating them to real-time tasks.
rcu_nocbs: Prevents the specified CPU cores from handling RCU (Read-Copy-Update) callbacks, reducing latency.
nohz_full: Enables full dynamic ticks on the specified CPU cores, reducing timer interrupts.
irqaffinity: Sets the default affinity of hardware interrupts to the specified CPU cores, keeping the remaining cores free for real-time tasks.
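After rebooting, it can be useful to confirm which cores the kernel actually isolated. The sketch below expands a kernel CPU list such as 10-13 into individual indices so that it can be compared with an expected set. The function name expand_cpu_list is hypothetical, and stride syntax (for example 0-8:2) is deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helper: expand a kernel CPU list such as "0-2,5" into
# space-separated CPU indices ("0 1 2 5"). Stride syntax is not handled.
expand_cpu_list() {
    out=""
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        case "$part" in
            *-*) first=${part%-*}; last=${part#*-} ;;   # range "a-b"
            *)   first=$part; last=$part ;;             # single CPU
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out $i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "${out# }"
}

# Example call on the target, using the sysfs file that lists isolated CPUs:
#   expand_cpu_list "$(cat /sys/devices/system/cpu/isolated)"
```

With the boot parameters from the steps above, the expected result on the target would be the indices 10 through 13. The same function can be applied to /sys/devices/system/cpu/nohz_full.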
After rebooting, make sure to select the [Experimental] ECI Ubuntu boot entry.

Note: Select Advanced Options for [Experimental] ECI Ubuntu to list the available entries: [Experimental] ECI Ubuntu, with Linux 6.12.8-intel-ese-experimental-lts-rt for the real-time kernel, or [Experimental] ECI Ubuntu, with Linux 6.12.8-intel-ese-experimental-lts for the generic kernel.

Real-time Runtime Optimization
To achieve real-time performance on a target system, specific runtime configurations and optimizations are recommended. This section provides a foundation for enabling real-time capable workloads.

Use Cache Allocation Technology
Intel® Cache Allocation Technology (CAT) enables partitioning of caches at various levels within the caching hierarchy, providing a straightforward method to enhance temporal isolation between real-time and best-effort workloads.
The configuration shown here is an example and should be tailored to your specific use case and processor. To determine the cache topology, including the size and number of ways supported by a processor, use the CPUID leaf “Deterministic Cache Parameters Leaf - 0x4”. Linux utilities like lstopo are also useful for obtaining an overview of a processor’s cache topology.
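As a lightweight alternative to CPUID or lstopo, the cache attributes that Linux exposes in sysfs can be summarized per CPU. The sketch below is an assumption-laden helper, not an SDK tool: the function name show_cache_topology is hypothetical, and the second argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: summarize the caches visible to one CPU from sysfs.
# $1 = CPU index, $2 = sysfs CPU root (defaults to /sys/devices/system/cpu).
show_cache_topology() {
    cpu_root=${2:-/sys/devices/system/cpu}
    for idx in "$cpu_root/cpu$1"/cache/index*; do
        [ -d "$idx" ] || continue    # skip if the glob matched nothing
        printf 'L%s %s size=%s ways=%s shared_with=%s\n' \
            "$(cat "$idx/level")" "$(cat "$idx/type")" "$(cat "$idx/size")" \
            "$(cat "$idx/ways_of_associativity")" "$(cat "$idx/shared_cpu_list")"
    done
}

# Example call on the target for the isolated core:
#   show_cache_topology 13
```

The ways value is the number to keep in mind when choosing capacity bitmasks, and shared_with shows which cores compete for the same cache.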
For more information about CAT, refer to the following resources:
Public Intel® Time Coordinated Computing (TCC) User Guide - RDC #[831067]
Intel® Resource Director Technology (Intel® RDT) Architecture Specification - RDC #[789566]
Intel® 64 and IA-32 Architectures Software Developer’s Manual - RDC #[671200]
Below is an example script that partitions the Last Level Cache (LLC) and L2 cache, assigning an exclusive portion to real-time tasks. Install the Linux msr-tools package before testing it, and adjust the values to your configuration (e.g. core 13 as the isolated core):
#!/bin/sh
# define LLC Core Masks
wrmsr 0xc90 0x3f # best effort mask
wrmsr 0xc91 0xfc0 # real-time mask
# define E-core L2 Core Mask
wrmsr -p10 0xd10 0xff # best effort mask
wrmsr -p11 0xd10 0xff # best effort mask
wrmsr -p12 0xd10 0xff # best effort mask
wrmsr -p13 0xd11 0xff00 # real-time mask
# assign the masks to the cores
# This has to match with the core selected for the real-time task
wrmsr -p13 0xc8f 0x100000000
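The values in the script above follow a pattern that a small helper can reproduce. The sketch below assumes the usual Intel RDT register layout: 0xc90 and 0xc91 as the L3 capacity masks for classes of service 0 and 1, 0xd10 and 0xd11 as the corresponding L2 masks, and 0xc8f as IA32_PQR_ASSOC with the class of service in bits 63:32. The function names way_mask and pqr_assoc_value are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for composing the values written by the CAT script.

# Contiguous capacity bitmask covering COUNT ways starting at way FIRST.
# Usage: way_mask FIRST COUNT
way_mask() {
    printf '0x%x\n' $(( ((1 << $2) - 1) << $1 ))
}

# IA32_PQR_ASSOC (0xc8f) value that selects class of service CLOS with RMID 0.
# Usage: pqr_assoc_value CLOS
pqr_assoc_value() {
    printf '0x%x\n' $(( $1 << 32 ))
}
```

For example, way_mask 0 6 and way_mask 6 6 give the best-effort and real-time LLC masks 0x3f and 0xfc0, way_mask 8 8 gives the real-time L2 mask 0xff00, and pqr_assoc_value 1 gives 0x100000000. Keeping the two masks of a cache level non-overlapping is what makes the real-time portion exclusive.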
Use Dynamic Voltage and Frequency
Dynamic Voltage and Frequency Scaling (DVFS) features, such as Intel® Speed Step, Speed Shift, and Turbo Boost Technology, allow processors to adjust voltage and frequency within P-States to balance power efficiency and performance. Speed Step and Speed Shift manage these adjustments, while Turbo Boost temporarily exceeds the highest P-State for additional performance during demanding tasks.
To enhance single-thread performance, boost the frequency of the real-time core to a value within the turbo frequency range. To meet real-time requirements, you can lock the core frequency at runtime using the HWP MSRs or the intel_pstate driver in Linux. Locking the core frequency of the real-time application to a turbo frequency and limiting the maximum frequency of best-effort (BE) cores to the base frequency, as described in the TCC User Guide, reduces execution time jitter and significantly lowers execution time.
For more information on accessing HWP MSRs directly instead of using the sysfs entries of the intel_pstate driver, refer to the [TCC User Guide] and the Intel® 64 and IA-32 Architectures Software Developer’s Manual Vol3 section “Power and Thermal Management-Hardware Controlled Performance States” - RDC #[671200].
Attention
Setting even just a few cores to a higher, fixed frequency does not come without a cost. Due to higher internal frequency, voltages, and subsequent higher temperature and power, such settings will negatively impact the reliability expectations of the CPU and should be used with careful consideration.
Below is an example that boosts the real-time core to 3 GHz, with the Energy Performance Preference (EPP) set to performance to ensure Quality of Service (QoS) in case of power limit throttling:
(e.g. core 13 as the isolated core on Intel® Core™ Ultra Processors 255H)
(Option 1): Using the sysfs entries of the intel_pstate driver
#!/bin/sh
# Set the min and max frequencies to specific turbo frequency
echo performance > /sys/devices/system/cpu/cpu13/cpufreq/scaling_governor
echo 3000000 > /sys/devices/system/cpu/cpu13/cpufreq/scaling_max_freq
echo 3000000 > /sys/devices/system/cpu/cpu13/cpufreq/scaling_min_freq
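After writing the sysfs entries, reading them back is a cheap way to see whether the driver accepted the requested limits. The sketch below succeeds only if both limits equal the requested value in kHz. The function name freq_is_locked is hypothetical, and the third argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both cpufreq limits of a CPU are
# pinned to the requested frequency.
# $1 = CPU index, $2 = frequency in kHz,
# $3 = sysfs CPU root (defaults to /sys/devices/system/cpu).
freq_is_locked() {
    dir="${3:-/sys/devices/system/cpu}/cpu$1/cpufreq"
    [ "$(cat "$dir/scaling_min_freq")" = "$2" ] &&
        [ "$(cat "$dir/scaling_max_freq")" = "$2" ]
}

# Example call on the target:
#   freq_is_locked 13 3000000 && echo "cpu13 limits pinned to 3 GHz"
```

This only confirms the configured limits. The frequency the core actually runs at can still be reduced by power or thermal limits, which is why the example above also sets EPP to performance.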
(Option 2): Using msr-tools to modify IA32_HWP_REQUEST (0x774) to set a specific core frequency.
Note: For details on IA32_HWP_REQUEST, refer to the Intel® 64 and IA-32 Architectures Software Developer’s Manual Vol3 section “Power and Thermal Management-Hardware Controlled Performance States” - RDC #[671200].
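#!/bin/sh
# IA32_HWP_REQUEST fields: bits 7:0 minimum, 15:8 maximum, 23:16 desired performance, 31:24 EPP
# Cores 0-5: maximum performance 0x52, EPP 0x80
wrmsr 0x774 -p 0 0x80005201
wrmsr 0x774 -p 1 0x80005201
wrmsr 0x774 -p 2 0x80005201
wrmsr 0x774 -p 3 0x80005201
wrmsr 0x774 -p 4 0x80005201
wrmsr 0x774 -p 5 0x80005201
# Cores 6-12: maximum performance 0x3e, EPP 0x80
wrmsr 0x774 -p 6 0x80003e01
wrmsr 0x774 -p 7 0x80003e01
wrmsr 0x774 -p 8 0x80003e01
wrmsr 0x774 -p 9 0x80003e01
wrmsr 0x774 -p 10 0x80003e01
wrmsr 0x774 -p 11 0x80003e01
wrmsr 0x774 -p 12 0x80003e01
# Core 13 (real-time): minimum and maximum performance both 0x2a, EPP 0x00 (performance)
wrmsr 0x774 -p 13 0x00002a2a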
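When adapting the MSR values to another processor, composing them from named fields is less error-prone than editing hexadecimal constants by hand. The sketch below packs the low 32 bits of IA32_HWP_REQUEST from its four 8-bit fields as laid out in the Intel SDM. The function name hwp_request is hypothetical, and suitable performance levels for a given core have to be taken from that core's HWP capabilities.

```shell
#!/bin/sh
# Hypothetical helper: compose the low 32 bits of IA32_HWP_REQUEST (0x774).
# Field layout: bits 7:0 minimum performance, 15:8 maximum performance,
# 23:16 desired performance, 31:24 energy performance preference (EPP).
# Usage: hwp_request MIN MAX DESIRED EPP
hwp_request() {
    printf '0x%08x\n' $(( ($1 & 0xff) | (($2 & 0xff) << 8) | (($3 & 0xff) << 16) | (($4 & 0xff) << 24) ))
}

# Example: recreate the value written for the real-time core
#   wrmsr 0x774 -p 13 "$(hwp_request 0x2a 0x2a 0 0)"
```

Setting minimum and maximum to the same level, as done for core 13, is what pins the core, while an EPP of 0 requests the performance-oriented end of the preference range.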
Per-core C-State Disable
With the BIOS and Linux boot parameter optimizations for real-time performance described in OS Setup, Intel C-states and P-states are enabled. This leaves more power headroom to improve GPU AI performance, but C-states can introduce jitter on isolated cores because the time required to transition between states varies. Disabling C-states per core helps minimize this jitter, providing a more stable environment for real-time tasks.
Use the following script to disable C-states on the isolated core:
(e.g. core 13 as the isolated core)
#!/bin/bash
# Disable all C-states except C0 on the isolated CPU cores
# Define the range of CPU indices
cpu_start=13 # Replace with your starting CPU index
cpu_end=13 # Replace with your ending CPU index
# Loop over each CPU index
for (( i=cpu_start; i<=cpu_end; i++ )); do
    # Determine the maximum state index for the current CPU
    max_state_index=$(ls /sys/devices/system/cpu/cpu$i/cpuidle/ | grep -o 'state[0-9]*' | sed 's/state//' | sort -n | tail -1)
    # Loop over each state index
    for (( j=1; j<=max_state_index; j++ )); do
        # Disable the current state (tee runs under sudo so the write has root privileges)
        echo 1 | sudo tee /sys/devices/system/cpu/cpu$i/cpuidle/state$j/disable > /dev/null
        echo "Disabled CPU $i state $j"
    done
done
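To confirm the result, the idle states of a core and their disable flags can be listed from sysfs. The sketch below is a hypothetical helper, not an SDK tool: the function name show_idle_states is an assumption, and the second argument exists only so the function can be pointed at a different sysfs root.

```shell
#!/bin/sh
# Hypothetical helper: print "<state> <name> disable=<flag>" for each idle
# state of a CPU.
# $1 = CPU index, $2 = sysfs CPU root (defaults to /sys/devices/system/cpu).
show_idle_states() {
    cpu_root=${2:-/sys/devices/system/cpu}
    for state in "$cpu_root/cpu$1"/cpuidle/state*; do
        [ -d "$state" ] || continue    # skip if the glob matched nothing
        printf '%s %s disable=%s\n' "${state##*/}" \
            "$(cat "$state/name")" "$(cat "$state/disable")"
    done
}

# Example call on the target for the isolated core:
#   show_idle_states 13
```

After the script above has run, every state except state0 would be expected to report disable=1 on the isolated core.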
Timer Migration Disable
In the Linux kernel, timer migration refers to the process of moving timers from one CPU to another. This is often done to balance the load across CPUs or to optimize power management by consolidating timers on fewer CPUs when others are idle. Timer migration can interfere with other tasks running on the target CPU, potentially affecting real-time performance on an isolated CPU core. Keeping timers on their original CPU minimizes the risk of such interference.
Disabling timer migration in a real-time kernel helps maintain the consistency and predictability required for real-time applications, ensuring that timers are executed with minimal latency and interference.
Timer migration can be disabled with the following command, run as root:
$ echo 0 > /proc/sys/kernel/timer_migration
Disable Swap
Accessing anonymous memory that has been swapped to disk results in a major page fault. Handling such page faults increases latency and unpredictability, which is undesirable for real-time tasks. Swap can be disabled with the following command:
$ swapoff -a
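Both of these settings are applied at runtime, so it can help to check them before starting a real-time workload. The sketch below reads the current state from procfs. The function names timer_migration_off and swap_off are hypothetical, and the optional path argument exists only so the functions can be pointed at other files.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the two runtime settings above.

# Succeeds if timer migration is disabled.
# $1 = optional path (defaults to /proc/sys/kernel/timer_migration).
timer_migration_off() {
    [ "$(cat "${1:-/proc/sys/kernel/timer_migration}")" = "0" ]
}

# Succeeds if no swap area is active, i.e. the swaps file holds at most
# its header line.
# $1 = optional path (defaults to /proc/swaps).
swap_off() {
    lines=$(wc -l < "${1:-/proc/swaps}")
    [ "$((lines))" -le 1 ]
}

# Example calls on the target:
#   timer_migration_off || echo "timer migration is still enabled"
#   swap_off || echo "swap is still active"
```

Neither command persists across a reboot on its own, so the checks are most useful in a startup script that runs before the real-time application.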
Verify Benchmark Performance
After installing the real-time Linux kernel, it is a good idea to benchmark the system to establish confidence that it is properly configured. Cyclictest is one of the most frequently used tools for benchmarking real-time systems and evaluating their relative performance. It accurately and repeatedly measures the difference between a thread’s intended wake-up time and the time at which it actually wakes up, and reports statistics about the system’s latency. It can measure latency in real-time systems caused by the hardware, the firmware, and the operating system.
Use rt-tests v2.6 to collect performance data. It supports pinning threads to specific isolated cores and keeping the main thread off the core that runs the measurement threads.
Follow the steps below to build rt-tests-2.6, which contains cyclictest v2.6:
$ wget https://web.git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/snapshot/rt-tests-2.6.tar.gz
$ tar zxvf rt-tests-2.6.tar.gz
$ cd rt-tests-2.6
$ make
Note: Make sure libnuma-dev is installed as a dependency before compiling.
$ sudo apt install libnuma-dev
An example command that runs the cyclictest benchmark is shown below:
$ cyclictest -mp 99 -t1 -a 13 -i 1000 --laptop -D 72h -N --mainaffinity 12
Default parameters are used unless otherwise specified. Run cyclictest --help to list the modifiable arguments.
Option | Explanation |
---|---|
-p | priority of highest priority thread |
-t | one thread per available processor |
-a | Run thread #N on processor #N, or if CPUSET given, pin threads to that set of processors in round-robin order |
-i | base interval of thread in us (default=1000) |
-D | specify a length for the test run |
-N | print results in ns instead of us (default us) |
--mainaffinity | Run the main thread on CPU #N. This only affects the main thread and not the measurement threads |
-m | lock current and future memory allocations |
--laptop | Not setting |
On a real-time-enabled system, the result might be similar to the following:
T: 0 ( 3407) P:99 I:1000 C: 100000 Min: 928 Act: 1376 Avg: 1154 Max: 18373
This result indicates an apparent short-term worst-case latency of about 18 us (18373 ns, because -N reports nanoseconds). Pay particular attention to the Max values, because they indicate outliers. Even if the system has decent Avg (average) values, a single outlier as indicated by Max is enough to break or disturb a real-time system.
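For long runs it can be convenient to extract the worst-case value automatically instead of reading it off the screen. The sketch below scans cyclictest summary lines on standard input and prints the largest Max value it finds. The function name worst_max_latency is hypothetical, and the result is in whatever unit the run used (ns with -N).

```shell
#!/bin/sh
# Hypothetical helper: read cyclictest per-thread summary lines on stdin and
# print the largest value that follows a "Max:" field.
worst_max_latency() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "Max:" && $(i + 1) + 0 > max)
                max = $(i + 1) + 0
    } END { print max + 0 }'
}

# Example: feed the sample line shown above
#   echo "T: 0 ( 3407) P:99 I:1000 C: 100000 Min: 928 Act: 1376 Avg: 1154 Max: 18373" | worst_max_latency
```

For the sample line above the function prints 18373. Comparing that number against a latency budget chosen for the application turns the benchmark into a simple pass or fail check.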
If the real-time results are not satisfactory with the default installation, refer to OS Setup for BIOS optimization and to Optimize Performance to optimize the Linux OS and application runtime on Intel® Processors.