Attention

You are viewing an older version of the documentation. The latest version is v3.3.

System Performance Characterization¶

The following section is applicable to:

Mega Benchmark¶

Benchmark	Units	Version	Source
Mega Benchmark	(See respective benchmark)	1.5	Intel Created

The ECI Mega Benchmark exercises the following tests:

EC Protocol Bridge UADP UDP Transport
EC Protocol Bridge UADP L2 Transport without priority
EC Protocol Bridge UADP L2 Transport with priority
Rhealstone xLatency
Rhealstone Stress Test
Caterpillar
Jitter
MSI Jitter
MMIO Latency
Cyclictest Workload
LMbench

To start the benchmark, run the following command:

$ /opt/benchmarking/mega_benchmark.sh

Results are logged to a file named: results_mega_benchmark_<date>

Cyclictest Workload¶

Benchmark	Units	Version	Source
Cyclictest	microseconds	1.50	https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/snapshot/rt-tests-1.5.tar.gz

Cyclictest is most commonly used for benchmarking real-time (RT) systems. It is one of the most frequently used tools for evaluating the relative performance of an RT. Cyclictest accurately and repeatedly measures the difference between a thread’s intended wake-up time and the time at which it actually wakes up to provide statistics about the system’s latency. It can measure latency in real-time systems caused by the hardware, the firmware, and the operating system.

An example script that runs the cyclictest benchmark and the README is available at /opt/benchmarking/rt-tests. The script performs the following runtime optimizations before executing the benchmark:

Uses Cache allocation technology (CAT) to allocate exclusive access of half the last-level cache to core 3
Assigns benchmark thread affinity to core 3
Assigns non-benchmark thread affinity to core 0
Changes the priority of benchmark thread to -20 (using: nice -n -20)
Disables kernel machine check interrupt
Increases the thread runtime utilization to infinity

To start the benchmark, run the following command:

$ /opt/benchmarking/rt-tests/start-cyclic.py

Default parameters are used unless otherwise specified. Run the script with --help to list the modifiable arguments.

See also

Sanity-Check #0 - Cyclictest Workload provides an example on how to run this benchmark and display the results.

Interpreting Cyclictest Results¶

Short	Explanation
T	Thread: Thread index and thread ID
P	Priority: RT thread priority
I	Interval: Intended wake up period for the latency measuring threads
C	Count: Number of times the latency was measured that is, iteration count
Min	Minimum: Minimum latency that was measured
Act	Actual: Latency measured during the latest completed iteration
Avg	Average: Average latency that was measured
Max	Maximum: Maximum latency that was measured

On a non-realtime system, the result might be similar to the following:

T: 0 ( 3431) P:99 I:1000 C: 100000 Min:      5 Act:   10 Avg:   14 Max:   39242
T: 1 ( 3432) P:98 I:1500 C:  66934 Min:      4 Act:   10 Avg:   17 Max:   39661

The right-most column contains the most important result, that is, the worst-case latency of 39.242 ms (Max value).

On a realtime-enabled system, the result might be similar to the following:

T: 0 ( 3407) P:99 I:1000 C: 100000 Min:      7 Act:   10 Avg:   10 Max:      18
T: 1 ( 3408) P:98 I:1500 C:  67043 Min:      7 Act:    8 Avg:   10 Max:      22

This result indicates an apparent short-term worst-case latency of 18 ms. According to this, it is important to pay attention to the Max values as these are indicators for outliers. Even if the system has decent Avg (average) values, a single outlier as indicated by Max is enough to break or disturb a real-time system.

According to the README from the https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git repository:

Running cyclictest only over a short period of time and without creating appropriate real-time stress conditions is rather meaningless, since the execution of an asynchronous event from idle state is normally always quite fast, and every - even non-RT system - can do that. The challenge is to minimize the latency when reacting to an asynchronuous event, irrespective of what code path is executed at the time when the external event arrives. Therefore, specific stress conditions must be present while cyclictest is running to reliably determine the worst-case latency of a given system.

Additional Cyclictest Workload¶

Another script, rt_bmark.py, performs stress workloads when executing the benchmark cyclictest.

To start the benchmark, run the following command:

$ /opt/benchmarking/rt-tests/rt_bmark.py

Jitter¶

Benchmark	Units	Version	Source
Jitter	CPU Cycles	1.9	Intel Created

The Jitter benchmark measures the execution time variation of a CPU test workload. The performance of the workload is impacted by kernel interrupts. Minimizing these interrupts also minimizes the jitter that applications could potentially experience.

An example script that runs the Jitter benchmark is available at /opt/benchmarking/jitter/jitter. The script performs the following runtime optimizations before executing the benchmark:

Uses CAT to allocate exclusive access of half the last-level cache to core 3
Assigns benchmark thread affinity to core 3
Assigns non-benchmark thread affinity to core 0
Changes the priority of benchmark thread to -20 (using: nice -n -20)
Disables kernel machine check interrupt
Increases the thread runtime utilization to infinity

To start the benchmark, run the following command:

$ /opt/benchmarking/jitter/start-benchmark.py

Default parameters are used unless otherwise specified. Run the script with --help to list the modifiable arguments.

See also

Sanity-Check #2 - Jitter Workload provides an example on how to run this benchmark and display the results.

Interpreting Jitter Results¶

Inst_Min   Inst_Max   Inst_jitter last_Exec  Abs_min    Abs_max      tmp       Interval     Sample No
 177205     235598      58393     177219     164702     243978    1227096064 3604948177        200      66777

The most important measurement is Inst_jitter. This measurement describes the execution time jitter during the display update interval. It is desired that Inst_jitter be as low as possible. The delta between Abs_max and Abs_min is the overall jitter spread. Ideally, the spread must be as low as possible.

LMbench¶

Benchmark	Units	Version	Source
LMbench	Nanosecond	3.0a9	https://sourceforge.net/projects/lmbench/files/development/lmbench-3.0-a9/lmbench-3.0-a9.tgz/download

LMbench is a suite of simple, portable, ANSI/C microbenchmarks for UNIX/POSIX. In general, it measures two key features: latency and bandwidth. LMbench is intended to provide system developers an insight into basic costs of key operations.

lat_mem_rd measures memory read latency for varying memory sizes and strides. The results are reported in nanoseconds per load and have been verified to be accurate to within a few nanoseconds on an SGI Indy. The entire memory hierarchy is measured, including onboard cache latency and size, external cache latency and size, main memory latency, and TLB miss latency. Only data accesses are measured; the instruction cache is not measured.

The specific cache and memory latency micro-benchmark is available at /usr/bin/lat_mem_rd.

To start this micro-benchmark, run the following command:

$ cd ~/;taskset -c $CORE_AFFINITY /usr/bin/lat_mem_rd -P 1 $MEMRD_SIZE $MEMRD_CHUNCKS &> /tmp/result_lat_mem_rd.txt

Default parameters are used unless otherwise specified (MEMRD_SIZE and MEMRD_CHUNCKS variables). Run the script with --help to list the modifiable arguments.

See also

Sanity-Check #1 - LMbench Workload provides an example on how to run this benchmark and display the results.

Interpreting LMbench Results¶

The output is best examined graphically. You get a graph with four plateaus. The graph should plotted in log base 2 of the array size on the X axis and the latency on the Y axis. Each stride is then plotted as a curve. The plateaus that appear correspond to the onboard cache (if present), external cache (if present, main memory latency, and TLB miss latency.

Caterpillar¶

Benchmark	Units	Version	Source
Caterpillar	CPU Cycles	1.1	Intel Created

The Caterpillar benchmark measures the execution time variation of a memory test workload. The performance of the workload is impacted by cache misses. Using CAT improves application performance by assigning CPU affinity to cache ways, which can be dedicated to real-time applications.

An example script running the Caterpillar benchmark is available at /opt/benchmarking/caterpillar. The script performs the following runtime optimizations before executing the benchmark:

CAT to allocate exclusive access of half the last-level cache to core 3
Assign benchmark thread affinity to core 3
Assign non-benchmark thread affinity to core 0
Change priority of benchmark thread to -20 (using: nice -n -20)
Disable kernel machine check interrupt
Increase thread runtime utilization to infinity

To start the benchmark, run the following command:

$ /opt/benchmarking/caterpillar/start-benchmark.py

Default parameters are used unless otherwise specified. Run the script with --help to list the modifiable arguments.

See also

Sanity-Check #3 - Caterpillar Workload provides an example on how to run this benchmark and display the results.

Interpreting Caterpillar Results¶

SampleMin  SampleMax   SmplJitter  SessionMin SessionMax SessionJitter  Sample
  254023     300255       9649       233516     303071       9743        200

The most important measurement is SessionJitter. This measurement describes the maximum execution time jitter during the entire execution of the benchmark. It is desired that SessionJitter be as low as possible. The delta between SessionMax and SessionMin is the overall execution time spread. Ideally, the spread must be as low as possible.

RT-app¶

Benchmark	Units	Version	Source
rt-app	N/A	`1.0-109-g78ae2cf`	https://github.com/scheduler-tools/rt-app

RT-app is a tool that can be used to emulate a use case. Not only the sleep and run pattern can be emulated, but also the dependency between tasks like accessing same critical resources, creating sequential wake up, or syncing the wake up of threads. The use case is described in a json like file, which is first parsed by workgen and then rt-app.

An example script that runs the RT-app tool is available at /opt/benchmarking/rt-app.

Rhealstone¶

Benchmark	Units	Version	Source
Rhealstone	nanoseconds	commitID c9bdf48a	https://gitlab.denx.de/Xenomai/xenomai/tree/master/testsuite/latency

Rhealstone is a measurement targeted specifically toward true multitasking solutions. In this benchmark, five categories of activities crucial to the performance of real-time systems are represented:

Task switching time
Preemption time
Semaphore shuffling time
Interrupt latency time
Deadlock breaking time

Two example scripts that run the Rhealstone benchmark are available at /opt/benchmarking/rhealstone. The script run_xlatency.py performs the following runtime optimizations before executing the benchmark latency:

Assign benchmark thread affinity to core 0
Assign non-benchmark thread affinity to core 1, as stress workloads.
Dump histogram to <file> in a format easily readable with gnuplot
Print statistics of minimum, average, and maximum latencies
Sampling period set as 250 us

To start the benchmark, run the following script:

//without stress
$ cd /opt/benchmarking/rhealstone
$ ./run_xlatency.py -T <runtime>

//with stress
$ cd /opt/benchmarking/rhealstone
$ ./run_xlatency.py -T <runtime> --stress

//with stress exclude gfx for image without glxgears
$ cd /opt/benchmarking/rhealstone
$ ./run_xlatency.py -T <runtime> --stress --no-gfx

Command line parameters:

runtime:        Set test period in seconds.

Default parameters are used unless otherwise specified. Run the script with --help to list the modifiable arguments. Otherwise, run the latency benchmark directly with --help to list the modifiable arguments.

See also

Sanity-Check #4 - Latency Workload provides an example on how to run this benchmark and display the results.

The script run_rhealstone_bmark.py executes benchmarks ctx_lat, deadlock_bt, preempt_lat, and semaphore_lat separately for <num> times and gets the average result for each.

To start the benchmark, run the following command:

$ cd /opt/benchmarking/rhealstone
$ ./run_rhealstone_bmark.py <num>

See also

Sanity-Check #7 - Rhealstone Workload provides an example on how to run this benchmark and display the results.

Interpreting Rhealstone Results¶

The results obtained from run_xlatency.py are measured interrupt latency. Lower values are better.

...
#Avg: 0.683 us
#Max: 4.656 us
#Max list: [4.656]
...

The results obtained from run_rhealstone_bmark.py are measured time to perform: task switching, preemption, semaphore shuffling, and deadlock breaking. Lower values are better.

// Task Switching
#ctx_sum is 84878.48
#ctx_avg is 848.78

// Deadlock Breaking
#dead_sum is 225127.860003
#dead_avg is 2251.278600

// Preemption
#pree_sum is 114065.000000
#pree_avg is 1140.650000

// Semaphore Shuffling
#sem_sum is 208111.365000
#sem_avg is 2081.113650

MSI Latency¶

Benchmark	Units	Version	Source
MSI Latency	nanoseconds	0.5.0-k	Intel Created

Message Signaled Interrupt (MSI) Latency measures interrupt latency for MSIs generated by a discrete or integrated peripheral device, such as the Intel® I210 Ethernet Controller Series and serviced by the IA core. This benchmark measures the MSI interrupt triggered back latency of Intel® I210 Ethernet Controller Series kernel module.

Attention

This benchmark only functions in tandem with an Intel® I210 Ethernet Controller Series.

An example script that runs the MSI latency benchmark is available at /opt/benchmarking/msi-latency. The script performs the following runtime optimizations before executing the benchmark:

Move all IRQs (except timer and cascade) to core 0
Move all rcu tasks to either core 0, 4, 5, 6, 7, except 1-3
Change the real-time attributes of all rcu tasks to SCHED_OTHER and priority 0
Change the real-time attributes of all tasks on core 1, 2 and 3 to SCHED_OTHER and priority 0
Change the real-time attributes of all tasks to SCHED_OTHER and priority 0

Required kernel options:

isolcpus=1-3 rcu_nocbs=1-3 nohz_full=1-3 igb.blacklist=yes

To start the benchmark, run the following commands:

# cd /opt/benchmarking/msi-latency

// for preempt-rt kernel
# ./msiLatencyTest.sh <IRQ_PERIOD(ms)> <IRQ_COUNT>
//get current test result
# cat /sys/kernel/debug/msi_latency_test/current_value

// for xenomai kernel
// find Intel I210 Ethernet Controller full pci id (igbid) on your test device, by `lspci -v | grep 'Ethernet controller: Intel Corporation I210 Gigabit Network Connection'`. For example: '0000:02:00.0'
# ./run_cobalt_msi_lat.sh <unbind igbid> <IRQ_PERIOD (ms)> <runtime (s)>
//get current test result
# cat /sys/kernel/debug/msi_latency_cobalt_test/current_value

Default parameters are used when insert msi_lat.ko, unless otherwise specified. Command line parameters:

``coreSpecIRQ``:    Core on which the msi_latency test IRQ handler is run. [0 to n]
``coreSpecWQ``:     Core on which the msi_latency test work queue is run. [0 to n]
``irqSpec``:        Run IRQ as a pthread [0] or in legacy mode [1].
``irqPeriod``:      Period between interrupts in milliseconds.
``irqCount``:       Number of IRQs to send.  n > 0 will be finite.  n < 0 will be infinite.
``verbosity``:      Verbosity of periodic prints to dmesg. [0 to 3]
``offsetStart``:    Offset that is subtracted from each starting timestamp when calculating latency and generating
                histogram buckets (does not effect raw values). Mechanism used to remove delta time (nanosecond)
                from when SW requests interrupt to HW services interrupt.
``blockIRQ``:       Duration (clock ticks) for which the interrupts are blocked after requesting MSI.

See also

Sanity-Check #5 - MSI Latency provides an example on how to run this benchmark and display the results..

Interpreting MSI latency Results¶

************ RESULTS (ns) ************

[15996.755567] * Max: 9056
[15996.755570] * Avg: 6626
[15996.755573] * Min: 5360

The reported value is the measured time to service MSI requests to the Intel® I210 Ethernet Controller Series. Lower values are better.

MSI Jitter¶

Benchmark	Units	Version	Source
MSI Jitter	nanoseconds	0.1.0-m	Intel Created

MSI jitter benchmark tool is a benchmark tool to check the jitter of cyclical MSI from an Intel® I210 Ethernet Controller Series. This tool records each cyclical MSI timestamps in the software interrupt handler and calculates the delta between each timestamps and compares the delta with cycle time as the jitter.

Attention

This benchmark only functions in tandem with an Intel® I210 Ethernet Controller Series.

An example script that runs the MSI latency benchmark is available at /opt/benchmarking/msi-jitter. The script performs the following runtime optimizations before executing the benchmark:

Moves all IRQs (except timer and cascade) to core 0
Moves all rcu tasks to either core 0, 4, 5, 6, 7, except 1-3

$ /opt/benchmarking/msi-jitter/irq_rcu.sh

To start the benchmark, run the following commands:

$ cd /opt/benchmarking/msi-jitter
$ ./run_msijitter.sh <unbind_igb_id> <run_core> <interval(ms)> <runtime(s)>
//get current test result
$ cat /sys/kernel/debug/msi_jitter_test/current_value

Command line parameters:

unbind_igb_id:  Could get igb id need to unbind by `$lspci -v | grep 'Ethernet controller: Intel Corporation I210 Gigabit Network Connection'`.
run_core:       Set which core will handle MSI interrupt.
interval:       Set cycle time.
runtime:        Set test period.

See also

Sanity-Check #6 - MSI Jitter provides an example on how to run this benchmark and display the results.

Interpreting MSI Jitter Results¶

 ************ RESULTS (ns) ************

[ 4177.021407] * Max: 3184
[ 4177.023363] * Avg: 222
[ 4177.025218] * Min: 0

The reported value is the measured jitter of servicing MSI requests to the Intel® I210 Ethernet Controller Series. Lower values are better.

MMIO Latency¶

Benchmark	Units	Version	Source
MMIO Latency	nanoseconds	1.0	Intel Created

MMIO-latency is a simple driver that creates an affinitized thread to read a virtual map physical address (Memory-mapped I/O). Memory read latency is measured. The thread is created and started on init and loops LOOPS number of times. It also provides a char device that reads the current statistic counters, by using inline assembly and kernel function to get a very close benchmark.

An example script that runs the MSI latency benchmark is available at /opt/benchmarking/mmio-latency. The script performs the following runtime optimizations before executing the benchmark:

Moves all IRQs (except timer and cascade) to core 0
Moves all rcu task to either core 0, 4, 5, 6, 7, except 1-3
Changes realtime attributes of all rcu tasks to SCHED_OTHER and priority 0
Changes realtime attributes of all tasks on core 1, 2 and 3 to SCHED_OTHER and priority 0
Changes realtime attributes of all tasks to SCHED_OTHER and priority 0

To start the benchmark, run the following command:

$ /opt/benchmarking/mmio-latency/mmioLatency.sh <MMIO_address> <block_irq(1 or 0)>

Default parameters are used unless otherwise specified.

Find the physical mmio address to test using the command: lspci -vvv -s $BDF.

For example:

$ lspci -nn ==> 00:02.0 SATA controller
$ lspci -vvv -s 00:02.0 ==> Region 0: Memory at 80002000

See also

Sanity-Check #8 - MMIO Latency Workload provides an example on how to run this benchmark and displays the results.

Interpreting MMIO Latency Results¶

RT Core Module
------------------------
 Stats:
  max= 6472
  avg= 2814
  min= 588
  total= 973791140
  loops= 346010

 mmio-outliers: 247
 sched-outliers: 9

The reported value is the latency in reading memory-mapped I/O. Lower values are better.

Real-Time Performance Measurement (RTPM)¶

Benchmark	Units	Version	Source
RTPM	(See respective benchmark)	1.6	Intel Created

About RTPM¶

Real-time computing is critical to industrial usage scenarios. Intel real-time solutions focus on hard real-time use cases where there could be capability failures if the solutions are not executed within the allotted time span. Examples of the usage scenarios include robotics, automotive, and so on.

In a real-time computing system, several factors could impact the latency of reaction to the trigger event. These factors include hardware design, BIOS configuration, OS kernel configuration, system settings, and so on. RTPM is designed to check the key settings of the system and help you to identify the hotspot of the system for real-time performance and provide a recommendation based on the Best-known Configuration (BKC). In addition, RTPM provides a way to measure the system scheduling latency with some open source tools.

RTPM Test Modules¶

Real-Time Readiness Check¶

This module leverages one of the Intel® Time Coordinated Computing (Intel® TCC) Tools to check the many attributes that might affect real-time performance.

This module:

Verifies whether the system has a supported processor, BIOS, and OS
Checks for features, such as Intel® Turbo Boost Technology, Enhanced Intel SpeedStep® Technology, and processor power-saving states, that might affect real-time performance
Reports CPU and GPU frequencies
Operates at the OS level

Boot Command Line Check¶

This module checks the real-time OS boot command line parameters and recommends settings as per BKCs.

Interpreting Boot Command Line Check Results

=============================BOOT CMDLINE CHECK Mon Dec  1 00:00:00 UTC 2021=============================
----------------------------------------------------------------------------------------------------------------------------------------------------------
|CMDLINE ENTRY                                     |CURRENT VALUE                                     |BKC                                               |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|processor.max_cstate                              |0                                                 |0                                                 |
|intel_idle.max_cstate                             |0                                                 |0                                                 |
|clocksource                                       |tsc                                               |tsc                                               |
|tsc                                               |reliable                                          |reliable                                          |
|nmi_watchdog                                      |0                                                 |0                                                 |
|nosoftlockup                                      |nosoftlockup                                      |nosoftlockup                                      |
|intel_pstate                                      |disable                                           |disable                                           |
|efi                                               |runtime                                           |runtime                                           |
|nohalt                                            |Missing                                           |nohalt                                            |
|nohz                                              |Missing                                           |nohz                                              |
|irqaffinity                                       |0                                                 |0                                                 |
|hugepages                                         |Missing                                           |1024                                              |
|cpufreq.off                                       |Missing                                           |1                                                 |
|i915.enable_rc6                                   |Missing                                           |0                                                 |
|i915.enable_dc                                    |Missing                                           |0                                                 |
|i915.disable_power_well                           |Missing                                           |0                                                 |
|mce                                               |Missing                                           |off                                               |
|hpet                                              |Missing                                           |disable                                           |
|numa_balancing                                    |Missing                                           |disable                                           |
|nohz_full                                         |1,3                                               |[xxxx]                                            |
|isolcpus                                          |1,3                                               |[xxxx]                                            |
|rcu_nocbs                                         |1,3                                               |[xxxx]                                            |
----------------------------------------------------------------------------------------------------------------------------------------------------------
Boot cmdline check finished.

Kernel Configuration Check¶

This module checks the real-time OS kernel configuration and recommends settings as per BKCs.

Interpreting Kernel Configuration Check Results

=============================KERNEL CONFIGURATION CHECK Mon Dec  1 00:00:01 UTC 2021=============================
Kernel config file: /boot/config-5.4.115-rt57-intel-pk-standard+
----------------------------------------------------------------------------------------------------------------------------------------------------------
|KERNEL CONFIG ENTRY                               |CURRENT VALUE                                     |BKC                                               |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|CONFIG_SMP                                        |CONFIG_SMP=y                                      |CONFIG_SMP=y                                      |
|CONFIG_PREEMPT_RCU                                |CONFIG_PREEMPT_RCU=y                              |CONFIG_PREEMPT_RCU=y                              |
|CONFIG_GENERIC_IRQ_MIGRATION                      |CONFIG_GENERIC_IRQ_MIGRATION=y                    |CONFIG_GENERIC_IRQ_MIGRATION=y                    |
|CONFIG_EXPERT                                     |CONFIG_EXPERT=y                                   |CONFIG_EXPERT=y                                   |
|CONFIG_PCIE_PTM                                   |CONFIG_PCIE_PTM=y                                 |CONFIG_PCIE_PTM=y                                 |
|CONFIG_EFI                                        |CONFIG_EFI=y                                      |CONFIG_EFI=y                                      |
|CONFIG_HIGH_RES_TIMERS                            |CONFIG_HIGH_RES_TIMERS=y                          |CONFIG_HIGH_RES_TIMERS=y                          |
|CONFIG_RCU_NOCB_CPU                               |CONFIG_RCU_NOCB_CPU=y                             |CONFIG_RCU_NOCB_CPU=y                             |
|CONFIG_HUGETLBFS                                  |CONFIG_HUGETLBFS=y                                |CONFIG_HUGETLBFS=y                                |
|CONFIG_SCHED_MC_PRIO                              |Missing                                           |CONFIG_SCHED_MC_PRIO=n                            |
|CONFIG_PREEMPT_RT                                 |# CONFIG_PREEMPT_RT is not set                    |CONFIG_PREEMPT_RT=y                               |
|CONFIG_CPU_FREQ                                   |# CONFIG_CPU_FREQ is not set                      |CONFIG_CPU_FREQ=n                                 |
|CONFIG_CPU_ISOLATION                              |CONFIG_CPU_ISOLATION=y                            |CONFIG_CPU_ISOLATION=y                            |
|CONFIG_MIGRATION                                  |CONFIG_MIGRATION=y                                |CONFIG_MIGRATION=y                                |
|CONFIG_PCIEPORTBUS                                |CONFIG_PCIEPORTBUS=y                              |CONFIG_PCIEPORTBUS=y                              |
----------------------------------------------------------------------------------------------------------------------------------------------------------
Kernel configuration check finished.

Real-Time Performance Test¶

This module contains the following benchmarks to evaluate the performance of the target system:

Xenomai Latency Test
Cyclictest Workload
Rhealstone
MSI Latency
MSI Jitter
MMIO Latency

Interpreting Real-Time Performance Test Results

=============================Real-Time PERFORMANCE TEST Mon Dec  1 00:00:02 UTC 2021=============================
Real-time task CPU affinity is set to core 1 by default. Stress is added to other CPU cores if enabled. It's recommended to isolate the core 1 for real-time task from others.
To change the default test duration. Please modify test_cfg file.
>>> Processing cyclictest...Test duration: 60s
Setup cyclictest test on core 1
Max CPUs = 4
# /dev/cpu_dma_latency set to 0us
Thread 0 Interval: 750
Thread 0 using cpu 1.
      0:       0:   11283
      0:    4622:    8008
      0:    9850:    8163
      ....
      0:  224591:    8132
      0:  226642:    7752
      0:  234162:    7799
      0:  236213:    6943
T: 0 (728815) P:99 I:250 C: 240000 Min:   5000 Act:    5365 Avg:    5458 Max:   12065
>>> Processing msi_latency test...Test duration: 60s
Msi-latency test finish, max:3936 ns, avg:2682 ns, min:2464 ns
>>> Processing msi_jitter test...Test duration: 60s
Msi-jitter test finish, max:1552 ns, avg:304 ns, min:0 ns
>>> Processing rhealstone...Test cycles: 2
[1082.17, 1087.4]
[1082.17, 1087.4]
ctx_sum is 2169.57
ctx_avg is 1084.79


[5623.126667, 5631.456667]
[5623.126667, 5631.456667]
dead_sum is 11254.583334
dead_avg is 5627.291667


[3264.865, 3357.005]
[3264.865, 3357.005]
sem_sum is 6621.870000
sem_avg is 3310.935000


[2137.07, 2103.635]
[2137.07, 2103.635]
pree_sum is 4240.705000
pree_avg is 2120.352500


>>> Processing mmio_latency test...Test duration: 300s
MMIO-Latency test finish, max:4365 ns, avg:2286 ns, min:1555 ns
Real-time performance test finished.

To start the benchmark, run the following commands:

$ /opt/benchmarking/rtpm/rtpm_exec.sh -p "./output" -n "report.csv" -a

The results are logged in: ./output/rtpm_test.log

RTPM Command Line Parameters¶

Parameter	Explanation
`-p` or `--logpath`	Specify the path of the test file
`-n` or `--logname`	Specify the test report name in CSV format (must end with .csv)
`-l` or `--latency`	Execute Latency or Cyclictest test when running performance test
`-r` or `--rhealstone`	Execute Rhealstone test when running performance test
`-ml` or `--msilatency`	Execute Msi-latency test when running performance test
`-mj` or `--msijitter`	Execute Msi-jitter test when running performance test
`-mo` or `--mmiolatency`	Execute MMIO-Latency test when running performance test
`-s` or `--stress`	Enable stress when running performance test
`-a` or `--all`	Execute all tests when running performance test
`-h` or `--help`	Show the help list

For more information on using Intel ECI with RTPM, refer to the guide.

CODESYS Benchmark Application¶

Benchmark	Units	Version	Source
CODESYS Benchmark	microseconds	1.4	Intel Created

The CODESYS benchmark is an example application that allows deterministic control and test of the CODESYS SoftPLC. The application is defined such that a main task occurs every 250us. Within the main task is a configurable workload which iterates a number of user defined times. The specific workload is configurable to either: “floating point”, “arithmetic”, or “boolean” operations. Cycle execution times are measured from which minimum, maximum, and jitter measurements are derived. These measurements allow performance characterization of the CODESYS SoftPLC on the target system.

Prerequisites

A Windows system (x64 bit) to host the CODESYS IDE
An ECI image with the CODESYS Software PLC feature option enabled. (See Building ECI Targets)
A target system with the ECI image (with CODESYS Software PLC) installed (See Installing ECI Images)
Ability to modify IP address of both the Windows PC and target system
Ability to connect both the Windows PC and target system via Ethernet network

Deploying the CODESYS Benchmark¶

The following steps are applicable to:

This CODESYS example project was created and validated using CODESYS V3.5 SP15 Patch 4 + (64-bit). Project compatibility differences may occur when changing CODESYS version.

See also

This section assumes you are already familiar with CODESYS and how to deploy a workload to the CODESYS Linux runtime. For a detailed tutorial, see section: Application #2: CODESYS OPC UA Publish/Subscribe

Extract the CODESYS_Example_Applications.zip archive provided with the ECI bundle. Navigate to the Benchmark directory. Double-click Benchmark_Linux_3.5.15.40.project to open the project with CODESYS.
The benchmark application was created assuming the CODESYS Linux runtime has an IP address of 10.11.12.50 on interface eno1. However, you can change these if required. You need to change three IP addresses to accomplish this:

IP Address #1:

IP address in the CODESYS application (change via the CODESYS IDE) - Double-click “Ethernet” from the Devices pane. The IP address settings are displayed on the left pane.

IP Address #2:

IP address of the Windows PC running CODESYS IDE. Windows is configured with 10.11.12.55 to be on the same subnet as the CODESYS Linux runtime.

The following steps are applicable to:

IP Address #3:

IP address of target system running the CODESYS Linux runtime.

Run the following commands on the target system to change the IP address of device interface to 10.11.12.50/24. Replace <interface_name> with the name of your interface (Example: eno1):
$ ip a flush dev <interface_name> $ ip a add 10.11.12.50/24 dev <interface_name>
On the target system, navigate to /opt/benchmarking/codesys. This directory contains a number of useful scripts for improving the real-time performance of the CODESYS Linux runtime.

The scripts available at /opt/benchmarking/codesys include:
- start_CODESYS_native.sh : Optimizes the system (see list below), stops non-essential services, and restarts the CODESYS runtime
- start_CODESYS_container.sh : Optimizes the system (see list below) and starts the containerized CODESYS Linux runtime with RT priority.
  
  Note
  
  The start_CODESYS_container.sh script assumes you have completed the CODESYS Containerization section of this guide: Microservice: CODESYS Linux Runtime & CODESYS Edge Gateway
- taskset_CODESYS.sh : Optimizes the system (see list below)
The scripts also perform the following runtime optimizations:
- Uses CAT to allocate exclusive access of half the last-level cache to core 3
- Assigns benchmark thread affinity to core 3
- Assigns non-benchmark thread affinity to core 0
- Changes the priority of benchmark thread to 90 (using: chrt -f 90)
- Disables kernel machine check interrupt
- Increases the thread runtime utilization to infinity
Run the start_CODESYS_native.sh script available at /opt/benchmarking/codesys to reload the CODESYS Linux runtime in preparation of the benchmark:
```
$ cd /opt/benchmarking/codesys
$ ./start_CODESYS_native.sh
```
Build, download, and run the CODESYS Benchmark as outlined in Application #2: CODESYS OPC UA Publish/Subscribe. The CODESYS Benchmark is configured to host a web visualization at http://10.11.12.50:8080. If the benchmark is working correctly, the web visualization should be similar to following image:
Run the taskset_CODESYS.sh script at /opt/benchmarking/codesys to move all CODESYS tasks to CPU 3 and change the priority of all CODESYS tasks to RT:
```
$ ./taskset_CODESYS.sh
```
Use the RESET buttons in the web visualization to reset the statistics after running the taskset_CODESYS.sh script. This CODESYS benchmark demonstrates that real-time performance can be achieved using the ECI image in combination with proper kernel and runtime configurations.
Use the slider and arrow buttons to increase or decrease the number of workload iterations per cycle.
The colored LEDs are connected to variables mapped to an OPC UA server. Connect an OPC UA client to the CODESYS OPC UA server to read/write the values. You can download a popular free OPC UA client called UaExpert from: https://www.unified-automation.com/products/development-tools/uaexpert.html.

Performance Sanity-Check Testing¶

The following section is applicable to:

Sanity-Check #0 - Cyclictest Workload¶

Script rt_bmark.py

Step #0 - Setup the workload with the following input configuration:

One thread per core
Use clock_nanosleep instead of posix interval timers
Set number of threads to the number of cpus and same priority of all threads
Priority = 99
Timer interval on core 0: 100[us]
Interval increment for each core: 20[us]
Number of loops (on core 0): 30000

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ /opt/benchmarking/rt-tests/rt_bmark.py

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Script start-cyclic.py

Step #0 - Setup the workload with the following input configuration:

Core affinity = 3
Priority = -20 (nice -n -20)
Total loops = 100000
Thread distance = 1
Interval = 1000.
SSH session (not using integrated GPU)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ /opt/benchmarking/rt-tests/start-cyclic.py

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #1 - LMbench Workload¶

Step #0 - Setup the workload with the following input configuration:

Core affinity ($CORE_AFFINITY) = 1
Memory total ($MEMRD_SIZE) = 192M
Stride size ($MEMRD_CHUNCKS) = 512
SSH session (not using integrated GPU)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ cd ~/;taskset -c $CORE_AFFINITY /usr/bin/lat_mem_rd -P 1 $MEMRD_SIZE $MEMRD_CHUNCKS &> /tmp/result_lat_mem_rd.txt

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #2 - Jitter Workload¶

Step #0 - Setup the workload with the following input configuration:

Core affinity = 3
Priority = -20 nice -n -20
Noisy Neighbor stress-ng affinity = 0
CAT used to assign 0x0f to COS0
CAT used to assign 0xf0 to COS1
SSH session (not using integrated GPU)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ /opt/benchmarking/jitter/start-benchmark.py

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #3 - Caterpillar Workload¶

Step #0 - Setup the workload with the following input configuration:

Core affinity = 3
Priority = -20 nice -n -20
Noisy Neighbor stress-ng affinity = 0
CAT used to assign 0x0f to COS0
CAT used to assign 0xf0 to COS1
SSH session (not using integrated GPU)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ /opt/benchmarking/caterpillar/start-benchmark.py

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #4 - Latency Workload¶

Step #0 - Setup the workload with the following input configuration:

Core affinity = 1
Noisy Neighbor stress-ng affinity = 2
Test time = 10800 seconds
SSH session (not using integrated GPU)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ cd /opt/benchmarking/rhealstone && python run_xlatency.py -T 10800 --stress
//exclude the gfx stress for image without glxgears
$ python run_xlatency.py -T 10800 --stress --no-gfx

The test results will be saved as a file named latency_test_results.txt. You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #5 - MSI Latency¶

Step #0 - Setup the workload with the following input configuration:

coreSpecIRQ = 1
coreSpecWQ = 1
irqSpec = 1
irqPeriod = 10
irqCount = 2000000
verbosity = 1
offsetStart = 0
blockIRQ = 0

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

# cd /opt/benchmarking/msi-latency

// for preempt-rt kernel
# ./msiLatencyTest.sh 10 2000000
//get current test result
# cat /sys/kernel/debug/msi_latency_test/current_value

// for xenomai kernel
// find Intel i210 NIC's full pci id (igbid) on your test device, by `lspci -v | grep 'Ethernet controller: Intel Corporation I210 Gigabit Network Connection'`. For example: '0000:02:00.0'
# ./run_cobalt_msi_lat.sh <unbind igbid> 10 21600
//get current test result
# cat /sys/kernel/debug/msi_latency_cobalt_test/current_value

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #6 - MSI Jitter¶

Step #0 - Setup the workload with the following input configuration:

core affinity = 1
runtime = 21600(s)
interval = 100(ms)
unbind_igb_id (checked by lspci)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ cd /opt/benchmarking/msi-jitter
$ ./irq_rcu.sh
$ ./run_msijitter.sh <unbind_igb_id> 1 100 21600
//get current test result
$ cat /sys/kernel/debug/msi_jitter_test/current_value

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #7 - Rhealstone Workload¶

Step #0 - Setup the workload with the following input configuration:

Core affinity = 1
Total loops = 100
Noisy Neighbor stress-ng affinity = 2
SSH session (not using integrated GPU)

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ cd /opt/benchmarking/rhealstone && ./run_rhealstone_bmark_stress.py 100

The test result will be saved as a file named rhealstone_test_result.txt. You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #8 - MMIO Latency Workload¶

Step #0 - Setup the workload with the following input configuration:

Configure non-RT related tasks to core 0

$ ./opt/benchmarking/mmio-latency/configRTcores.sh

Find the physical mmio address using either of the following options:

Option 1:

//Find the physical mmio address to test by `lspci -vvv -s $BDF`.
eg: $ lspci -nn ==> 00:02.0 SATA controller; $ lspci -vvv -s 00:02.0 ==> Region 0: Memory at 80002000.

Option 2:

$ lspci -k
$ cat /proc/bus/pci/devices | grep <name> | awk '{print $4}'

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ ./opt/benchmarking/mmio-latency/mmioLatency.sh <mmio-address>

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #9 - CODESYS PlcLogic Workload¶

Step #0 - Setup the workload with the following input configuration:

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ systemctl restart codesyscontrol.service
$ /opt/benchmarking/codesys/start_CODESYS_native.sh

You can consider that the sanity check has passed, if stdout results are similar to the data provided in the following links:

Sanity-Check #10 - Smokey¶

Step #0 - Setup the with the following input configuration:

Setup rtnet

//find RT_DRIVER
$ lspci -v | grep ' Ethernet controller: Intel Corporation I210 Gigabit Network Connection ' -A 15

e.g.

01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
        Subsystem: Intel Corporation I210 Gigabit Network Connection
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at a1100000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 3000 [size=32]
        Memory at a1120000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-07-32-ff-ff-6f-ee-90
        Capabilities: [1a0] Transaction Processing Hints
        Kernel driver in use: rt_igb
        Kernel modules: igb, rt_igb

$ sudo vim /etc/rtnet.conf
//
$ RT_DRIVER="rt_igb"
// Change to the device address you found,here is my example address
$ REBIND_RT_NICS="0000:01:00.0"

Start rtnet

$ sudo /usr/sbin/rtnet master
$ sudo /usr/sbin/rtifconfig rteth0 up
$ sudo /usr/sbin/rtifconfig

Query cmd

$ sudo /usr/lib/xenomai/testsuite/smokey --list

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

//Default
$ sudo /usr/lib/xenomai/testsuite/smokey --run=12 --verbose=2
//Optional
$ sudo /usr/lib/xenomai/testsuite/smokey --run=12 --verbose=2 rtnet_driver=rt_loopback rtnet

parameters	description
–run	run [portion of] the test list, 12 = `net_udp`
–verbose	set verbosity to desired level, default = 1
`rtnet_driver`	choose network driver, default = `rt_loopback`
`rtnet_interface`	choose network interface, default = `rteth0`
`rtnet_rate`	choose packet rate . default=1000, it means send/recv udp datagram per 1000000000/1000 ns = 1ms
`rtnet_duration`	choose test duration. default=10, it means test lasts 10 seconds

You can consider that the sanity check has passed, if stdout results are as shown in the following figure, that is, no-zero and no-n/a values.

Sanity-Check #11 - Xenomai XDDP-stream Runtime¶

Step #0 - Reset the trace buffer and snapshot:

$ echo 0 > /sys/kernel/debug/tracing/events/enable
$ echo 0 > /sys/kernel/debug/tracing/trace
$ echo 0 > /sys/kernel/debug/tracing/snapshot

Step #1 - Start tracing the required events:

$ echo 1 > /sys/kernel/debug/tracing/events/sched/enable
$ echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
$ echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable

$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_timer_start/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_timer_expire/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_timer_stop/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_clock_exit/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_clock_entry/enable

$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_switch_context/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_schedule/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_synch_wakeup/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_synch_sleepon/enable
$ echo 1 > /sys/kernel/debug/tracing/events/cobalt_core/cobalt_synch_wakeup_many/enable

Step #2 - Get cobalt_timer and cobalt_synch, for example, timer: 18446641218937430680, synch: 18446744072525039056.

$ echo 'hist:keys=timer if ( name == "xddp-stream")' >> events/cobalt_core/cobalt_timer_start/trigger
$ echo 'hist:keys=synch' >> /sys/kernel/debug/tracing/events/cobalt_core/cobalt_synch_sleepon/trigger

$ /usr/demo/xddp-stream

$ cat /sys/kernel/debug/tracing/events/cobalt_core/cobalt_timer_start/hist
# event histogram
#
# trigger info: hist:keys=timer:vals=hitcount:sort=hitcount:size=2048 if ( name == "xddp-stream") [active]
#

{ timer: 18446641218937430680 } hitcount:        548

$ cat /sys/kernel/debug/tracing/events/cobalt_core/cobalt_synch_sleepon/hist
# event histogram
#
# trigger info: hist:keys=synch:vals=hitcount:sort=hitcount:size=2048 [active]
#

{ synch: 18446641218937439520 } hitcount:          1
{ synch: 18446744072525039056 } hitcount:        943

Step #3 - Enable XDDP-stream runtime and non-rt thread read and write back latency hist trigger:

# Runtime
echo 'hist:keys=timer:start_t=common_timestamp.usecs if ( timer==18446641218937430680)' >> events/cobalt_core/cobalt_timer_expire/trigger
echo 'xddp_stream_runtime u64 runtime' >> synthetic_events
echo 'hist:keys=timer:run_t=common_timestamp.usecs-$start_t:onmatch(cobalt_core.cobalt_timer_expire).xddp_stream_runtime($run_t) if ( name=="xddp-stream" )' >> events/cobalt_core/cobalt_timer_start/trigger
echo 'hist:keys=runtime:sort=runtime' >> events/synthetic/xddp_stream_runtime/trigger

# Non-rt thread read and write back latency
echo 'hist:keys=synch:sleep_t=common_timestamp.usecs:clock=mono if ( synch==18446744072525039056 )' >> /sys/kernel/debug/tracing/events/cobalt_core/cobalt_synch_sleepon/trigger
echo 'rw_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events
echo 'hist:keys=synch:sampling=common_timestamp.usecs-$sleep_t:onmatch(cobalt_core.cobalt_synch_sleepon).rw_latency($sampling):clock=mono if ( synch==18446744072525039056 )' >> events/cobalt_core/cobalt_synch_wakeup/trigger
echo 'hist:keys=lat:sort=lat' >> events/synthetic/rw_latency/trigger

Step #4 - Rerun XDDP-stream demo:

$ /usr/demo/xddp-stream

Step #5 - If stdout results format as followed ie. no-zero and no-n/a values it is PASS, otherwise is FAIL sanity-check

$ cat /sys/kernel/debug/tracing/events/synthetic/xddp_stream_runtime/hist
# event histogram
#
# trigger info: hist:keys=runtime:vals=hitcount:sort=runtime:size=2048 [active]
#

{ runtime:         42 } hitcount:          1
{ runtime:         49 } hitcount:          1
{ runtime:         50 } hitcount:          1
{ runtime:         51 } hitcount:          2
{ runtime:         52 } hitcount:          1
{ runtime:         53 } hitcount:          3
{ runtime:         54 } hitcount:          3
...


$ cat /sys/kernel/debug/tracing/events/synthetic/rw_latency/hist
# event histogram
#
# trigger info: hist:keys=lat:vals=hitcount:sort=lat:size=2048 [active]
#

{ lat:         18 } hitcount:         14
{ lat:         19 } hitcount:         16
{ lat:         20 } hitcount:          1
{ lat:         25 } hitcount:          1
{ lat:         26 } hitcount:          1
...

Sanity-Check #12 - IEC 61131-3 PLCopen IEC 61131-3 KALMAN + PID workload¶

Step #0 - Open the project PLCopen_ecs-internal-testsuite_Linux_3.5.14.40.projectarchive with CODESYS

Right-click on KALMAN + PID workload Application from the Devices pane, then click on “Set Active Application” to activate the workload.
Double-click on KALMAN + PID workload Device name from the Devices pane to bring up the gateway settings, then scan network. Please also set IP address follow IP Address #2 & IP Address #3 in section Deploying the CODESYS Benchmark.

Step #1 - Run the start_CODESYS_native.sh script on target to reload the CODESYS Linux runtime in preparation of the benchmark

$ /opt/benchmarking/codesys/start_CODESYS_native.sh

Step #2 - Build, login and start running the CODESYS Benchmark

Step #3 - Get scorecard on target when the test ends (about 5min)

$ cd /var/opt/codesys/PlcLogic
$ vi IEC-61131-3_process-automation_scorecard*date*.txt

You can consider that the sanity check has passed, if stdout results are similar to the following, that is, no-zero and no-n/a values:

--- PLCopen 61131-3 Process Automation testcase scorecard ---
Target: Id 5 Type 4102 Version 3.5.14.40
Test at: 2020-3-6-11_21
Platform: IEC 61131-3 workload -  Control Function Block
Extern: Extern
Code: Ram
Measured timer resolution [ns]: 91
Measured overhead of SysTimeGetNs [ns]: 88.1744

Test (all values in ns)              Avg       Min       Max    StdDev Num Calls
--------------------------------------------------------------------------------
PID 1x (REAL)                    162.091   153.826  16723.83    93.265    249910
PID 5x (REAL)                    523.246   514.826 16140.825   114.039    173257
PID 10x (REAL)                  1127.835  1114.826  16695.83   146.313    113855
KALMAN 1x filter (REAL)          409.356   393.826 16231.825   118.669    192200
KALMAN 5x filter (REAL)         1864.439  1810.826  22958.83   219.659     80326
KALMAN 10x filter (REAL)        3678.901  3570.826  20356.83   324.198     46422
Logic+Arith 1K fragment        13040.663 12712.825  75645.83  1643.232     14465
Logic+Arith 5K fragment         63906.58  62587.83  99461.83  3153.447      3472
Logic+Arith 10K fragment       127512.24 125070.83  164859.8  4163.481      1968

Sanity-Check #13 - IEC 61131-3 PLCopen Arithmetic and Logic Elementary Workload¶

Step #0 - Open the project PLCopen_ecs-internal-testsuite_Linux_3.5.14.40.projectarchive with CODESYS:

Right-click the Arithmetic & Logic Elementary Workload application from the Devices pane, then click Set Active Application to activate the workload.
Double-click the Arithmetic & Logic Elementary Workload Device name from the Devices pane to bring up the gateway settings, then scan network. Also, set the IP address according to IP Address #2 and IP Address #3 mentioned in Deploying the CODESYS Benchmark.

Step #1 - Run the start_CODESYS_native.sh script on target to reload the CODESYS Linux runtime in preparation of the benchmark:

$ /opt/benchmarking/codesys/start_CODESYS_native.sh

Step #2 - Build, login, and run the CODESYS benchmark.

Step #3 - Get the scorecard on target when the test ends (in about one minute):

$ cd /var/opt/codesys/PlcLogic
$ vi IEC-61131-3_alu_scorecard*date*.txt

You can consider that the sanity check has passed, if stdout results are similar to the following, that is, no-zero and no-n/a values:

--- PLCopen 61131-3 ALU Measurement scorecard ---
Target: Id 5 Type 4102 Version 3.5.14.40
Test at: 2020-3-6-14_14
Platform: ECS 1.x IEC 61131-3 scorecard - Basic Math & Logic
Extern: Extern
Code: Ram
Measured timer resolution [ns]: 91
Measured overhead of SysTimeGetNs [ns]: 88.1424

Test (all values in ns)              Avg       Min       Max    StdDev Num Calls
--------------------------------------------------------------------------------
AND (BOOL)                         2.087     1.883    57.044     1.741  85717500
AND (BYTE)                         2.018     1.843    73.005     1.641  87597500
AND (WORD)                         2.081     1.883    34.374     1.721  85665000
AND (DWORD)                        2.081     1.883    33.355     1.747  85637500
OR (BOOL)                          2.089     1.883    45.715     1.738  85730000
...

Sanity-Check #14 - IEC 61131-3 PLCopen G-Code CNC Execution Performance Workload¶

Step #0 - Open the project PLCopen_ecs-internal-cnc-workload_Linux_3.5.14.40.projectarchive with CODESYS:

Right-click FullMeasurement or OnlyPLCopenMeasurement application from the Devices pane, then click Set Active Application to activate the workload.
Double-click the G-Code CNC execution performance workload Device name from the Devices pane to bring up the gateway settings, then scan network. Also, set the IP address according to IP Address #2 and IP Address #3 mentioned in Deploying the CODESYS Benchmark.

Step #1 - Run the start_CODESYS_native.sh script on target to reload the CODESYS Linux runtime in preparation of the benchmark

$ /opt/benchmarking/codesys/start_CODESYS_native.sh

Step #2 - Build, login and start running the CODESYS Benchmark

Step #3 - Get scorecard on target when the test ends (about 4min)

$ cd /var/opt/codesys/PlcLogic
$ vi IEC-61131-3_cnc-and-motion-sim_scorecard*date*.txt

You can consider that the sanity check has passed, if stdout results are similar to the following, that is, no-zero and no-n/a values:

--- PLCopen IEC61131-3 Workload - gcode and motion PVA simulation control ---
Target: Id 5 Type 4102 Version 3.5.14.40
Test at: 2020-3-6-11_39
Platform: IEC 61131-3 Workload - Gcode and motion PVA simulated control
Extern: Extern
Code: Ram
Measured timer resolution [ns]: 91
Measured overhead of SysTimeGetNs [ns]: 0.0

Test (all values in ns)              Avg       Min       Max    StdDev Num Calls
--------------------------------------------------------------------------------
Decoder (no splines)           13715.767    1723.0   26590.0  5030.741        30
SmoothPath (3rd order) + CheckVelocities   3286.64    1442.0    5963.0   803.035        25
Interpolator (mixed, sigmoid)    878.176     535.0   11099.0   621.503      3234
Interpolator (mixed, quadratic)   808.787     455.0   11949.0   603.546      3335
CheckVelocities (Buffer:10000)    7256.0    7256.0    7256.0       0.0         1
...

Sanity-Check #15 - OpenGL glxgears Workload¶

Step #0 - Make sure mesa has been installed in the target set up with option Graphical Windowing System.

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

$ cd /opt/benchmarking/rhealstone && python3 run_opengl_glxglears.py -T 60 --cyclictest

Test result will be saved as a file named OpenGL_glxgears_result.txt. You can consider that the sanity check has passed, if the test results are similar to the following:

==========modify date:Mon Jul  6 10:18:22 2020==========#==============================================================================
#  Test case (1/1): Test.glxgears.OpenGL.cyclictest
#..............................................................................
#Starting stress(cyclictest)
#  Command: 'taskset -c 1 /opt/benchmarking/rt-tests/cyclictest -p 99 -i 250 -m -N'
#Starting test
#  Command: taskset -c 0 glxgears -geometry 600x800
#Hung task detection not supported
#  (File /proc/sys/kernel/hung_task_timeout_secs not found)
#10:18:22: Start of execution
#10:19:22:  1/ 1: min: 59.997
#10:19:22: Test completed. Actual execution time:0:01:00
#Terminated stress
#Min FPS: 59.997
#PASS

Test.OpenGL.cyclictest[FPS]:
59.997
PASS:Test.OpenGL.cyclictest

==========modify date:Mon Jul  6 10:19:33 2020==========#==============================================================================
#  Test case (1/1): Test.glxgears.OpenGL.latency
#..............................................................................
#Starting stress(latency)
#  Command: 'taskset -c 1 /usr/bin/latency -c 1 -p 250'
#Starting test
#  Command: taskset -c 0 glxgears -geometry 600x800
#Hung task detection not supported
#  (File /proc/sys/kernel/hung_task_timeout_secs not found)
#10:19:33: Start of execution
#10:20:33:  1/ 1: min: 59.998
#10:20:33: Test completed. Actual execution time:0:01:00
#Terminated stress
#Min FPS: 59.998
#PASS

Test.OpenGL.latency[FPS]:
59.998
PASS:Test.OpenGL.latency

Sanity-Check #16 - Smokey Net Server¶

Step #0 - Setup the workload with the following input configuration:

Setup rtnet for server and client

//Find the values of RT_DRIVER and REBIND_RT_NICS for setting "/etc/rtnet.conf" and the mac address for setting "/tmp/rtnet_smokey.log"
$ lspci -v | grep ' Ethernet controller: Intel Corporation I210 Gigabit Network Connection ' -A 15

e.g.

01:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
        Subsystem: Intel Corporation I210 Gigabit Network Connection
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at a1100000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 3000 [size=32]
        Memory at a1120000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-07-32-ff-ff-6f-ee-90
        Capabilities: [1a0] Transaction Processing Hints
        Kernel driver in use: rt_igb
        Kernel modules: igb, rt_igb

//modify those values in the configure file according your board(both server and client)
$ sudo vim /etc/rtnet.conf

//for I210, the rt driver is "rt_igb"
$ RT_DRIVER="rt_igb"
//change to the device address of your board, here is an example address
$ REBIND_RT_NICS="0000:01:00.0"

//clear the IPADDR and NETMASK, those values will be configured by following steps
IPADDR=""
NETMASK=""

//set TDMA_MODE as master for both boards(server and client), this value will be overwrite by following steps
TDMA_MODE="master"

//clear the TDMA_SLAVES, the client address will be configure by the file "/tmp/rtnet_smokey.log"
TDMA_SLAVES=""

Create rtnet_smokey.log for the server:

//create file "/tmp/rtnet_smokey.log" in server then record the client address according your client board in this file, here is an example
$ sudo vim /tmp/rtnet_smokey.log
00:07:32:6B:A7:FE 192.208.1.101

Query cmd

$ sudo /usr/lib/xenomai/testsuite/smokey --list

Step #1 - Run the workload with the following command in target shell (locally or remotely over SSH or UART console):

Start rtnet and configure the IP address on server

//load all nessary modules
$ sudo /usr/sbin/rtnet start

//Server will be reconfigure by command smokey_net_server, here detach firtly, and TDMA is not enabled in this case.
$ sudo /usr/sbin/rtcfg rteth0 detach

//configure the server ip address
$ sudo /usr/sbin/rtifconfig rteth0 up 192.208.1.100 netmask 255.255.255.0

Start rtnet and configure the IP address on client

//load all nessary modules
$ sudo /usr/sbin/rtnet start

//Client will be reconfigure by following step, here detach firtly, and TDMA is not enabled in this case.
$ sudo /usr/sbin/rtcfg rteth0 detach

//configure the client ip address
$ sudo /usr/sbin/rtifconfig rteth0 up 192.208.1.101 netmask 255.255.255.0

Start smokey_net_server on the server:

$ sudo /usr/lib/xenomai/testsuite/smokey_net_server rteth0 --file /tmp/rtnet_smokey.log

Start smokey on the client:

//configure as client mode
$ sudo /usr/sbin/rtcfg rteth0 client -c

//client announe to server
$ sudo /usr/sbin/rtcfg rteth0 announce

//run net_udp test case
$ sudo /usr/lib/xenomai/testsuite/smokey --run=12 --verbose=2 rtnet_driver=rt_igb rtnet_interface=rteth0 rtnet_rate=1000 rtnet_duration=10

parameters	description
–run	run [portion of] the test list, 12 = `net_udp`
–verbose	set verbosity to desired level, default = 1
`rtnet_driver`	choose network driver, default = `rt_loopback`
`rtnet_interface`	choose network interface, default = `rteth0`
`rtnet_rate`	choose packet rate . default=1000, it means send/recv udp datagram per 1000000000/1000 ns = 1ms
`rtnet_duration`	choose test duration. default=10, it means test lasts 10 seconds

You can consider that the sanity check has passed, if stdout results are similar to the following, that is, no-zero and no-n/a values:

Note

Better use onboard ethernet ports on the target to reach best real time performance.