Attention

You are viewing an older version of the documentation. The latest version is v3.3.

Troubleshoot

See also

For questions and answers, visit the Edge Controls for Industrial online support forum.

The techniques presented in this section will help in exposing the underlying cause of latency and jitter in real-time applications.

The following section is applicable to:

../_images/target_generic5.png

Monitor CPU Utilization and Affinity

To ensure that the workloads run isolated on CPUs, monitor the tasks currently running. The utility htop enables monitoring of task CPU affinity.

htop is an interactive system-monitor process-viewer and process-manager. It is designed as an alternative to the Unix program top. It shows a frequently updated list of the processes running on a computer, normally ordered by the amount of CPU usage. htop provides a complete list of processes running, instead of the top resource-consuming processes. htop uses color and provides a visual representation of processor, swap, and memory status.

Install htop:

$ sudo apt install htop
Copy to clipboard

Run htop:

$ htop
Copy to clipboard

You can include additional columns to the output of htop. To configure htop to filter tasks by CPU affinity, do the following:

  1. Press the F2 key to open the menu system

  2. Press the down arrow key until Columns is selected from the Setup category

  3. Press the right arrow key to until the selector moves to the Available Columns section

  4. Press the down arrow key until PROCESSOR is selected from the Available Columns section

  5. Press the Enter key to add PROCESSOR to the Active Columns section

  6. Press the F10 key to complete the addition

  7. Press the F6 key to open the Sort by menu

  8. Use the →↑↓ arrow keys to move the selection to PROCESSOR

  9. Press the Enter key to filter tasks by processor affinity

After filtering tasks by CPU affinity, verify that real-time workloads are isolated; real-time workloads should not share a CPU.

Note: Workloads may consist of many individual child tasks. These child tasks are constituent to the workload and it is acceptable and preferred that they run on the same CPU.

Monitor Kernel Interrupts

The Linux kernel is often a source of unwanted interrupts. You can monitor the number of interrupts that occur on a CPU by running the following command:

$ watch -n 0.1 cat /proc/interrupts
Copy to clipboard

This command polls the processor every 100 milliseconds and displays the interrupt counts. Ideally, isolated CPUs 1, 2, and 3 should not show any incrementing interrupt counts. In practice, the Linux “Local timer interrupt” may still be occurring, but properly prioritized workloads should not be impacted.

Determine CPU Cache Allocation

Using Cache Allocation Technology improperly can have detrimental effects on a real-time workload. It is important to verify that cache partitioning is properly configured, otherwise increased cache misses might occur. You can modify the cache allocation using the pqos tool.

The pqos tool can be installed from the ECI repository. Setup the ECI repository, then perform the following command to install this component:

$ sudo apt install intel-cmt-cat
Copy to clipboard

Use the following command to verify CAT configuration:

$ pqos -s
Copy to clipboard

An example output from a target system utilizing CAT is shown below:

../_images/pqos-s1.png

Common PTP Sync Issues

In nominal mode to deliver egress timestamps to ptp4l, Linux loops back the original packet to the error queue of the listening socket, with the timestamps attached.

However, certain issues might occur under the following circumstances:

GM PTP messages TTL > 1

If the Grand master clock messages have TTL greater than 1, the messages will not be terminated on the boundary clock, but instead be propagated to the slaves. The slaves will directly see the master messages, evaluate it as the best master clock, and attempt to synchronize directly to it. The traffic goes through slow path, which leads to unpredictable path delay; and the slave will be unable to synchronize reasonably close to the master clock.

The issue can be corrected by either of the following ways:

Set up an iptables rule that prevents propagation of PTP packets:

$ iptables -I FORWARD -p udp -m udp --dport 319 -j DROP
$ iptables -I FORWARD -p udp -m udp --dport 320 -j DROP
Copy to clipboard

Another possibility is to have the Grand master clock send packets such that when they arrive to the switch, they have TTL of 1. Such packets are not forwarded. In ptp4l, TTL of sent packets is configured by the udp_ttl option.

Overwhelmed with Ingress PTP Traffic

If a socket is overwhelmed with ingress traffic, there may not be space to enqueue the timestamped egress traffic.

While ptp4l will not notice a missed ingress packet, it needs to wait for the timestamp of the packets that it sent (see tx_timestamp_timeout slave clock ptp4l configuration ), and will therefore notice and complain that the timestamp was not delivered.

Unfortunately, ptp4l does not allow setting of the socket receive buffer size, so if this happens, increase the default value before starting ptp4l, and then revert it. For example:

$ sysctl -w net.core.rmem_max=4194304 # 4MB
$ sysctl -w net.core.rmem_default=4194304
Copy to clipboard

Use Linux perf Event with Intel® Processor Trace

Intel® Processor Trace (Intel® PT) traces program execution (every branch) with low overhead. Intel® PT is supported in Linux perf, which is integrated in the Linux kernel. It can be used through the perf command or through gdb.

Both Intel® Processor Trace enabled Linux perf and Intel® Xed tools Debian packages can be installed from the ECI repository. Setup the ECI repository, then perform either of the following commands to install this component:

../_images/target_generic5.png
Install from meta-package
$ sudo apt install eci-realtime-benchmarking
Copy to clipboard
Install from individual Deb package
$ sudo apt install linux-perf xed
Copy to clipboard

Usage

This section is not an exhaustive introduction to the use of Intel® PT, it aims at providing basic commands for creating perf Intel® PT trace record:

  • perf record -e intel_pt// ./program Records a program trace

  • perf record -e intel_pt// -a sleep 1 Records whole system Trace for 1 second

  • perf record -C 0 -e intel_pt// -a sleep 1 Records CPU 0 trace for 1 second

  • perf record --pid $(pidof program) -e intel_pt// Records an already running program trace

  • perf record -a -e intel_pt//k sleep 1 Records kernel space only trace, complete system

  • perf record -a -e intel_pt//u Records user space only trace, complete system

  • perf record -a -e intel_pt/cyc=1,cyc_thresh=2/ ... Records fine grained timing (needs Skylake/Goldmont, adds more overhead)

  • perf record -e intel_pt// --filter 'filter main @ /path/to/program' ... Only records main function in program

  • perf record -e intel_pt// -a --filter 'filter sys_write' program Filters kernel code

  • perf record -e intel_pt// -a --filter 'start func1 @ program' --filter 'stop func2 @ program' program Starts tracing in program at main and stop tracing at func2

    Note

    Check if Intel® PT is supported and the capabilities that are supported.

    # ls /sys/devices/intel_pt/format/
    branch  cyc  cyc_thresh  fup_on_ptw  mtc  mtc_period  noretcomp  psb_period  pt  ptw  pwr_evt  tsc
    
    Copy to clipboard

    Increase perf buffer to limit data loss

    echo $[100*1024*1024] > /proc/sys/kernel/perf_event_mlock_kb
    perf record -m 512,100000 -e intel_pt// ...
    
    Copy to clipboard

When decoding kernel data, the decoder usually has to run as root:

  • perf script --ns --itrace=cr Decodes program execution and displays function call graph. perf script by defaults sampling the data every 100us.

  • perf script --ns --itrace=i0ns ``   Decodes program execution with small *sampling* resolution using ``–itrace option, example with 10us

  • perf script --ns --insn-trace --xed -F,+srcline,+srccode,+insn,+time,+sym,+ip,-event,-period Shows every assembly instruction executed with IA64 disassembler.

  • perf script --ns --itrace=i0ns -F time,sym,srcline,ip Shows source lines executed (requires ELF debug symbol)

    Important

    perf has to save the data to disk. The CPU can execute branches much faster than the disk can keep up, so there will be some data loss for the code that executes many instructions. perf has no way to slow down the CPU, so when trace bandwidth is greater than disk bandwidth, there will be gaps in the trace. Due to this, it is not a good idea to try to save a long trace, but work with shorter traces. Long traces also take a lot of time to decode.

For further information, refer to:

Tracing your Own Code

  1. Compile a simple “Hello World” with debugging information:

    #include <stdio.h>
    
    int main()
    {
        printf("Hello World!\n");
        return 0;
    }
    
    Copy to clipboard
    $ gcc -g -o hello hello.c
    
    Copy to clipboard
  2. Set Linux runtime initial conditions to allow seeing kernel symbols (as root):

    $ echo 'kernel.kptr_restrict=0' >> /etc/sysctl.conf
    $ sysctl -p
    $ net.ipv4.conf.default.rp_filter = 1
    $ net.ipv4.conf.all.rp_filter = 1
    $ kernel.kptr_restrict = 0
    
    Copy to clipboard
  3. Execute perf record with options -e to select intel_pt/cyc,noretcomp/u event to get Intel® PT with cycle-accurate mode. Add noretcomp to get a timing information at RET instructions:

    $ perf record -e intel_pt/cyc,noretcomp/u ./hello
    Hello World!
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.012 MB perf.data ]
    
    Copy to clipboard

    The expected perf script decoded outcome is that an instruction trace be back-annotated from Intel® PT records with source line information and source code:

    Click to toggle results

Use Wireshark to Dissect OPC UA Pubsub with Time-scheduled Traffic-class

Wireshark has a very large set of plugins to dissect EtherCAT, Profinet, PTP, OPC UA Client/Server, … at packet-level.

../_images/opc-ua_eth_frame.png
  1. Create uadp_l2_dissector.lua LUA dissector file to decode UA device UADP ETH (EtherType 0xB62C) packet content subject under 802.1Qbv scheduling policy:

    Click to toggle example script

  2. In Microsoft Windows, load custom-protocol Dissector to decoded L2 UADP ETH EtherType= 0xB62C into Wireshark:

    • Open %APPDATA%\Wireshark\plugins paths to your Personal plugin folders.

    • Copy the uadp_l2_dissector.lua file into the directory

  3. In Linux, load custom-protocol Dissector to decoded L2 UADP ETH EtherType= 0xB62C into Wireshark:

    • Open Wireshark and navigate to Help > About Wireshark > Folders.

    • Find the path to your plugin folders (any of Personal Plugins, Global Plugins, Personal Lua Plugins, Global Lua Plugins will work).

    • Copy the uadp_l2_dissector.lua file to the according directory (Note: To copy the files to any of the directories, you need to be a root user).

    • Restart Wireshark.

    Note: Make sure, that there are no errors on startup otherwise check readability/accessibility of the script file!

  4. Execute a workload that generates OPC UA traffic.

  5. In Wireshark, capture some overly simplistic UADP ETH (0xB62C) packets, which should be automatically been decoded.

  6. In Wireshark, display both PTP and UADP ETH traffic into IO-graph window:

    ../_images/wireshark_UADP_IO-graph.png
  7. Alternatively, (on headless system) run tshark -e tmp.pcap command-line instead to post-process tcpdump PCAP and report-out uadp packets fields.

    $ tshark -X lua_script:uadp_l2_dissector.lua -r tmp.pcap -t e -2 -R uadp.PayloadHeader.writerId==1 \
      -E separator=, \
      -E header=n \
      -Tfields \
      -e frame.number \
      -e frame.time_epoch \
      -e uadp.PayloadHeader.writerId  \
      -e uadp.DataSetMessage0Payload.pl_epoch \
      -e uadp.DataSetMessage0Payload.latency1 > tmp.out
    
    Copy to clipboard

Custom Kernel Boot Parameters

To test custom alternative kernel boot parameters, create or edit the /boot/grub/custom.cfg file:

$ update-grub
Copy to clipboard

Note: Modifying the kernel boot parameters can result in undesired system behavior and should only ever be done with prior knowledge on how it may affect the overall system performance.