Overview of IEEE 802.1AS Generalized Precision Time Protocol (gPTP)

Precision Time Protocol (PTP) is defined in IEEE 1588 as Precision Clock Synchronization for Networked Measurements and Control Systems. The IEEE 802.1AS standard specifies the use of IEEE 1588 specifications, where applicable, in the context of IEEE Std 802.1D-2004 and IEEE Std 802.1Q-2005. It includes distributed device clocks of varying precision and stability.

Generalized Precision Time Protocol (gPTP) is designed specifically for industrial control systems and is optimal for use in distributed systems because it requires minimal bandwidth and very little CPU processing overhead.

An IEEE 802.1AS capable TSN domain is made up of gPTP enabled Ethernet endpoints and switches.

The following figure illustrates the PTP clocks in a primary and secondary Ethernet port hierarchy within a TSN domain.

../../../_images/network_time_domain.png

gPTP Clock Types

The Egress and Ingress timestamp precision is of paramount importance for robust IEEE 802.1AS time synchronization.

It is best obtained by transporting PTP directly over Ethernet Layer 2 (EtherType 0x88F7), optionally with VLAN tags, and by timestamping with PTP Hardware Clock (PHC) assistance, using the following multicast MAC addresses:

  • 01-1B-19-00-00-00 for all except peer delay measurement

  • 01-80-C2-00-00-0E for peer delay measurement

A PTP network may comprise the following clock types:

Ordinary Clock

An Ordinary Clock (OC) functions as a single PTP port and can be selected by the Best Master Clock Algorithm (BMCA) to serve as a primary port or secondary port within an IEEE 802.1 TSN domain.

OCs are the most common clock type because they are used as Ethernet endpoints on a network that is connected to devices requiring synchronization.

Best Master Clock Algorithm

The Best Master Clock Algorithm (BMCA) is the foundation of the PTP functionality. It provides a means to establish the best master clock in its subdomain out of all advertised IEEE 802.1AS clocks on the network, using PTP unicast or multicast packets.

The BMCA must run locally on each Ethernet port of the IEEE 802.1 TSN network to continuously monitor PTP packets at every Announce interval to quickly adjust for changes in time synchronization configuration.

BMCA based on IEEE 1588-2008 uses Announce PTP general messages for advertising clock properties.

BMCA assesses the best master clock in the subdomain using the following criteria:

  1. Clock quality (GPS is considered the highest quality)

  2. Accuracy of the clock’s time base (decimal from 0-255)

  3. Stability of the local oscillator

  4. Closest clock to the grandmaster

For synchronizing a local free-running clock, BMCA based on IEEE 1588-2008 collects several set-points to determine the best clock using the following attributes, in the indicated order:

  1. Priority1 - User-assigned priority to each clock. The range is from 0 to 255.

  2. Class - Class to which the clock belongs. Each class has its own priority.

  3. Accuracy - Precision between clock and UTC in nanoseconds.

  4. Variance - Variability of the clock (ECI default 0xFFFF).

  5. Priority2 - Final-defined priority. The range is from 0 to 255.

  6. Extended Unique Identifier (EUI) - 64-bit Unique Identifier.
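
The ordered comparison above can be sketched in a few lines of Python. This is a simplified illustration, not the full IEEE 1588-2008 data set comparison (it ignores the steps-removed and port identity tie-breaks). The ClockDataset type and the sample values are hypothetical. In every field, the numerically lower value wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    """Hypothetical container for the attributes a clock advertises."""
    priority1: int       # 0-255, lower is better
    clock_class: int     # lower is better
    clock_accuracy: int  # enumerated accuracy code, lower is better
    variance: int        # offsetScaledLogVariance, lower is better
    priority2: int       # 0-255, lower is better
    identity: bytes      # 64-bit EUI, final tie-breaker

    def rank(self):
        # Tuple comparison walks the fields in the documented order.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.identity)


def best_master(clocks):
    """Return the clock that wins the simplified comparison."""
    return min(clocks, key=ClockDataset.rank)


# Illustrative values only: a GPS-backed clock and a default free-running clock.
gm = ClockDataset(248, 6, 0x21, 0x4E5D, 248, bytes.fromhex("001395fffe3462a0"))
oc = ClockDataset(248, 248, 0xFE, 0xFFFF, 248, bytes.fromhex("001395fffe3462a1"))
print(best_master([oc, gm]).identity.hex())
```

Because priority1 is equal for both clocks, the decision falls to the class field, and the clock with the lower class value is selected.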

Transparent Clock

The role of Transparent Clocks (TC) in an IEEE 802.1 TSN domain with TSN switch hops is to update the time-interval field that is part of the PTP event message.

../../../_images/network_time_domain_SYNC.png

TC provides correction that compensates for TSN switch propagation delay on downstream data-link to all secondary ports receiving the PTP event message sequences during time-synchronization cycles.

There are two types of TCs:

  • End-to-end (E2E) TC measures the transit time (also called residence time) of the PTP event message. It does not provide compensation for the propagation delay of the link itself.

  • Peer-to-peer (P2P) TC measures the residence time (same as for E2E TC) of the PTP event message and provides the compensation for the link propagation delay.

P2P delay measurement is very useful when the network is reconfigured by a redundancy protocol mechanism utilized by IEEE 802.1AS for gPTP.

Peer-to-Peer Delay Measurement Mechanism

The P2P delay request-response mechanism improves the accuracy of the added transit-time offset (also known as residence time) by including the link delay measured between two clock ports implementing the P2P TC.

P2P uses the following PTP general and event messages to generate and communicate link delay information:

  • Pdelay_Req

  • Pdelay_Resp

  • Pdelay_Resp_Follow_Up

The upstream link delay is the estimated packet propagation delay between the upstream neighbor P2P TC and the P2P TC under consideration.

../../../_images/network_time_sync_P2P_seq_diag.png
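
The link delay calculation behind this exchange can be sketched as follows. The timestamp names t1 to t4 follow the usual convention (t1: Pdelay_Req sent, t2: Pdelay_Req received, t3: Pdelay_Resp sent, t4: Pdelay_Resp received). The sample values are made up, and the neighbor rate ratio correction that IEEE 802.1AS applies is left out for brevity.

```python
def mean_link_delay_ns(t1, t2, t3, t4):
    """Estimate the one-way link delay from one Pdelay exchange.

    t1 and t4 are taken on the requester's clock, t2 and t3 on the
    responder's clock, so the unknown offset between the clocks cancels out.
    """
    round_trip = t4 - t1   # total time seen by the requester
    turnaround = t3 - t2   # time the responder held the request
    return (round_trip - turnaround) / 2


# Made-up timestamps in nanoseconds: the responder clock is 1,000,000 ns ahead
# of the requester clock and the true link delay is 250 ns.
delay = mean_link_delay_ns(t1=1_000, t2=1_001_250, t3=1_006_250, t4=6_500)
print(delay)
```

The sketch assumes a symmetric link; any asymmetry between the two directions shows up directly as an error in the estimate.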

Both the residence time and the upstream link delay are added to the correction field of the PTP event message. The correction field of the message received by the secondary port therefore contains the sum of all residence times and link delays along the path.

../../../_images/network_time_domain_P2P.png

Grandmaster Clock

The Grandmaster (GM) clock is the primary source of time for clock synchronization using the PTP protocol. The GM clock usually has a very precise time source, such as an external GPS receiver (for example, over the UART NMEA protocol) or an atomic clock with an accurate pulse-per-second (PPS) signal.

When the Industrial IEEE 802.1 TSN domain does not require any external time reference and only needs to be synchronized with a single time reference, the GM clock can be a free-running oscillator.

Boundary Clock

A Boundary Clock (BC) in an IEEE 802.1 TSN domain operates in place of a standard network switch. The BC typically provides an interface between TSN domains. Such a device needs more than one PTP-enabled Ethernet port, and each port provides access to a separate PTP communication path.

The BC uses the BMCA to select the best clock seen by any port. The selected Ethernet port is then set as a secondary port. The primary port synchronizes the clocks connected downstream, while the secondary port synchronizes with the upstream master clock.

Linux PTP Stack 802.1AS gPTP Profile

Intel® Ethernet Controller provides hardware offloading capability to synchronize the clocks in packet-based networks as defined in IEEE 802.1AS PTP event message sequences.

The open source Linux PTP 3.1 is the essential ingredient to set up an IEEE 802.1AS-2011 defined gPTP Profile on Intel® Ethernet Controller, since it:

  • Supports IEEE 802.1AS-2011 in the role of TSN endpoint asCapable

  • Implements OC, TC, and BC

  • Transports PTPv2 messages over UDP/IPv4, UDP/IPv6, and Layer 2 Ethernet (EtherType 0x88f7)

  • Implements Unicast, Multicast, and Hybrid mode operations

  • Implements P2P and E2E delay measurement mechanisms (one-step or two-step)

  • Supports hardware offloading and software time-stamping via the Linux SO_TIMESTAMPING socket option

  • Supports the Linux PHC subsystem by using the clock_gettime family of calls, including the clock_adjtimex system call

  • Adds ts2phc pin control and GPS NMEA external time-reference source

  • Supports VLAN interfaces

IEEE 802.1as gPTP Profile Essential

The following section is applicable to:

../../../_images/target_generic6.png

The gPTP profiles can be installed from the ECI repository to match IEEE standard 802.1AS:

Set up the ECI repository, then run the following command to install this component:

Install from individual Deb package

$ sudo apt install iotg-gptp-configs

Examples: gPTP profiles

  1. The ptp4l daemon establishes gPTP Global Time Reference assuming either the GM or the OC role:

    taskset -c 1 ptp4l -mP2Hi enp1s0.vlan -f /opt/intel/iotg_tsn_ref_sw/common/gPTP.cfg --step_threshold=2 --socket_priority 2 > /var/log/ptp4l.log 2>&1 &
    

    This command synchronizes the PTP Hardware Clock (PHC) of the Intel® Ethernet Controller, using one of the /opt/intel/iotg_tsn_ref_sw/common/gPTP*.cfg template files specified with -f:

    • -i - Specifies the network interface that this instance of ptp4l is controlling

    • --step_threshold - Is set so that ptp4l converges faster when a time jump occurs

    • -ml [1-7] - Enables log-level messages on standard output

    For more information on the ptp4l configuration option, refer to the ptp4l man page.

    The following table exemplifies the IEEE 802.1as time domain configurations with or without the TSN switch.

    Note: Some of the following gPTP profiles require a TSN switch, such as the Kontron Kbox C-102-2 TSN StarterKit.

    gPTP Profiles

    ECI Endpoints (Master-only Port) ptp4l -Hi <eth> -f <.cfg> --socket_priority=

    Kontron PCIE-0400-TSN (Switch Port) ptp4l -f <.cfg> -ml 7

    ECI Endpoints OCx (Secondary-state Port) ptp4l -Hi <eth> -f <.cfg> --socket_priority=

    DeepCascade 802.1as

    • ECI GM clock

    • Kontron TC clock

    • ECI OCx clock

    [global]
    gmCapable               1
    priority1               248
    priority2               248
    logAnnounceInterval     1
    logSyncInterval         -3
    syncReceiptTimeout      3
    neighborPropDelayThresh 800
    min_neighbor_prop_delay -20000000
    assume_two_step         1
    path_trace_enabled      1
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    100
    transportSpecific       0x1
    #
    clockClass              248
    clockAccuracy           0xfe
    offsetScaledLogVariance 0xffff
    timeSource              0xa0
    #
    #
    #
    #
    #
    #
    #
    
    [global]
    gmCapable               0
    priority1               254
    #
    tc_spanning_tree        1
    summary_interval        1
    #
    #
    #
    assume_two_step         1
    #
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    10
    clock_type              P2P_TC
    #
    productDescription      Kontron;
    manufacturerIdentity    00:3a:98
    #
    [CE01]
    transportSpecific 0x1
    [CE02]
    transportSpecific 0x1
    [CE03]
    transportSpecific 0x1
    [CE04]
    transportSpecific 0x1
    
    [global]
    gmCapable               0
    priority1               248
    priority2               248
    logAnnounceInterval     1
    logSyncInterval         -3
    syncReceiptTimeout      3
    neighborPropDelayThresh 8000
    min_neighbor_prop_delay -20000000
    assume_two_step         1
    path_trace_enabled      1
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    100
    transportSpecific       0x1
    

    Star 802.1as

    • Kontron GM clock

    • ECI OC clocks

    [global]
    gmCapable               1
    priority1               248
    priority2               248
    logAnnounceInterval     1
    logSyncInterval         -3
    syncReceiptTimeout      3
    neighborPropDelayThresh 800
    min_neighbor_prop_delay -20000000
    assume_two_step         1
    path_trace_enabled      1
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    100
    summary_interval        0
    #
    productDescription      Kontron;
    manufacturerIdentity    00:3a:98
    [CE01]
    transportSpecific 0x1
    [CE02]
    transportSpecific 0x1
    [CE03]
    transportSpecific 0x1
    [CE04]
    transportSpecific 0x1
    
    [global]
    gmCapable               0
    priority1               248
    priority2               248
    logAnnounceInterval     1
    logSyncInterval         -3
    syncReceiptTimeout      3
    neighborPropDelayThresh 8000
    min_neighbor_prop_delay -20000000
    assume_two_step         1
    path_trace_enabled      1
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    100
    transportSpecific       0x1
    

    Direct 802.1as

    • ECI GM clock

    • ECI OC clocks

    [global]
    gmCapable               1
    priority1               248
    priority2               248
    logAnnounceInterval     1
    logSyncInterval         -3
    syncReceiptTimeout      3
    neighborPropDelayThresh 800
    min_neighbor_prop_delay -20000000
    assume_two_step         1
    path_trace_enabled      1
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    100
    transportSpecific       0x1
    #
    clockClass              248
    clockAccuracy           0xfe
    offsetScaledLogVariance 0xffff
    timeSource              0xa0
    #
    #
    #
    #
    #
    
    [global]
    gmCapable               0
    priority1               248
    priority2               248
    logAnnounceInterval     1
    logSyncInterval         -3
    syncReceiptTimeout      3
    neighborPropDelayThresh 8000
    min_neighbor_prop_delay -20000000
    assume_two_step         1
    path_trace_enabled      1
    follow_up_info          1
    ptp_dst_mac             01:80:C2:00:00:0E
    network_transport       L2
    delay_mechanism         P2P
    tx_timestamp_timeout    100
    transportSpecific       0x1
    
    ptp4l[338318.346]: selected /dev/ptp1 as PTP clock
    ptp4l[338318.394]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l[338318.394]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l[338322.332]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
    ptp4l[338322.332]: selected local clock 001395.fffe.3462a0 as best master
    ptp4l[338322.332]: port 1: assuming the grand master role
    
  2. Use the pmc utility to configure ptp4l at runtime:

    $ pmc -u -b 0 -t 1 "SET GRANDMASTER_SETTINGS_NP clockClass 248 clockAccuracy 0xfe offsetScaledLogVariance 0xffff currentUtcOffset 37 leap61 0 leap59 0 currentUtcOffsetValid 1 ptpTimescale 1 timeTraceable 1 frequencyTraceable 0 timeSource 0xa0"
    

    This utility incrementally modifies certain clock parameters at runtime using the SET GRANDMASTER_SETTINGS_NP runtime API:

    • currentUtcOffset - Time Delta between TAI and UTC (default is 37 seconds).

    • Class - Class to which the clock belongs. Each class has its own priority.

    • Accuracy - Precision between clock and UTC, in nanoseconds.

    • Variance - Variability of the clock (default 0xFFFF).

    • timeSource - TimeSource field used in announce messages.

    • Extended Unique Identifier (EUI) - A 64-bit unique identifier.

  3. The phc2sys daemon synchronizes the Linux system clock (CLOCK_REALTIME, CLOCK_TAI, and so on) to the 802.1AS network time held in the PHC:

    $ taskset -c 1 phc2sys -c CLOCK_REALTIME --step_threshold=1 -s enp1s0 --transportSpecific=1 -O 0 -w -ml 7 > /var/log/phc2sys.log 2>&1 &
    

    Initiates periodic System clock adjustment from 802.1AS time-domains reference:

    • -s - Specifies the PHC of the enp1s0 interface as the primary clock

    • -c - Specifies the system clock as the secondary clock

    • --transportSpecific - Is required when running phc2sys in a gPTP domain.

    • --step_threshold - Is set so that phc2sys converges faster when a time jump occurs

    • -w - Makes phc2sys wait until ptp4l is synchronized

    • -ml [1-7] - Enables log messages on standard output

    For more information on the phc2sys configuration option, refer to the phc2sys man page.

Note: Optionally, enp1s0.vlan can be set with Virtual LANs (VLANs) on egress Ethernet L2/Ethernet PTP event message (EtherType 0x88f7) to set hardware queue affinity.
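
The interval options in the gPTP.cfg profiles shown earlier (logAnnounceInterval, logSyncInterval) are base-2 logarithms of a period in seconds. A small sketch of the conversion, using the values from those profiles; the helper name is hypothetical:

```python
def log_interval_to_seconds(log_value):
    """Convert a PTP log2 message interval to a period in seconds."""
    return 2.0 ** log_value


print(log_interval_to_seconds(-3))  # logSyncInterval -3
print(log_interval_to_seconds(1))   # logAnnounceInterval 1
```

With these settings, Sync messages are sent every 125 ms and Announce messages every 2 s.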

ts2phc

ts2phc is used to synchronize one or more PTP hardware clocks using external timestamps.

Usage: ts2phc -f [configuration file]

Note: The pin mapping of the pulse-per-second (PPS) and auxiliary timestamping (AUXTS) signals depends on the hardware pin headers provisioned on the motherboard and on the discrete PCIe boards:

Board       PPS           AUXTS/EXTTS   Comments
i210-IT     SDP0          SDP1          Server i210-IT x1 PCIe board
i225-LM     SDP0          SDP1
i226-LM     SDP0          SDP1
TGL-mGBE    J2H4 pin 1    J2H4 pin 6    Tiger Lake UP3 Intel® RVP
EHL-mGBE0   J2E1 Pin 18   J2E1 Pin 9    Elkhart Lake Intel® CRB
EHL-mGBE1   J2E1 Pin 9    J2E1 Pin 11   Elkhart Lake Intel® CRB
ADL-mGBE0   J7H8 pin 3    J7H8 pin 2    Alder Lake Intel® RVP
ADL-mGBE1   J7H8 pin 9    J7H8 pin 8    Alder Lake Intel® RVP

Attention

PPS and AUXTS/EXTTS pins may be physically accessible on commercial off-the-shelf industrial PC products. For more information, contact the hardware vendor.

echo 0 0 0 1 0 > /sys/class/ptp/ptpX/period

Usage: echo <idx> <ts> <tns> <ps> <pns> > /sys/class/ptp/ptpX/period

Where:

  • <idx> - PPS number

  • <ts> - Start time (second), based on PTP time

  • <tns> - Start time (nanosecond), based on PTP time

  • <ps> - Period (s)

  • <pns> - Period (ns)

  • ptpX - PTP device on Ethernet secondary or primary port

Configuration File

The configuration file is divided into sections. Each section starts with a line containing its name enclosed in brackets and followed by settings. Each setting is placed on a separate line and it contains the name of the option and the value separated by whitespace characters. Empty lines and lines starting with # are ignored.

There are two different section types:

  1. Global section (indicated as [global]) sets the program options and default secondary clock options. Other sections are clock-specific sections, and they override the default options.

  2. Secondary clock section provides the name of the configured clock (for example, [eth0]). Secondary clocks specified in the configuration file need not be specified with the -c command line option.
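
To illustrate the file format described above, the following sketch reads such a file into nested dictionaries. It is not the parser that ts2phc itself uses; the function name and the sample text are hypothetical, and values are kept as strings.

```python
def parse_config(text):
    """Parse 'name value' settings grouped under [section] headers."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comment lines are ignored
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError(f"setting outside of a section: {line!r}")
        parts = line.split(None, 1)  # option name, then value, split on whitespace
        current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return sections


SAMPLE = """
# sample only
[global]
use_syslog      0
ts2phc.pulsewidth   100000000
[eth6]
ts2phc.pin_index    0
"""

cfg = parse_config(SAMPLE)
print(cfg["global"]["ts2phc.pulsewidth"])
print(sorted(cfg))
```

A caller that wants the effective options for a clock can merge the [global] dictionary with the clock-specific one, letting the latter override the defaults.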

Examples

#
# This example uses a PPS signal from a GPS receiver as an input to
# the SDP0 pin of an Intel i210 card.  The pulse from the receiver has
# a width of 100 milliseconds.
#
# Important!  The polarity is set to "both" because the i210 always
# time stamps both the rising and the falling edges of the input
# signal.
#
[global]
use_syslog      0
verbose         1
logging_level       6
ts2phc.pulsewidth   100000000
[eth6]
ts2phc.channel      0
ts2phc.extts_polarity   both
ts2phc.pin_index    0
#
# This example shows ts2phc keeping a group of three Intel i210 cards
# synchronized to each other in order to form a Transparent Clock.
# The cards are configured to use their SDP0 pins connected in
# hardware.  Here eth3 and eth4 will be slaved to eth6.
#
# Important!  The polarity is set to "both" because the i210 always
# time stamps both the rising and the falling edges of the input
# signal.
#
[global]
use_syslog      0
verbose         1
logging_level       6
ts2phc.pulsewidth   500000000
[eth6]
ts2phc.channel      0
ts2phc.master       1
ts2phc.pin_index    0
[eth3]
ts2phc.channel      0
ts2phc.extts_polarity   both
ts2phc.pin_index    0
[eth4]
ts2phc.channel      0
ts2phc.extts_polarity   both
ts2phc.pin_index    0

Overview of IEEE 802.1Q-2018 Enhancements for Scheduled Traffic (EST)

IEEE 802.1Q-2018 enforces predictable delivery times by dividing Ethernet traffic into different classes, thus ensuring that at specific times only one traffic class (or set of traffic classes) has access to the network.

TSN endpoints and TSN bridges need time-aware traffic scheduling to enable quality-of-service (QoS) for time-sensitive stream communication between Talkers and Listeners. To enable time-aware scheduling, bridges support the mechanisms defined in the IEEE 802.1Q-2018 Enhancements for Scheduled Traffic (EST) feature (formerly known as 802.1Qbv).

../../../_images/network_topology_802.1Q-EST.png

By introducing a hardware differentiator, Intel® Linux Ethernet controllers support OT network QoS administration by enforcing a Gate Control List (GCL) that defines the traffic queues permitted to transmit at a specific time within a control network cycle:

  • Differentiate the traffic between high priority and low priority, or best-effort traffic (PCP)

  • Manage transmission hardware queues that need to be switched ON or OFF according to a global time-aware scheduling-policy, which indicates the duration for which an entry will be active on each port of each network device.

../../../_images/network_GCL_seq_diag.png

Virtual LANs (VLANs)

Virtual LAN (VLAN) QoS is the central pillar of the IEEE 802.1Q standard that allows all endpoints and bridges to support the Forwarding and Queuing Enhancements for Time-Sensitive Streams (FQTSS) mechanisms to:

  • Recognize VLAN Priority Information (PCP)

  • Identify Stream Reservation (SR) traffic classes

The VLAN interface is created using the ip-link command from the iproute2 project, which is pre-installed in ECI.

PCP value   Priority      Acronym   Traffic Types
1           0 (lowest)    BK        Background
0           1 (default)   BE        Best effort
2           2             EE        Excellent effort
3           3             CA        Critical applications
4           4             VI        Video, < 100 ms latency and jitter
5           5             VO        Voice, < 10 ms latency and jitter
6           6             IC        Inter-network control
7           7 (highest)   NC        Network control

The egress-qos-map argument defines a mapping of Linux internal packet priority (SO_PRIORITY) to the VLAN header PCP field for outgoing frames.

$ sudo ip link add link enp5s0 name enp5s0.vlan type vlan id 5 egress-qos-map 2:2 3:3 && cat /proc/net/vlan/enp5s0.vlan

This example creates a VLAN interface for a traffic class. Egress socket messages with SO_PRIORITY=2 map to VLAN PCP 2, while those with SO_PRIORITY=3 map to VLAN PCP 3:

enp5s0.vlan  VID: 5      REORDER_HDR: 1  dev->priv_flags: 1021
       total frames received            0
        total bytes received            0
    Broadcast/Multicast Rcvd            0

    total frames transmitted            0
     total bytes transmitted            0
Device: enp5s0
INGRESS priority mappings: 0:0  1:0  2:0  3:0  4:0  5:0  6:0 7:0
EGRESS priority mappings: 2:2 3:3

For further information on the command arguments, refer to the ip-link(8) man page.
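
To make the mapping concrete, the following sketch computes the 16-bit Tag Control Information (TCI) field that an egress frame would carry for a given socket priority. It assumes that priorities missing from the egress-qos-map fall back to PCP 0; the function name is hypothetical.

```python
EGRESS_QOS_MAP = {2: 2, 3: 3}  # SO_PRIORITY -> PCP, as in "egress-qos-map 2:2 3:3"


def vlan_tci(so_priority, vid, qos_map=EGRESS_QOS_MAP, dei=0):
    """Build the 802.1Q TCI: PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    pcp = qos_map.get(so_priority, 0)  # assumption: unmapped priorities use PCP 0
    return (pcp << 13) | (dei << 12) | vid


print(hex(vlan_tci(3, 5)))  # SO_PRIORITY 3 on VLAN 5
print(hex(vlan_tci(7, 5)))  # SO_PRIORITY 7 is not in the map
```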

Linux Traffic Control (TC)

Linux Traffic Control (TC) provides various packet scheduling policies to ensure that the inter-packet transmission latency on Intel® Ethernet Controllers deterministically meets a specific industrial network deadline, a user-defined cycle deadline, or both.

Every ECI node is capable of supporting the transmission algorithms specified in the FQTSS chapter of IEEE 802.1Q-2018 via TC Queuing Discipline (QDisc) usages.

Linux Network QDisc presents several offload mechanism options, leveraging multiple Ethernet hardware queues, to realize the 802.1Q-2018 Enhancements for Scheduled Traffic (EST) mechanism (formerly known as 802.1Qbv).

../../../_images/Linux_tc.png

For more information, refer to the TSN Documentation Project for Linux.

Earliest TxTime First (NET_SCHED_ETF) QDisc

While not an actual FQTSS feature, Intel® Ethernet Linux drivers also provide the Earliest TxTime First (ETF) QDisc, which enables the LaunchTime feature present in Intel® Ethernet Controller I210, Intel® Ethernet Controller I225-LM/I226-LM, and TGL mGBE.

In Linux, this hardware feature is enabled through the SO_TXTIME socket option and the ETF QDisc.

The SO_TXTIME socket option allows applications to configure the transmission time for each frame while ETF QDisc ensures that the frames coming from multiple sockets are sent to the hardware ordered by transmission time.

The following steps describe how an application sends a time-scheduled packet:

  1. Open a raw and low-level packet interface socket:

    fd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
    
  2. Set the socket priority option (SO_PRIORITY) corresponding to the desired VLAN’s QoS:

    setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &priority, sizeof(priority));
    
  3. Set the socket transmit time option (SO_TXTIME):

    struct sock_txtime sk_txtime;              /* declared in <linux/net_tstamp.h> */
    sk_txtime.clockid = CLOCK_TAI;
    sk_txtime.flags = (use_deadline_mode | receive_errors);
    if (setsockopt(fd, SOL_SOCKET, SO_TXTIME,
            &sk_txtime, sizeof(sk_txtime))) {
      exit_with_error("setsockopt SO_TXTIME");
    }
    

    Note: The flags field is a bit field: deadline_mode is bit 0 and report_error is bit 1. For details on these fields, refer to Add a new socket option for a future transmit time and Make etf report drops on error_queue.

For every TX packet, the user space application will specify the per-packet transmit time in the socket control message (cmsg) ancillary data before sending it:

struct msghdr msg = {0};                   // message header passed to sendmsg()
struct cmsghdr *cmsg;
struct iovec iov;
char control[CMSG_SPACE(sizeof(__u64))] = {0}; // buffer for the ancillary data
iov.iov_base = rawpktbuf;                  // the transmit packet
iov.iov_len = sizeof(rawpktbuf);           // the size of the transmit packet
msg.msg_iov = &iov;                        // scatter/gather array for the transmit packet
msg.msg_iovlen = 1;                        // one buffer in the array
msg.msg_control = control;                 // attach the ancillary data buffer
msg.msg_controllen = sizeof(control);
cmsg = CMSG_FIRSTHDR(&msg);                // obtain the first control message
cmsg->cmsg_level = SOL_SOCKET;             // socket-level control message
cmsg->cmsg_type = SCM_TXTIME;              // ancillary data is a TXTIME control message
cmsg->cmsg_len = CMSG_LEN(sizeof(__u64));
*((__u64 *) CMSG_DATA(cmsg)) = txtime;     // set the per-packet transmit time
sendmsg(fd, &msg, 0);
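
As a cross-check of the two structures used in the steps above, the following Python sketch packs the same bytes without opening a socket, which can help when prototyping. The numeric constants are copied from the Linux UAPI headers as found on x86 (SO_TXTIME and SCM_TXTIME are 61, CLOCK_TAI is 11) and should be verified against the headers of the target kernel; the helper names are hypothetical.

```python
import socket
import struct

SO_TXTIME = 61                     # assumption: value from asm-generic/socket.h
SCM_TXTIME = SO_TXTIME             # the control message type shares the value
CLOCK_TAI = 11
SOF_TXTIME_DEADLINE_MODE = 1 << 0  # deadline_mode, bit 0
SOF_TXTIME_REPORT_ERRORS = 1 << 1  # report_errors, bit 1


def pack_sock_txtime(deadline_mode=False, report_errors=False):
    """Pack struct sock_txtime: a 32-bit clockid followed by 32-bit flags."""
    flags = 0
    if deadline_mode:
        flags |= SOF_TXTIME_DEADLINE_MODE
    if report_errors:
        flags |= SOF_TXTIME_REPORT_ERRORS
    return struct.pack("=iI", CLOCK_TAI, flags)


def txtime_cmsg(txtime_ns):
    """Build the ancillary-data tuple in the form socket.sendmsg() accepts."""
    return (socket.SOL_SOCKET, SCM_TXTIME, struct.pack("=Q", txtime_ns))


opt = pack_sock_txtime(report_errors=True)
print(len(opt), struct.unpack("=iI", opt))
```

On a Linux system, the packed option would be passed to sock.setsockopt(socket.SOL_SOCKET, SO_TXTIME, opt), and each frame would be sent with sock.sendmsg([frame], [txtime_cmsg(txtime_ns)]).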

Note: The data of a TX packet sent from user space is copied when the packet enters kernel space. The copied packets are stored in the data buffer that is pointed to by sk_buff, the socket buffer structure inside the Linux kernel that tracks network packets. For additional details, refer to socket interface in the Linux networking subsystem.

Data copying becomes the bottleneck when a 100us network cycle with very low-latency injection time is needed.

The ETF QDisc operates on a per-queue basis, so either a TAPRIO or an MQPRIO QDisc configuration is also required to expose the hardware transmission queues.

Both MQPRIO and TAPRIO also define how Linux network priorities map into traffic classes and how traffic classes map into hardware queues.

The following example illustrates queue configuration using the MQPRIO QDisc for Intel® Ethernet Controller I210, which has four transmission queues:

$ sudo tc qdisc add dev eth0 parent root handle 6666 mqprio \
        num_tc 3 \
        map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
        queues 1@0 1@1 2@2 \
        hw 0

After running this command:

  • MQPRIO is installed as root QDisc on the eth0 interface with the handle ID 6666.

  • Three different traffic-classes are defined (from 0 to 2), where Linux priority 3 maps into traffic class 0, Linux priority 2 maps into traffic class 1, and all other Linux priorities map into traffic class 2.

  • Packets belonging to traffic class 0 go into one queue at offset 0 (that is, queue index 0 or TXQ0), packets from traffic class 1 go into one queue at offset 1 (that is, queue index 1 or TXQ1), and packets from traffic class 2 go into two queues at offset 2 (that is, queue indexes 2 and 3, or TXQ2 and TXQ3).

  • No hardware offload is enabled.

Note: By configuring MQPRIO, Stream Reservation (SR) Class A (Priority 3) is enqueued on TXQ0, the highest priority transmission queue in the Intel® Ethernet Controller, while SR Class B (Priority 2) is enqueued on TXQ1, the second priority. All best-effort traffic goes into TXQ2 or TXQ3.
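
The mapping described above can be checked with a short sketch that resolves each Linux priority to its traffic class and hardware queues from the same map and queues arguments. The function name is hypothetical; the logic only mirrors the description in this section.

```python
def mqprio_layout(prio_map, queues):
    """Resolve Linux priority -> (traffic class, hardware queue indexes).

    prio_map: the 16 values of the 'map' argument (index = Linux priority).
    queues:   the 'count@offset' items of the 'queues' argument
              (index = traffic class).
    """
    tc_queues = []
    for spec in queues:
        count, offset = (int(x) for x in spec.split("@"))
        tc_queues.append(list(range(offset, offset + count)))
    return {prio: (tc, tc_queues[tc]) for prio, tc in enumerate(prio_map)}


layout = mqprio_layout([2, 2, 1, 0] + [2] * 12, ["1@0", "1@1", "2@2"])
print(layout[3])  # Linux priority 3 (SR Class A)
print(layout[2])  # Linux priority 2 (SR Class B)
print(layout[0])  # best effort
```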

In the following example, the ETF QDisc is installed on TXQ0 and offload feature is enabled, since the Intel® Ethernet Controller I210 driver supports the LaunchTime feature:

$ ethtool -K eth0 hw-tc-offload on
$ sudo tc qdisc add dev eth0 parent 6666:1 etf \
        clockid CLOCK_TAI \
        delta 500000 \
        offload

After running this command:

  • The clockid parameter specifies the clock that is utilized to set the transmission timestamps from frames (only CLOCK_TAI is supported). Moreover, ETF requires the system clock to be in sync with the PTP Hardware Clock.

  • The delta parameter specifies how long before the transmission timestamp the ETF QDisc sends the frame to the hardware. That value depends on multiple factors and can vary from system to system. This example uses 500us.

Important

Developer Tips about ethtool -K eth0 hw-tc-offload on:

For more information on command arguments, refer to the tc-etf man page.

Time Aware Priority (NET_SCHED_TAPRIO) QDisc

IEEE 802.1Q-2018 introduces the Enhancements for Scheduled Traffic (EST) feature (formerly known as 802.1Qbv), which allows packet transmission from each Endpoint and Bridge hardware queue to be scheduled relative to a known time-slice Gate Control List (GCL) within the TSN domains.

The Linux GCL abstraction is simple:

  • A gate is associated with each transmission queue (TXQ).

  • The Open or Closed state of a transmission gate determines whether queued frames can be selected for transmission.

  • Each port is associated with a GCL, which contains an ordered list of gate operations.
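
The abstraction above can be sketched as a lookup that returns which gates are open at a given time. The schedule values are made up for illustration, the function name is hypothetical, and the sketch assumes the query time is not earlier than the schedule's base time.

```python
def open_gates(gcl, base_time_ns, now_ns):
    """Return the gate mask in effect at now_ns (bit n set = gate n open).

    gcl is an ordered list of (gatemask, interval_ns) entries that repeats
    every cycle, starting at base_time_ns.
    """
    cycle_ns = sum(interval for _, interval in gcl)
    position = (now_ns - base_time_ns) % cycle_ns  # offset inside the cycle
    for gatemask, interval in gcl:
        if position < interval:
            return gatemask
        position -= interval
    raise AssertionError("unreachable: position always lies inside the cycle")


# Made-up schedule: three gates, each open alone for 5 ms.
GCL = [(0x01, 5_000_000), (0x02, 5_000_000), (0x04, 5_000_000)]
print(open_gates(GCL, base_time_ns=0, now_ns=7_000_000))
```

At 7 ms into the cycle the second entry is active, so only gate 1 is open.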

../../../_images/Linux-tc-GCL.png

For more details on the EST algorithm, refer to section 8.6.8.4 of IEEE 802.1Q standard.

The EST feature is supported in Linux via the TAPRIO QDisc. Similar to MQPRIO, the QDisc defines how Linux networking stack priorities map into traffic classes and how traffic classes map into hardware queues. This feature also enables you to configure the GCL for a given interface.

../../../_images/Linux-tc-EST-timeview.png

For more information on command arguments, refer to the tc-taprio man page.

Also, refer to the user guides - Intel(R) Ethernet Controller I210 Time-Sensitive Networking (TSN) Linux Reference Software and Intel® Tiger Lake UP3 (TGL) Ethernet MAC Controller Time-Sensitive Networking (TSN) Reference Software.

TAPRIO Enhancements for Scheduled Traffic (EST) TXQ GCL Offload Mode

The IEEE 802.1Q-2018 standard introduced the Enhancements for Scheduled Traffic (EST) feature (formerly known as 802.1Qbv), a hardware feature that enforces the packet transmission scheduling policy of each Endpoint and Bridge hardware queue within the predefined TSN domain cycle time. Frames can also be preempted at time-window transitions.

ethtool -K eth0 hw-tc-offload on
BASE=$(expr $(date +%s) + 5)000000000
echo "$BASE"
tc qdisc add dev eth0 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE \
sched-entry S 01 5000000 \
sched-entry S 02 5000000 \
sched-entry S 04 5000000 \
sched-entry S 08 5000000 \
sched-entry S 00 5000000 \
flags 0x2 \
txtime-delay 0

The following parameters are added to support GCL offload mode:

  • flags - The value 0x2 enables GCL hardware offload, with TX and RX of traffic-shaping frames handled automatically (Note: These are transparent to the user or kernel).

  • txtime-delay - The value zero indicates that the packet scheduling and preemption are entirely hardware managed.

  • preempt - Sets the FPE hardware-offload queue bitmask (for example, with 1110, TXQ[0-2] are preemptible). TXQ[3] is an express queue and is not preemptible (Note: These are transparent to the user or kernel).
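
The schedule in the taprio command above can be checked offline. The following sketch parses the sched-entry lines, sums the intervals into the cycle time, and reproduces the shell arithmetic used for BASE. The function names and the sample epoch value are hypothetical.

```python
import time


def parse_sched_entries(lines):
    """Turn 'sched-entry <cmd> <gatemask hex> <interval ns>' lines into tuples."""
    entries = []
    for line in lines:
        keyword, cmd, mask, interval = line.split()
        if keyword != "sched-entry":
            raise ValueError(f"not a sched-entry line: {line!r}")
        entries.append((cmd, int(mask, 16), int(interval)))
    return entries


def base_time_ns(now_s=None, lead_s=5):
    """Same arithmetic as BASE=$(expr $(date +%s) + 5)000000000."""
    if now_s is None:
        now_s = int(time.time())
    return (now_s + lead_s) * 1_000_000_000


entries = parse_sched_entries([
    "sched-entry S 01 5000000",
    "sched-entry S 02 5000000",
    "sched-entry S 04 5000000",
    "sched-entry S 08 5000000",
    "sched-entry S 00 5000000",
])
cycle_ns = sum(interval for _, _, interval in entries)
print(cycle_ns)
print(base_time_ns(now_s=1694087903))  # sample epoch value, for illustration
```

The five 5 ms entries add up to a 25 ms cycle, and the base time lands on a whole second five seconds after the sample epoch value.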

IEEE 802.1Qbu Frame-Preemption FPE hwoffload use TAPRIO flags GCL offload-mode tc qdisc .. taprio ... preempt 1110

Linux traffic-class software defines which hardware queues are set as preemptible and which as express (that is, non-preemptible):

On Intel® Atom™ x6000 Series [Elkhart Lake] Ethernet GbE Time-Sensitive Network Controller [Ethernet PCI 8086:4b32 and 8086:4ba0].

Intel® Ethernet Controller Linux driver supports IEEE 802.1Qbu Frame-Preemption TXQ[0-3] configuration via SET_AND_HOLD and SET_AND_RELEASE GCL hardware offload.

ethtool -K enp0s29f1 hw-tc-offload on
BASE=$(expr $(date +%s) + 5)000000000
echo "$BASE"
tc qdisc add dev enp0s29f1 parent root handle 100 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time $BASE \
sched-entry S 01 5000000 \
sched-entry H 01 5000000 \
sched-entry R 02 5000000 \
flags 0x2 \
txtime-delay 0 \
preempt 1110

High-priority traffic classes (TC) should map to non-preemptible queues (for example, Queue 0), because express queue packets are not preempted. TXQ[0] is express (that is, non-preemptible) by default; the other queues, TXQ[1-7], are preemptible.

qdisc taprio 100: root refcnt 9 tc 8 map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0
queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1 offset 4 count 1 offset 5 count 1 offset 6 count 1 offset 7 count 1
preemptible 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
clockid invalid flags 0x2       base-time 1694087908000000000 cycle-time 1000000 cycle-time-extension 0
        index 0 cmd S gatemask 0x1 interval 500000
        index 1 cmd H gatemask 0x1 interval 500000
        index 2 cmd R gatemask 0x2 interval 500000

On Intel® Ethernet Controller I225-LM for Time-Sensitive Networking (TSN) [Ethernet PCI 8086:15f2] and Intel® Ethernet Controller I226-LM for Time-Sensitive Networking (TSN) [Ethernet PCI 8086:125b]

The Intel® Ethernet Controller Linux driver supports IEEE 802.1Qbu Frame-Preemption TXQ[0-3] configuration via bitmask only; that is, it does NOT support SET_AND_HOLD and SET_AND_RELEASE GCL-based hardware offload.

ethtool -K enp2s0 hw-tc-offload on
ethtool --set-frame-preemption enp2s0 fp on
BASE=$(expr $(date +%s) + 5)000000000
echo "$BASE"
tc qdisc add dev enp2s0 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE \
sched-entry S 01 5000000 \
sched-entry S 0e 5000000 \
flags 0x2 \
txtime-delay 0 \
preempt 0001

Note

To verify that GCL FPE offload-mode is enabled:

$ ethtool --show-frame-preemption enp2s0 fp on
Frame preemption settings for enp2s0:
        enabled: enabled
        additional fragment size: 68
        verified: 0
        verification disabled: 1

Low-priority traffic classes (TC) should map to preemptible queues (for example, Queue 1-3), whereas high-priority traffic classes should map to express queues.

qdisc taprio 100: root refcnt 5 tc 4 map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3
queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1
preemptible 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
clockid invalid flags 0x2       base-time 1694101561000000000 cycle-time 10000000 cycle-time-extension 0
        index 0 cmd S gatemask 0x1 interval 5000000
        index 1 cmd S gatemask 0xe interval 5000000
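The tc qdisc show output can also be checked mechanically. As a sketch only (the function name check_gcl_cycle is hypothetical, and the parsing assumes the output layout shown above), the following sums the interval fields and compares the total with the reported cycle-time:

```shell
# Hypothetical helper: reads `tc qdisc show` taprio output on stdin and
# succeeds only if the entry intervals add up to the reported cycle-time.
check_gcl_cycle() {
    awk '
        {
            # Scan each line for "cycle-time <ns>" and "interval <ns>" pairs.
            for (i = 1; i < NF; i++) {
                if ($i == "cycle-time") cycle = $(i + 1)
                if ($i == "interval")   sum += $(i + 1)
            }
        }
        END {
            printf "cycle-time %d, sum of intervals %d\n", cycle, sum
            exit !(cycle == sum && cycle > 0)
        }
    '
}

# Sample taken from the I225/I226 output above; on a live system, pipe
# `tc qdisc show dev enp2s0` into the function instead.
check_gcl_cycle <<'EOF'
clockid invalid flags 0x2       base-time 1694101561000000000 cycle-time 10000000 cycle-time-extension 0
        index 0 cmd S gatemask 0x1 interval 5000000
        index 1 cmd S gatemask 0xe interval 5000000
EOF
```

The function returns a non-zero exit status when the totals differ, so it can be used as a guard in a provisioning script.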

Important

Consider the known limitations of TSQ GCL offload-mode:

TSQ GCL not explicitly defined

For a GCL without an explicit TSQ definition, the Intel® I22x-LM/igc offload-mode applies nothing for the user-undefined gate. The behavior therefore falls back to the Intel® I22x-LM default GCL, in which all gates are open. For more details, refer to the implementation of the igc_save_qbv_schedule and igc_tsn_clear_schedule functions in kernel-source/drivers/net/ethernet/intel/igc/igc_main.c.

The workaround is to always define gate behavior explicitly in TAPRIO. For example, with the following schedule you would expect all traffic to be blocked, because no gate is set to open:

qdisc taprio 100: root refcnt 5 tc 4 map 0 0 0 1 0 3 0 2 0 0 0 0 0 0 0 0
queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1
clockid invalid flags 0x2 base-time 1658915097000000000 cycle-time 2500000 cycle-time-extension 0
index 0 cmd S gatemask 0 interval 2500000

However, enabling verbose debug shows that the traffic class is not blocked according to the EST TXQ GCL policies.

rmmod igc
insmod /lib/modules/$(uname -r)/kernel/drivers/net/ethernet/intel/igc/igc.ko debug=16

To block a traffic class according to the EST TXQ GCL policies, explicitly define a time of 0 for all gates. Then, to pass the check that the total of the time slots equals the cycle time, assign the rest of the cycle time to the NULL gate.

qdisc taprio 100: root refcnt 5 tc 4 map 0 0 0 1 0 3 0 2 0 0 0 0 0 0 0 0
queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1
clockid invalid flags 0x2 base-time 1658915097000000000 cycle-time 2500000 cycle-time-extension 0
    index 0 cmd S gatemask 0xf interval 0
    index 1 cmd S gatemask 0x0 interval 2500000

Packet drop with TxLaunchTime (SO_TXTIME)

Besides the ETF QDisc, the TAPRIO QDisc also checks whether the TxTime field of a packet from a socket with SO_TXTIME has timed out. If the TxTime does not fall within any GCL transmission window, the TAPRIO QDisc policy drops the L2 packet. For more details, refer to the implementation of the find_entry_to_transmit function in kernel-source/net/sched/sch_taprio.c.

The workaround, if the socket uses SO_TXTIME, is to set the TxLaunchTime of every packet being sent and to always place the TxLaunchTime inside a transmission window.
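An application can check a candidate TxTime against the schedule before sending. The sketch below is illustrative arithmetic only, not the kernel's find_entry_to_transmit logic: the helper name txtime_in_window is hypothetical, and it assumes a single window described by its start offset and length within the cycle.

```shell
# Hypothetical helper: succeed if TXTIME falls inside a gate window.
# Arguments (all in nanoseconds): base-time, cycle-time, window start offset
# within the cycle, window length, TxTime (assumed to be >= base-time).
txtime_in_window() {
    base=$1 cycle=$2 win_start=$3 win_len=$4 txtime=$5
    offset=$(( (txtime - base) % cycle ))
    [ "$offset" -ge "$win_start" ] && [ "$offset" -lt $(( win_start + win_len )) ]
}

BASE=1694101561000000000   # base-time from the example output above
CYCLE=10000000             # two 5 ms entries
# Gate mask 0x1 is open during the first 5 ms of every cycle.
if txtime_in_window "$BASE" "$CYCLE" 0 5000000 $(( BASE + 3 * CYCLE + 1000000 )); then
    echo "TxTime is inside the window"
fi
if ! txtime_in_window "$BASE" "$CYCLE" 0 5000000 $(( BASE + 6000000 )); then
    echo "TxTime is outside the window; the packet would be dropped"
fi
```

Both messages are printed for the sample values: the first TxTime lands 1 ms into a cycle, the second 6 ms into a cycle, after the 5 ms window has closed.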

MAC does not inherit TxLaunchTime from the application directly (SO_TXTIME)

Intel® I22x-LM/igc calculates LaunchTime according to the equation: LaunchTime = TxTime - BaseT - StQT[q]. The upstream Linux fix "igc: Fix sending packets too early" (which corrects the wrong equation LaunchTime = TxTime - BaseT) has been backported only to Linux v5.15. Consequently, without that fix the MAC postpones packet transmission by the StQT[q] duration and may miss the intended transmission window. For more details, refer to the implementation of the igc_tx_launchtime function in kernel-source/drivers/net/ethernet/intel/igc/igc_main.c.

As a workaround on Linux Intel v5.10/lts, use ONLY the first GCL transmission window to transmit packets with SO_TXTIME until the upstream driver is fixed.
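The effect of the two equations can be seen with a small worked example. The numbers are hypothetical, and the sketch assumes that StQT[q] denotes the start offset of queue q's window within the cycle:

```shell
# Illustrative arithmetic only; all values are in nanoseconds.
# Assumption: StQT[q] is the start offset of queue q's window in the cycle.
BASE_T=1000000000                # hypothetical base time
STQT_Q=5000000                   # hypothetical window start for queue q (5 ms)
TXTIME=$(( BASE_T + 7000000 ))   # application requests 7 ms after base time

launch_expected=$(( TXTIME - BASE_T - STQT_Q ))  # LaunchTime = TxTime - BaseT - StQT[q]
launch_wrong=$(( TXTIME - BASE_T ))              # LaunchTime = TxTime - BaseT

echo "expected LaunchTime: $launch_expected ns"
echo "wrong LaunchTime:    $launch_wrong ns"
echo "extra delay:         $(( launch_wrong - launch_expected )) ns"
```

The extra delay equals StQT[q]. Under the stated assumption, StQT[q] is 0 for the first window, so both equations agree there, which is presumably why the workaround restricts SO_TXTIME traffic to the first GCL window.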

Inaccurate reported TxLaunchTime

tcpdump timestamps a packet when it leaves the QDisc, not when it leaves the Ethernet HMAC. The reported time is therefore earlier than the effective TxLaunchTime.

As a workaround, obtain the most precise timestamp by enabling the SOF_TIMESTAMPING_TX_HARDWARE socket option. The timestamp is the epoch time of the L2 packet when leaving HMAC.

For support on limitations and open issues, contact Intel Support (Log in with Intel® account) and file a new Intel Edge Software Recipes Case under the Category Software/Driver/OS and the Subcategory Industrial Edge Control Software.

TAPRIO Enhancements for Scheduled Traffic (EST) TXQ GCL Assisted-mode

Certain Intel® Ethernet Controllers do not provide the GCL hw-tc-offload feature.

However, Linux 802.1Q-2018 EST can still be leveraged on an ECI node, since it provides the latest version of the TAPRIO QDisc with the SO_TXTIME assisted-mode, which combines skb_data with SO_TXTIME as provided by the ETF QDisc at kernel level (this reduces packet-scheduling corner cases).

In the SO_TXTIME assisted-mode, the LaunchTime feature in the Intel® Ethernet Controller I210 is used to schedule packet transmissions, emulating the EST algorithm.

In this mode, the Taprio QDisc will:

  • Set the transmit timestamp (skb->tstamp) for all the skb_data packets that do not have the SO_TXTIME field set.

  • Ensure that the transmit time for the packet is set to when the gate is open.

  • Validate, when SO_TXTIME is set, whether the timestamp (in skb->tstamp) occurs when the gate corresponding to the skb's traffic class is open.

This mechanism reduces the risk, on Intel® Ethernet Controllers, of non-time-critical packets being transmitted outside their timeslice, whether because of delay induced in the 802.3 and PHY layers or because high-priority hardware queues starve the low-priority queues.

ethtool -K enp1s0 hw-tc-offload on
BASE=$(expr $(date +%s) + 5)000000000
echo "$BASE"
tc -d qdisc replace dev enp1s0 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE \
sched-entry S 01 5000000 \
sched-entry S 02 5000000 \
sched-entry S 04 5000000 \
sched-entry S 08 5000000 \
sched-entry S 00 5000000 \
clockid CLOCK_TAI \
flags 0x1 \
txtime-delay 5000000

tc qdisc replace dev enp1s0 parent 100:1 \
etf clockid CLOCK_TAI delta 5000000 \
offload  skip_sock_check

The following parameters are added to support Enhancements for Scheduled Traffic (EST) TXQ GCL SO_TXTIME assisted-mode:

  • flags - The value 0x1 enables Enhancements for Scheduled Traffic (EST) SO_TXTIME assisted-mode, in which the kernel sets a TXTIME timestamp in the DMA descriptor of every egress packet (that is, skb_data) sent through the ETF hardware-offload queues.

  • txtime-delay - Indicates the minimum time it will take for the packet to hit the wire. This is useful in determining whether you can transmit the packet in the remaining time the gate corresponding to the packet is currently open.
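The role of txtime-delay can be illustrated with a simplified check. This is a sketch under stated assumptions, not the kernel's actual decision logic: the helper name fits_in_gate and all numeric values are hypothetical.

```shell
# Hypothetical helper: succeed if a packet handed over at offset NOW within
# the cycle can still reach the wire before its gate closes at GATE_CLOSE.
# All arguments are in nanoseconds.
fits_in_gate() {
    now=$1 gate_close=$2 txtime_delay=$3
    [ $(( now + txtime_delay )) -le "$gate_close" ]
}

GATE_CLOSE=5000000   # hypothetical: gate open for the first 5 ms of the cycle
DELAY=200000         # hypothetical txtime-delay of 200 us

if fits_in_gate 4000000 "$GATE_CLOSE" "$DELAY"; then
    echo "4.0 ms: packet fits in the current window"
fi
if ! fits_in_gate 4900000 "$GATE_CLOSE" "$DELAY"; then
    echo "4.9 ms: packet must wait for the next window"
fi
```

A larger txtime-delay shrinks the usable part of each window, so the value should reflect the real software and hardware latency of the platform.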

Important

Consider the known limitations of TXQ GCL SO_TXTIME assisted-mode:

MAC does not inherit TxTime from application directly (SO_TXTIME)

Intel® I22x-LM/igc calculates LaunchTime according to the equation: LaunchTime = TxTime - BaseT - StQT[q]. However, the current Linux v5.15 and v5.10 upstream igc driver uses a wrong equation: LaunchTime = TxTime - BaseT. Consequently, the MAC postpones packet transmission by the StQT[q] duration and may miss the intended transmission window. For more details, refer to the implementation of the igc_tx_launchtime function in kernel-source/drivers/net/ethernet/intel/igc/igc_main.c.

As a workaround, use ONLY the first GCL transmission window to transmit packets with SO_TXTIME until the upstream driver is fixed.

Inaccurate reported TxLaunchTime

tcpdump timestamps a packet when it leaves the QDisc, not when it leaves the Ethernet HMAC. The reported time is therefore earlier than the effective TxLaunchTime.

As a workaround, obtain the most precise timestamp by enabling the SOF_TIMESTAMPING_TX_HARDWARE socket option. The timestamp is the epoch time of the L2 packet when leaving HMAC.

TAPRIO Enhancements for Scheduled Traffic (EST) TXQ GCL Software-fallback

This mechanism reduces the risk, on Intel® Ethernet Controllers, of non-time-critical packets being transmitted outside their timeslice, whether because of delay induced in the 802.3 and PHY layers or because high-priority hardware queues starve the low-priority queues.

BASE=$(expr $(date +%s) + 5)000000000
echo "$BASE"
tc qdisc add dev eth0 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE \
sched-entry S 01 5000000 \
sched-entry S 02 5000000 \
sched-entry S 04 5000000 \
sched-entry S 08 5000000 \
sched-entry S 00 5000000 \
clockid CLOCK_TAI \
flags 0x0 \
txtime-delay 5000000

The following two parameters are added to support Enhancements for Scheduled Traffic (EST) software fallback:

  • flags - The value 0x0 enables Enhancements for Scheduled Traffic (EST) software scheduling of packet queueing, using the Linux high-resolution timer interrupt and soft IRQ kernel work-queues.

  • txtime-delay - Indicates the minimum time it will take for the packet to hit the wire. This is useful in determining whether you can transmit the packet in the remaining time the gate corresponding to the packet is currently open.

Important

Developer tips on ethtool -K eth0 hw-tc-offload on:

Packet Classifier (CONFIG_NET_CLS_FLOWER)

Traffic-Class (TC) Flower classifier allows matching packets against pre-defined flow key fields:

  • Packet headers: for example, IPv6 source address

  • Tunnel metadata: for example, Tunnel Key ID

  • Metadata: Input port

Flower classifier actions allow packet to be modified, forwarded, dropped, and so on:

  • pedit: Modify packet data

  • mirred: Output packet

  • VLAN: Push, pop or modify VLAN

Hardware packet filters are used to achieve the lowest ingress traffic latency on Intel® Ethernet controllers.

  1. Enable the netdev hardware filter offload capabilities:

    $ ethtool -K eth0 hw-tc-offload on
    $ ethtool -K eth0 ntuple-filters on
    

For example, on the Intel® Ethernet Controller I210-IT for Time-Sensitive Networking (TSN), steering ingress traffic to RXQ[1] with a filter on the UADP ETH EtherType (0xb62c) at Ethernet L2 frame level in hardware can be set up using the iproute2 tc filter ... flower command:

  1. Set skip_sw to add the rule to the hardware filters (by default skip_hw otherwise):

    $ tc filter add dev eth0 parent ffff: proto 0xb62c flower \
         src_mac cc:cc:cc:cc:cc:cc \
         hw_tc 1 skip_sw
    

    To show the ingress filter applied by traffic control:

    $ tc filter show dev eth0 ingress
    

    The output contains in_hw or not_in_hw, which confirms whether the skip_sw rule is effectively applied in hardware offload.

    filter parent ffff: protocol ip
    pref 49152 flower chain 0
    handle 0x1
    eth_type ipv4
    ip_proto sctp
    dst_port 80
    skip_sw
    in_hw
    

As another example, on the Intel® Atom™ x6000 Series [Elkhart Lake] Ethernet GbE Time-Sensitive Network Controller [Ethernet PCI 8086:4b32 and 8086:4ba0], steering ingress traffic to RXQ[2] with a filter on the PTPv2 EtherType (0x88f7) at Ethernet L2 frame level in hardware can be set up using the iproute2 tc filter ... flower command:

  1. Set another hardware filter to steer all ingress PTPv2 messages to traffic class 2:

    $ tc filter add dev eno1 parent ffff: protocol 0x88f7 flower \
        hw_tc 2 skip_sw
    

    To show the ingress filter applied by traffic control:

    $ tc filter show dev eno1 ingress
    

    The output contains in_hw or not_in_hw, which confirms whether the skip_sw rule is effectively applied in hardware offload.

    filter parent ffff: protocol [35063] pref 49152 flower chain 0 handle 0x1 hw_tc 2
    eth_type 88f7
    in_hw in_hw_count 1
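The in_hw check can be scripted. The sketch below uses two hypothetical helper names, filter_in_hw and ethertype_hex, and assumes the output layout shown above. It also converts the decimal protocol value printed by tc (35063) back to the hexadecimal EtherType.

```shell
# Hypothetical helper: reads `tc filter show` output on stdin and succeeds
# only if some rule reports the standalone token in_hw (not not_in_hw).
filter_in_hw() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "in_hw") found = 1 }
         END { exit !found }'
}

# Print a decimal EtherType, as shown in "protocol [35063]", in hexadecimal.
ethertype_hex() {
    printf '0x%04x\n' "$1"
}

ethertype_hex 35063   # prints 0x88f7

# Sample taken from the output above; on a live system use:
#   tc filter show dev eno1 ingress | filter_in_hw
if filter_in_hw <<'EOF'
filter parent ffff: protocol [35063] pref 49152 flower chain 0 handle 0x1 hw_tc 2
eth_type 88f7
in_hw in_hw_count 1
EOF
then
    echo "rule is offloaded to hardware"
fi
```

Matching whole tokens avoids a false positive on not_in_hw, which a plain substring search for in_hw would accept.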
    

As a final example, on the Intel® Ethernet Controller I225-LM for Time-Sensitive Networking (TSN) [Ethernet PCI 8086:15f2] and the Intel® Ethernet Controller I226-LM for Time-Sensitive Networking (TSN) [Ethernet PCI 8086:125b], where tc filter ... flower capabilities are limited for complex ingress traffic scenarios or are not supported by the Intel® Ethernet controller, use ethtool -U flow-type as an alternative.

  1. Set a flow-type filter to steer all ingress Layer 2 Ethernet frames with the UADP ETH EtherType 0xb62c onto RXQ[3]:

    $ ethtool -U enp3s0 flow-type ether proto 0xb62c queue 3 && ethtool -u enp3s0
    

    The output contents should be similar to the following:

    4 RX rings available
    Total 1 rules
    
    Filter: 63
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0xB62C mask: 0x0
            Action: Direct to queue 3
    
  2. Set a flow-type filter to steer all ingress Layer 2 Ethernet frames carrying L2/PTPv2 messages (EtherType 0x88f7) onto RXQ[2]:

    $ ethtool -U enp3s0 flow-type ether proto 0x88f7 queue 2  && ethtool -u enp3s0
    

    The output contents should be similar to the following:

    Added rule with ID 62
    4 RX rings available
    Total 2 rules
    
    Filter: 62
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0x88F7 mask: 0x0
            Action: Direct to queue 2
    
    Filter: 63
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0xB62C mask: 0x0
            Action: Direct to queue 3
    
  3. Set a flow-type filter to steer all ingress Layer 2 Ethernet frames that are VLAN tagged with PCP=3 (see IEEE_802.1Q header format) onto RXQ[3]:

    $ ethtool -U enp3s0 flow-type ether proto 0x8100 vlan 0x6000 m 0x1FFF queue 3 && ethtool -u enp3s0
    

    The output contents should be similar to the following:

    Added rule with ID 61
    4 RX rings available
    Total 3 rules
    
    Filter: 61
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0x8100 mask: 0x0
            Action: Direct to queue 3
    
    Filter: 62
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0x88F7 mask: 0x0
            Action: Direct to queue 2
    
    Filter: 63
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0xB62C mask: 0x0
            Action: Direct to queue 3
    
  • Steer all ingress packets from a specified source MAC address, for example cc:cc:cc:cc:cc:cc, to RXQ[0]:

    $ ethtool -U enp3s0 flow-type ether src cc:cc:cc:cc:cc:cc queue 0 && ethtool -u enp3s0
    

    The output contents should be similar to the following:

    Added rule with ID 60
    4 RX rings available
    Total 4 rules
    
    Filter: 60
            Flow Type: Raw Ethernet
            Src MAC addr: CC:CC:CC:CC:CC:CC mask: 00:00:00:00:00:00
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0x0 mask: 0xFFFF
            Action: Direct to queue 0
    
    Filter: 61
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0x8100 mask: 0x0
            Action: Direct to queue 3
    
    Filter: 62
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0x88F7 mask: 0x0
            Action: Direct to queue 2
    
    Filter: 63
            Flow Type: Raw Ethernet
            Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
            Ethertype: 0xB62C mask: 0x0
            Action: Direct to queue 3
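The vlan value 0x6000 used in the PCP=3 filter above follows from the IEEE 802.1Q Tag Control Information (TCI) layout, where PCP occupies the top three bits. The sketch below (the helper name vlan_tci is hypothetical) builds the TCI from its fields. The mask 0x1FFF covers the DEI and VID bits; assuming that set mask bits are ignored, as the masks in the outputs above suggest, only PCP is matched.

```shell
# Hypothetical helper: build the 16-bit VLAN TCI from its IEEE 802.1Q fields.
#   PCP: bits 15-13, DEI: bit 12, VID: bits 11-0
vlan_tci() {
    pcp=$1 dei=${2:-0} vid=${3:-0}
    printf '0x%04X\n' $(( (pcp << 13) | (dei << 12) | vid ))
}

vlan_tci 3          # prints 0x6000, the value used in the filter above
vlan_tci 3 0 100    # prints 0x6064: PCP=3 on VLAN ID 100
```

Other priorities can be matched the same way, for example vlan_tci 5 for PCP=5.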
    

Important

Developer tips about ethtool -K eth0 hw-tc-offload on and ethtool -K eth0 ntuple-filters on:

For more information on command arguments, refer to the tc-flower and ethtool man pages.