Enable Intel® Level Zero and OpenCL™ Graphics Compute Runtime

This tutorial explains the procedure to enable Intel® Level Zero and OpenCL™ Graphics Compute Runtime with Intel® ECI.

  1. Set up the ECI repository.

  2. Install one of the available ECI Linux kernels. This example installs linux-intel-rt:

    $ sudo apt install linux-intel-rt

    To list the kernels available from the ECI repository, run:

    $ apt-cache search linux-intel | grep -v dbg
    linux-intel-acrn-sos - intel-acrn-sos Linux kernel, version 5.10.140-linux-intel-acrn-sos+
    linux-intel-rt - intel-rt Linux kernel, version 5.10.140-rt73-intel-ese-standard-lts-rt+
    linux-intel-xenomai - intel-xenomai Linux kernel, version 5.10.140-intel-ese-standard-lts-dovetail+
    
  3. Install the GuC and HuC Linux firmware from the ECI repository. These packages track the official Linux distribution package release and add support for Out-of-Tree (OOT) SR-IOV Physical Function (PF) mode.

    $ sudo apt install firmware-misc-nonfree firmware-linux
    

    For example, on the Debian 11 (Bullseye) distribution with the ECI repository configured, the package policy should look like this:

    $ sudo apt-cache policy firmware-misc-nonfree
    firmware-misc-nonfree:
    Installed: 20230210-4-intel-iotg
    Candidate: 20230210-4-intel-iotg
    Version table:
    *** 20230210-4-intel-iotg 1000
          1000 https://eci.intel.com/repos/bullseye isar/main amd64 Packages
          100 /var/lib/dpkg/status
       20210315-3 500
          500 http://deb.debian.org/debian bullseye/non-free amd64 Packages
    
  4. Enable GuC/HuC firmware loading by setting the enable_guc parameter of the i915 kernel module:

    $ echo "options i915 enable_guc=3" | sudo tee -a /etc/modprobe.d/i915.conf
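
The value 3 in the command above is a bitmask. As a minimal sketch, assuming the i915 convention that bit 0 requests GuC submission and bit 1 requests HuC firmware loading (with -1 meaning "auto"), the hypothetical helper `decode_enable_guc` below spells out what a given value asks for:

```shell
#!/bin/sh
# Decode an i915 enable_guc value into the features it requests.
# Assumption: bit 0 = GuC submission, bit 1 = HuC load, -1 = auto.
decode_enable_guc() {
    value=$1
    case "$value" in
        -1) echo "auto (driver default)"; return 0 ;;
        0)  echo "disabled"; return 0 ;;
    esac
    features=""
    if [ $(( $value & 1 )) -ne 0 ]; then
        features="GuC submission"
    fi
    if [ $(( $value & 2 )) -ne 0 ]; then
        # Append with a separator only when something is already listed.
        features="${features:+$features + }HuC load"
    fi
    echo "$features"
}

decode_enable_guc 3   # the value written in step 4
```

After the reboot in the next step, the value actually in effect can typically be read from /sys/module/i915/parameters/enable_guc and passed to the same helper.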
    
  5. After the installation is complete, reboot the system:

    $ sudo reboot
    

    When the system reboots, Intel® Graphics HuC / GuC should be enabled.

    Verify that the GuC / HuC modules are enabled:

    $ dmesg | grep i915
    
    ...
    [    5.633743] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.0.3.bin version 70.0
    [    5.650994] i915 0000:00:02.0: [drm] GuC submission enabled
    [    5.650996] i915 0000:00:02.0: [drm] GuC SLPC enabled
    [    5.651344] i915 0000:00:02.0: [drm] GuC RC: enabled
    ...
    [    5.633745] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
    [    5.649831] i915 0000:00:02.0: [drm] HuC authenticated
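
Scanning the kernel log by eye is error-prone on a busy system. The following sketch automates the check; it assumes the two status strings shown in the sample output ("GuC submission enabled" and "HuC authenticated") are the ones of interest, and `check_guc_huc` is a hypothetical helper name, not part of ECI:

```shell
#!/bin/sh
# Read kernel log text on stdin and succeed only if both firmware
# status lines from the sample output are present.
check_guc_huc() {
    log=$(cat)
    status=0
    for marker in "GuC submission enabled" "HuC authenticated"; do
        if printf '%s\n' "$log" | grep -q "$marker"; then
            echo "found:   $marker"
        else
            echo "missing: $marker"
            status=1
        fi
    done
    return $status
}

# Demonstration on two lines copied from the sample output.
sample='[    5.650994] i915 0000:00:02.0: [drm] GuC submission enabled
[    5.649831] i915 0000:00:02.0: [drm] HuC authenticated'
printf '%s\n' "$sample" | check_guc_huc
```

On a live system the helper could be fed with `dmesg | check_guc_huc`. The exact log wording may differ between kernel versions, so the markers may need adjusting.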
    
  6. Install the Debian community version of the Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ driver.

    First, check which version of the Intel® Graphics Compute Runtime is available:

    $ apt-cache policy intel-opencl-icd
    

    In the following example, the candidate is version 23.13.26032.7-1 of the Intel® Graphics Compute Runtime, as packaged by the Debian maintainers:

    intel-opencl-icd:
    Installed: (none)
    Candidate: 23.13.26032.7-1
    Version table:
        23.13.26032.7-1 1000
            1000 https://eci.intel.com/repos/<distribution> isar/main amd64 Packages

    Then install the runtime together with its dependencies:

    $ sudo apt install libgtk-3-0 libgl1 libtinfo5 clinfo libze1 libigdgmm12 libigc1 libigdfcl1 intel-opencl-icd
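
When this step is scripted, the candidate version can be pulled out of the `apt-cache policy` output rather than read by eye. This is a minimal sketch; `candidate_version` is a hypothetical helper, and it assumes the "Candidate:" line format shown in the sample output:

```shell
#!/bin/sh
# Print the "Candidate:" version from `apt-cache policy` output read on stdin.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Demonstration on the sample output from this step.
sample='intel-opencl-icd:
Installed: (none)
Candidate: 23.13.26032.7-1'
printf '%s\n' "$sample" | candidate_version
```

On a configured system the same helper would be used as `apt-cache policy intel-opencl-icd | candidate_version`.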
    
  7. Add the current user to the video and render groups, then list the detected OpenCL™ platforms:

    $ sudo usermod -a -G video,render $USER
    $ clinfo | head
    
    Number of platforms                               1
    Platform Name                                   Intel(R) OpenCL HD Graphics
    Platform Vendor                                 Intel(R) Corporation
    Platform Version                                OpenCL 3.0
    Platform Profile                                FULL_PROFILE
    Platform Extensions                             cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io
    Platform Extensions with Version                cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                    cl_khr_fp16                                                      0x400000 (1.0.0)
                                                    cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                    cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
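
Group changes made with usermod only apply to new login sessions, so if clinfo reports no platforms, it is worth checking whether the current session already carries both groups. A small sketch, where `missing_groups` is a hypothetical helper:

```shell
#!/bin/sh
# Print each required group that is absent from a space-separated group list.
# usage: missing_groups "<group list>" group...
missing_groups() {
    have=" $1 "
    shift
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;          # group present, nothing to report
            *) echo "$g" ;;
        esac
    done
}

# Check the groups of the current login session.
missing_groups "$(id -nG)" video render
```

No output means both groups are active. Any group printed is still missing from the session, in which case logging out and back in is usually enough.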
    
  8. Install the Intel® OpenCL™ and Level Zero GPU offloading sample applications and tracers, which are part of Intel® Profiling Tools Interfaces for GPU (PTI for GPU):

    $ sudo apt install intel-pti-gpu-samples intel-pti-gpu-tracers
    
  9. Run a General Matrix Multiplication (GEMM) test on the GPU device using Intel® OpenCL™ kernels:

    $ /opt/intel/pti-gpu/samples/cl_gemm
    
    OpenCL Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: Intel(R) UHD Graphics 620
    Matrix multiplication time: 0.0853559 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0852803 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0852979 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0852523 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Total execution time: 2.17606 sec
    

    Alternatively, profile the OpenCL™ GEMM kernels on the GPU device with a tool such as cl_hot_kernels:

    $ cl_hot_kernels /opt/intel/pti-gpu/samples/cl_gemm
    
    OpenCL Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: Intel(R) UHD Graphics 620
    Matrix multiplication time: 0.0852973 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0852284 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0853457 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0850899 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Total execution time: 1.05919 sec
    
    === Device Timing Results: ===
    
    Total Execution Time (ns): 1070330569
    Total Device Time for CPU (ns): 0
    Total Device Time for GPU (ns): 340961331
    
    == GPU Backend: ==
    
        Kernel,       Calls, SIMD,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
        GEMM,           4,   32,           340961331,    100.00,            85240332,            85089916,            85345666
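
The timing summary shows how much of the run was spent executing kernels on the device. The sketch below computes that share from the two totals; `device_time_share` is a hypothetical helper that assumes the "Total Execution Time (ns)" and "Total Device Time ... (ns)" labels printed by the tracers in this tutorial:

```shell
#!/bin/sh
# Read tracer output on stdin and print device time as a percentage
# of total execution time. Accepts both the OpenCL label
# ("Total Device Time for GPU (ns)") and the Level Zero label
# ("Total Device Time (ns)").
device_time_share() {
    awk -F': ' '
        /Total Execution Time \(ns\):/          { total = $2 }
        /Total Device Time( for GPU)? \(ns\):/  { gpu = $2 }
        END {
            if (total > 0) { printf "%.2f\n", 100 * gpu / total } else { exit 1 }
        }'
}

# Demonstration on the totals from the cl_hot_kernels run in this step.
sample='Total Execution Time (ns): 1070330569
Total Device Time for CPU (ns): 0
Total Device Time for GPU (ns): 340961331'
printf '%s\n' "$sample" | device_time_share
```

For the run shown in this step, roughly a third of the wall-clock time is kernel execution; the rest is spent outside the GPU kernels.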
    
  10. Run a General Matrix Multiplication (GEMM) test on the GPU device using Intel® Level Zero:

    Important

    Intel® Atom™ x3000 Series [Apollo Lake] Gen9 Graphics and Intel® Atom™ x6000 Series [Elkhart Lake] Gen11 Graphics do NOT support the Intel® oneAPI Level Zero API (only the OpenCL™ API is supported). For more information, see Limitation #6 - Fail when calling Intel Level0 API. To check whether Level Zero is available on the system, run:

    $ ze_info
    

    When the Intel® Level Zero API is NOT supported by the Intel® HD Graphics device, ze_info fails with the following assertion:

    ze_info: ./samples/ze_info/main.cc:382: int main(int, char**): Assertion `status == ZE_RESULT_SUCCESS' failed.

    When the API is supported, run the Level Zero GEMM sample:

    $ /opt/intel/pti-gpu/samples/ze_gemm
    
    Level Zero Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: Intel(R) UHD Graphics 620
    Matrix multiplication time: 0.0858843 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0858047 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0859173 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0859335 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Total execution time: 0.569975 sec
    

    Alternatively, profile the Level Zero GEMM kernels on the GPU device with a tool such as ze_hot_kernels:

    $ ze_hot_kernels /opt/intel/pti-gpu/samples/ze_gemm
    
    Level Zero Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: Intel(R) UHD Graphics 620
    Matrix multiplication time: 0.085909 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.085957 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0859215 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Matrix multiplication time: 0.0859638 sec
    Results are CORRECT with accuracy: 4.90573e-06
    Total execution time: 0.551349 sec
    
    === Device Timing Results: ===
    
    Total Execution Time (ns): 562776210
    Total Device Time (ns): 343751333
    
        Kernel,       Calls, SIMD,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
        GEMM,           4,   32,           343751333,    100.00,            85937833,            85909000,            85963833
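
Because some platforms only support OpenCL, a script that runs the GEMM test may want to fall back automatically. The sketch below relies on the fact that a failed assertion makes ze_info exit with a non-zero status; `pick_gemm_sample` is a hypothetical helper, and the sample paths are the ones used in steps 9 and 10:

```shell
#!/bin/sh
# Run a probe command and choose a GEMM sample based on its exit status.
# usage: pick_gemm_sample [probe command...]   (defaults to ze_info)
pick_gemm_sample() {
    if [ $# -eq 0 ]; then
        set -- ze_info
    fi
    if "$@" >/dev/null 2>&1; then
        echo /opt/intel/pti-gpu/samples/ze_gemm   # Level Zero usable
    else
        echo /opt/intel/pti-gpu/samples/cl_gemm   # fall back to OpenCL
    fi
}

pick_gemm_sample
```

The chosen path can then be executed directly, for example with `"$(pick_gemm_sample)"`. A missing ze_info binary is treated the same as an unsupported device.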
    

Enable Intel® oneAPI DPC++ and OpenMP Compute Runtimes

This tutorial explains the procedure to enable Intel® oneAPI Data Parallel C++ (DPC++) and OpenMP runtimes with Intel® ECI.

For full details, refer to the oneAPI DPC++ Compiler and Runtime architecture design.

  1. Set up the Intel® Graphics Level Zero and OpenCL™ GPU Compute Runtime.

  2. Set up the Intel® oneAPI repository to install the Intel® oneAPI runtime libraries.

  3. Install the Intel® oneAPI GPU offloading sample applications, which are part of Intel® Profiling Tools Interfaces for GPU (PTI for GPU):

    $ sudo apt install intel-pti-gpu-samples-oneapi
    
  4. Run a General Matrix Multiplication (GEMM) test on the GPU device using the Intel® oneAPI DPC++ runtime:

    $ /opt/intel/pti-gpu/samples/dpc_gemm
    
    DPC++ Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: Intel(R) UHD Graphics 620
    Matrix multiplication time: 0.0852757 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0846759 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0848136 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.084784 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Total execution time: 0.558759 sec
    

    Alternatively, profile the Intel® Level Zero kernels launched by the DPC++ GEMM sample with a tool such as ze_hot_kernels:

    $ ze_hot_kernels /opt/intel/pti-gpu/samples/dpc_gemm
    
    DPC++ Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: Intel(R) UHD Graphics 620
    Matrix multiplication time: 0.0851681 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0848311 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0854751 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0846437 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Total execution time: 0.559788 sec
    
    === Device Timing Results: ===
    
    Total Execution Time (ns): 616853516
    Total Device Time (ns): 341483998
    
                                                                                                Kernel,       Calls, SIMD,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
    _ZTSZZL11RunAndCheckN2cl4sycl5queueERKSt6vectorIfSaIfEES6_RS4_jfENKUlRNS0_7handlerEE_clES9_E6__GEMM,           4,   32,           341483998,    100.00,            85370999,            84983666,            85818333
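
The per-iteration times are easier to compare across devices when converted to throughput. As a rough sketch, assuming the conventional count of 2·n³ floating-point operations for an n x n matrix multiplication (the samples do not print this figure themselves), the hypothetical helper `gemm_gflops` converts a time into GFLOP/s:

```shell
#!/bin/sh
# Estimate GFLOP/s for an n x n matrix multiplication that took `seconds`.
# Assumption: 2 * n^3 floating-point operations per multiplication.
# usage: gemm_gflops <n> <seconds>
gemm_gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 2 * n * n * n / t / 1e9 }'
}

gemm_gflops 1024 0.0852757   # first DPC++ iteration in this step
```

For the 1024 x 1024 runs in this tutorial, which all take about 0.085 s per iteration, this works out to roughly 25 GFLOP/s on the Intel(R) UHD Graphics 620 device shown.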
    
  5. Run a General Matrix Multiplication (GEMM) test on the GPU device using the Intel® oneAPI OpenMP runtime library:

    $ /opt/intel/pti-gpu/samples/omp_gemm
    
    OpenMP Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: GPU
    Matrix multiplication time: 1.12189 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0874556 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0874651 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0875799 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Total execution time: 1.38484 sec
    

    Alternatively, profile the OpenMP GEMM offload regions on the GPU device with a tool such as the omp_hot_regions library, loaded through OMP_TOOL_LIBRARIES:

    $ OMP_TOOL_LIBRARIES=/usr/lib/x86_64-linux-gnu/libomp_hot_regions.so /opt/intel/pti-gpu/samples/omp_gemm
    
    [INFO] OMP Runtime Version: Intel(R) OMP version: 5.0.20220623
    OpenMP Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
    Target device: GPU
    Matrix multiplication time: 1.12661 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0874766 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0876354 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Matrix multiplication time: 0.0875138 sec
    Results are CORRECT with accuracy: 4.88761e-06
    Total execution time: 1.38942 sec
    
    === OpenMP Timing Results: ===
    
    Total Execution Time (ns): 1523530801
    Total Region Time (ns): 365737023
    
            Region ID,         Region Type,       Calls, Transferred (bytes),           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
        139780423450449,              Target,           4,                   0,           356552629,     97.49,            89138157,            86490717,            96874554
        139780423450450,    TransferToDevice,           8,            33554432,             6250197,      1.71,              781274,              555161,             1221406
        139780423450451,  TransferFromDevice,           4,            16777216,             2934197,      0.80,              733549,              726679,              748623
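
The "Transferred (bytes)" column can be cross-checked with simple arithmetic. The sketch below assumes 4-byte single-precision elements, two input matrices copied to the device per repeat, and one result matrix copied back per repeat; these are assumptions about the sample, although they reproduce the figures in the table exactly. `transfer_bytes` is a hypothetical helper:

```shell
#!/bin/sh
# Bytes moved for `count` transfers of an n x n matrix with `elem` bytes per element.
# usage: transfer_bytes <n> <elem> <count>
transfer_bytes() {
    n=$1
    elem=$2
    count=$3
    echo $(( $n * $n * $elem * $count ))
}

transfer_bytes 1024 4 8   # two input matrices x 4 repeats  -> TransferToDevice
transfer_bytes 1024 4 4   # one result matrix x 4 repeats   -> TransferFromDevice
```

The two results, 33554432 and 16777216 bytes, match the TransferToDevice and TransferFromDevice rows, and the call counts (8 and 4) match as well.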