Enable Intel® Level Zero and OpenCL™ Graphics Compute Runtime
This tutorial explains the procedure to enable Intel® Level Zero and OpenCL™ Graphics Compute Runtime with Intel® ECI.
Install one of the available ECI Linux kernels. This example installs linux-intel-rt:
$ sudo apt install linux-intel-rt
The available ECI kernels can be listed with:
$ apt-cache search linux-intel | grep -v dbg
linux-intel-acrn-sos - intel-acrn-sos Linux kernel, version 5.10.140-linux-intel-acrn-sos+
linux-intel-rt - intel-rt Linux kernel, version 5.10.140-rt73-intel-ese-standard-lts-rt+
linux-intel-xenomai - intel-xenomai Linux kernel, version 5.10.140-intel-ese-standard-lts-dovetail+
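Once the system has been rebooted (the reboot step comes later in this procedure), it can be worth confirming that the real-time kernel is the one actually running. The following is a minimal sketch, assuming that ECI real-time kernel releases carry an "-rt<N>" component as in the 5.10.140-rt73-intel-ese-standard-lts-rt+ version listed above. The function name is_rt_kernel is hypothetical.

```shell
# Return success when a kernel release string looks like a real-time build.
# Assumption: RT releases contain "-rt" followed by a digit (e.g. "-rt73").
is_rt_kernel() {
    case "$1" in
        *-rt[0-9]*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage (after reboot):
#   is_rt_kernel "$(uname -r)" && echo "real-time kernel is running"
```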
Install the GuC and HuC Linux firmware from the ECI repository. These packages follow the official Linux distribution package release and add support for Out-of-Tree (OOT) SR-IOV Physical Function (PF) mode.
On Debian:
$ sudo apt install firmware-misc-nonfree firmware-linux
For example, on the Debian 12 (Bookworm) distribution with the ECI repository configured, the package policy should display:
$ sudo apt-cache policy firmware-misc-nonfree
firmware-misc-nonfree:
  Installed: 20230210-4-intel-iotg
  Candidate: 20230210-4-intel-iotg
  Version table:
 *** 20230210-4-intel-iotg 1000
       1000 https://eci.intel.com/repos/bullseye isar/main amd64 Packages
        100 /var/lib/dpkg/status
     20210315-3 500
        500 http://deb.debian.org/debian bullseye/non-free amd64 Packages
On Canonical® Ubuntu®:
$ sudo apt install linux-firmware
For example, on the Canonical® Ubuntu® 22.04 (Jammy Jellyfish) distribution with the ECI repository configured, the package policy should display:
$ sudo apt-cache policy linux-firmware
linux-firmware:
  Installed: 20220329.git681281e4-0ubuntu3.18-intel-iotg.eci2
  Candidate: 20220329.git681281e4-0ubuntu3.18-intel-iotg.eci2
  Version table:
     20220329.git681281e4-0ubuntu3.18-intel-iotg.eci2 1000
       1000 https://eci.intel.com/repos/jammy isar/main amd64 Packages
 *** 20220329.git681281e4-0ubuntu3.18-intel-iotg.eci2 100
        100 /var/lib/dpkg/status
     20220329.git681281e4-0ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages
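The policy check above can also be scripted. The sketch below reads `apt-cache policy` output and tests whether the installed firmware is an ECI build. It assumes that ECI firmware versions keep the "intel-iotg" marker seen in the example outputs. The helper names installed_version and is_eci_build are hypothetical.

```shell
# Print the "Installed:" version from `apt-cache policy <pkg>` output on stdin.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Succeed when a version string carries the ECI "intel-iotg" marker.
# Assumption: ECI firmware builds keep this marker in their version.
is_eci_build() {
    case "$1" in
        *intel-iotg*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage (Debian package name shown; use linux-firmware on Ubuntu):
#   ver="$(apt-cache policy firmware-misc-nonfree | installed_version)"
#   is_eci_build "$ver" && echo "ECI firmware installed: $ver"
```

If the installed version lacks the marker, the distribution's stock firmware package is probably taking priority over the ECI one.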
Enable GuC/HuC firmware loading by setting the enable_guc parameter of the i915 kernel module:
$ echo "options i915 enable_guc=3" | sudo tee -a /etc/modprobe.d/i915.conf
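In the upstream i915 driver, enable_guc is a bitmask: bit 0 requests GuC submission and bit 1 requests HuC firmware loading, so the value 3 asks for both. The sketch below decodes a value into readable form. It assumes the ECI kernel follows the upstream meaning of the parameter, and the function name decode_enable_guc is hypothetical.

```shell
# Decode an i915 enable_guc value.
# Assumed upstream semantics: -1 = auto, 0 = disabled,
# bit 0 (1) = GuC submission, bit 1 (2) = HuC firmware load.
decode_enable_guc() {
    value="$1"
    case "$value" in
        -1) echo "auto"; return 0 ;;
        0)  echo "disabled"; return 0 ;;
    esac
    out=""
    if [ $(( value & 1 )) -ne 0 ]; then
        out="GuC submission"
    fi
    if [ $(( value & 2 )) -ne 0 ]; then
        out="${out:+$out + }HuC load"
    fi
    echo "${out:-unknown}"
}

# Usage (after reboot, when the i915 module is loaded):
#   decode_enable_guc "$(cat /sys/module/i915/parameters/enable_guc)"
```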
After the installation is complete, reboot the system:
$ sudo reboot
When the system reboots, Intel® Graphics HuC / GuC should be enabled.
Verify that the GuC / HuC modules are enabled:
$ dmesg | grep i915
...
[ 5.633743] i915 0000:00:02.0: [drm] GuC firmware i915/tgl_guc_70.0.3.bin version 70.0
[ 5.650994] i915 0000:00:02.0: [drm] GuC submission enabled
[ 5.650996] i915 0000:00:02.0: [drm] GuC SLPC enabled
[ 5.651344] i915 0000:00:02.0: [drm] GuC RC: enabled
...
[ 5.633745] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
[ 5.649831] i915 0000:00:02.0: [drm] HuC authenticated
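Reading the kernel log by eye works for a single machine; for repeated checks the same verification can be scripted. The sketch below looks for the two messages shown in the example output, "GuC submission enabled" and "HuC authenticated". The exact wording is an assumption taken from that sample and may differ between kernel versions. The function name check_guc_huc is hypothetical.

```shell
# Report GuC/HuC status from i915 kernel log lines read on stdin.
# Returns non-zero if either expected message is absent.
check_guc_huc() {
    log="$(cat)"
    status=0
    for pattern in "GuC submission enabled" "HuC authenticated"; do
        if printf '%s\n' "$log" | grep -qF "$pattern"; then
            echo "OK: $pattern"
        else
            echo "MISSING: $pattern"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   dmesg | grep i915 | check_guc_huc
```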
Install the Debian community version of the Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ driver:
Check the Intel® Graphics Compute Runtime version:
$ apt-cache policy intel-opencl-icd
In the following example, the Intel® Graphics Compute Runtime candidate is version 23.13.26032.7-1, as packaged by the Debian maintainer:
intel-opencl-icd:
  Installed: (none)
  Candidate: 23.13.26032.7-1
  Version table:
     23.13.26032.7-1 1000
       1000 https://eci.intel.com/repos/<distribution> isar/main amd64 Packages
$ sudo apt install libgtk-3-0 libgl1 libtinfo5 clinfo libze1 libigdgmm12 libigc1 libigdfcl1 intel-opencl-icd
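Add the current user to the video and render groups:
$ sudo usermod -a -G video,render $USER
Log out and back in so that the new group membership takes effect, then check that the OpenCL™ platform is detected:
$ clinfo | head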
Number of platforms                 1
  Platform Name                     Intel(R) OpenCL HD Graphics
  Platform Vendor                   Intel(R) Corporation
  Platform Version                  OpenCL 3.0
  Platform Profile                  FULL_PROFILE
  Platform Extensions               cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info cl_intel_subgroup_local_block_io
  Platform Extensions with Version  cl_khr_byte_addressable_store 0x400000 (1.0.0)
                                    cl_khr_fp16 0x400000 (1.0.0)
                                    cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
                                    cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
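If clinfo reports no platform, a common cause is that the current session does not yet have the video and render group memberships. The sketch below checks both conditions. The helper names missing_groups and platform_count are hypothetical, and the "Number of platforms" label is taken from the sample output above.

```shell
# Print which of the required groups are absent from a space-separated list.
# $1 = groups the session has; remaining arguments = groups that are required.
missing_groups() {
    have=" $1 "
    shift
    missing=""
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;                                  # group is present
            *) missing="${missing:+$missing }$g" ;;       # record absent group
        esac
    done
    echo "$missing"
}

# Extract the platform count from `clinfo` output read on stdin.
platform_count() {
    awk '/Number of platforms/ { print $NF; exit }'
}

# Usage:
#   missing_groups "$(id -nG)" video render   # empty output: both groups present
#   clinfo | platform_count                   # 0: no OpenCL platform was found
```

`id -nG` reports the groups of the running session, which is why a fresh login is needed after usermod before the check passes.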
Install the Intel® OpenCL™ and Level-Zero GPU offloading sample applications and tracers, which are part of Intel® Profiling Tools Interfaces for GPU (PTI for GPU):
$ sudo apt install intel-pti-gpu-samples intel-pti-gpu-tracers
Run a General Matrix Multiplication (GEMM) test on the GPU device using Intel® OpenCL™ kernels:
$ /opt/intel/pti-gpu/samples/cl_gemm
OpenCL Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: Intel(R) UHD Graphics 620
Matrix multiplication time: 0.0853559 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0852803 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0852979 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0852523 sec
Results are CORRECT with accuracy: 4.90573e-06
Total execution time: 2.17606 sec
Alternatively, profile the GEMM sample on the GPU device with an Intel® OpenCL™ tracing tool such as cl_hot_kernels:
$ cl_hot_kernels /opt/intel/pti-gpu/samples/cl_gemm
OpenCL Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: Intel(R) UHD Graphics 620
Matrix multiplication time: 0.0852973 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0852284 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0853457 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0850899 sec
Results are CORRECT with accuracy: 4.90573e-06
Total execution time: 1.05919 sec
=== Device Timing Results: ===
Total Execution Time (ns): 1070330569
Total Device Time for CPU (ns): 0
Total Device Time for GPU (ns): 340961331
== GPU Backend: ==
Kernel, Calls, SIMD, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
GEMM, 4, 32, 340961331, 100.00, 85240332, 85089916, 85345666
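The timing summary reports both the total execution time and the time spent on the device, so the share of the run that was actual GPU work can be derived from it. The sketch below computes that percentage from tracer output. It assumes the "Total Execution Time (ns)" and "Total Device Time ... (ns)" labels appear as in the sample outputs in this tutorial. The function name device_time_share is hypothetical.

```shell
# Print GPU device time as a percentage of total execution time, reading
# cl_hot_kernels or ze_hot_kernels style output on stdin.
# The CPU device line of the OpenCL tracer is deliberately not matched.
device_time_share() {
    awk -F': *' '
        /Total Execution Time \(ns\)/           { total = $2 }
        /Total Device Time( for GPU)? \(ns\)/   { device = $2 }
        END {
            if (total > 0) printf "%.2f\n", 100 * device / total
            else           print "n/a"
        }'
}

# Usage:
#   cl_hot_kernels /opt/intel/pti-gpu/samples/cl_gemm | device_time_share
```

For the sample run above, the GPU accounts for roughly a third of the total execution time; the remainder is host-side work, which this summary does not break down further.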
Run a General Matrix Multiplication (GEMM) test on the GPU device using Intel® Level-Zero:
Important
Intel® Atom™ x3000 Series [Apollo Lake] Gen9 Graphics and Intel® Atom™ x6000 Series [Elkhart Lake] Gen11 Graphics do NOT support the Intel® oneAPI Level-Zero API; only the OpenCL™ API is supported on these devices. For more information, see Limitation #6 - Fail when calling Intel Level0 API. To check whether the device supports Level Zero, run:
$ ze_info
When the Intel® Level0 API is NOT supported by the Intel® HD Graphics device, ze_info fails with the following error message:
ze_info: ./samples/ze_info/main.cc:382: int main(int, char**): Assertion `status == ZE_RESULT_SUCCESS' failed.
$ /opt/intel/pti-gpu/samples/ze_gemm
Level Zero Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: Intel(R) UHD Graphics 620
Matrix multiplication time: 0.0858843 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0858047 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0859173 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0859335 sec
Results are CORRECT with accuracy: 4.90573e-06
Total execution time: 0.569975 sec
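Because some devices support only OpenCL™, a script that must run on several platforms can probe for Level Zero first and fall back to the OpenCL™ sample. The sketch below relies on ze_info exiting with a non-zero status when its assertion fails, as in the error shown above. The function name pick_gemm_sample and its optional probe argument are hypothetical.

```shell
# Print the path of the GEMM sample to run, based on a Level Zero probe.
# $1 = probe command (defaults to ze_info); a non-zero exit status, including
# a failed assertion or a missing binary, selects the OpenCL sample.
pick_gemm_sample() {
    probe="${1:-ze_info}"
    if "$probe" >/dev/null 2>&1; then
        echo "/opt/intel/pti-gpu/samples/ze_gemm"
    else
        echo "/opt/intel/pti-gpu/samples/cl_gemm"
    fi
}

# Usage:
#   "$(pick_gemm_sample)"
```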
Alternatively, profile the GEMM sample on the GPU device with an Intel® Level-Zero tracing tool such as ze_hot_kernels:
$ ze_hot_kernels /opt/intel/pti-gpu/samples/ze_gemm
Level Zero Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: Intel(R) UHD Graphics 620
Matrix multiplication time: 0.085909 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.085957 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0859215 sec
Results are CORRECT with accuracy: 4.90573e-06
Matrix multiplication time: 0.0859638 sec
Results are CORRECT with accuracy: 4.90573e-06
Total execution time: 0.551349 sec
=== Device Timing Results: ===
Total Execution Time (ns): 562776210
Total Device Time (ns): 343751333
Kernel, Calls, SIMD, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
GEMM, 4, 32, 343751333, 100.00, 85937833, 85909000, 85963833
Enable Intel® oneAPI DPC++ and OpenMP Compute Runtimes
This tutorial explains the procedure to enable Intel® oneAPI Data Parallel C++ (DPC++) and OpenMP runtimes with Intel® ECI.
For full details, refer to the oneAPI DPC++ Compiler and Runtime architecture design.
Set up the Intel® Graphics Level Zero and OpenCL™ GPU Compute Runtime.
Set up the Intel® oneAPI repository to install the Intel® oneAPI runtime libraries.
Install the Intel® oneAPI GPU offloading sample applications, which are part of Intel® Profiling Tools Interfaces for GPU (PTI for GPU):
$ sudo apt install intel-pti-gpu-samples-oneapi
Run a General Matrix Multiplication (GEMM) test on the GPU device using the Intel® oneAPI DPC++ compiler runtime:
$ /opt/intel/pti-gpu/samples/dpc_gemm
DPC++ Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: Intel(R) UHD Graphics 620
Matrix multiplication time: 0.0852757 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0846759 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0848136 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.084784 sec
Results are CORRECT with accuracy: 4.88761e-06
Total execution time: 0.558759 sec
Alternatively, profile the Intel® Level-Zero kernels of the DPC++ GEMM sample on the GPU device with a tool such as ze_hot_kernels:
$ ze_hot_kernels /opt/intel/pti-gpu/samples/dpc_gemm
DPC++ Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: Intel(R) UHD Graphics 620
Matrix multiplication time: 0.0851681 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0848311 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0854751 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0846437 sec
Results are CORRECT with accuracy: 4.88761e-06
Total execution time: 0.559788 sec
=== Device Timing Results: ===
Total Execution Time (ns): 616853516
Total Device Time (ns): 341483998
Kernel, Calls, SIMD, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
_ZTSZZL11RunAndCheckN2cl4sycl5queueERKSt6vectorIfSaIfEES6_RS4_jfENKUlRNS0_7handlerEE_clES9_E6__GEMM, 4, 32, 341483998, 100.00, 85370999, 84983666, 85818333
Run a General Matrix Multiplication (GEMM) test on the GPU device using the Intel® oneAPI OpenMP library runtime:
$ /opt/intel/pti-gpu/samples/omp_gemm
OpenMP Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: GPU
Matrix multiplication time: 1.12189 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0874556 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0874651 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0875799 sec
Results are CORRECT with accuracy: 4.88761e-06
Total execution time: 1.38484 sec
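In the output above, the first repeat takes about 1.12 sec while the later repeats take about 0.087 sec, presumably because the first offload includes one-time start-up cost. When comparing runtimes it can therefore help to average only the later repeats. The sketch below does this from the sample's own output lines. The function name steady_state_mean is hypothetical, and the warm-up explanation is an assumption rather than something the sample reports.

```shell
# Average the "Matrix multiplication time" values read on stdin, skipping the
# first repeat. The value is the second-to-last field of each matching line.
steady_state_mean() {
    awk '/Matrix multiplication time:/ {
             n++
             if (n > 1) { sum += $(NF - 1); count++ }
         }
         END {
             if (count > 0) printf "%.4f\n", sum / count
             else           print "n/a"
         }'
}

# Usage:
#   /opt/intel/pti-gpu/samples/omp_gemm | steady_state_mean
```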
Alternatively, profile the OpenMP GEMM sample on the GPU device by loading an OpenMP tool library such as libomp_hot_regions.so:
$ OMP_TOOL_LIBRARIES=/usr/lib/x86_64-linux-gnu/libomp_hot_regions.so /opt/intel/pti-gpu/samples/omp_gemm
[INFO] OMP Runtime Version: Intel(R) OMP version: 5.0.20220623
OpenMP Matrix Multiplication (matrix size: 1024 x 1024, repeats 4 times)
Target device: GPU
Matrix multiplication time: 1.12661 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0874766 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0876354 sec
Results are CORRECT with accuracy: 4.88761e-06
Matrix multiplication time: 0.0875138 sec
Results are CORRECT with accuracy: 4.88761e-06
Total execution time: 1.38942 sec
=== OpenMP Timing Results: ===
Total Execution Time (ns): 1523530801
Total Region Time (ns): 365737023
Region ID, Region Type, Calls, Transferred (bytes), Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
139780423450449, Target, 4, 0, 356552629, 97.49, 89138157, 86490717, 96874554
139780423450450, TransferToDevice, 8, 33554432, 6250197, 1.71, 781274, 555161, 1221406
139780423450451, TransferFromDevice, 4, 16777216, 2934197, 0.80, 733549, 726679, 748623
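OMP_TOOL_LIBRARIES is the standard OpenMP environment variable for loading a tool library. If the library path is wrong, the application may simply run without producing the timing report, so a small wrapper that checks the path first can save some confusion. The sketch below is one way to do that. The function name run_with_omp_tool is hypothetical.

```shell
# Run a command with an OpenMP tool library set through OMP_TOOL_LIBRARIES,
# failing early if the library file is not readable.
# $1 = path to the tool library; remaining arguments = command to run.
run_with_omp_tool() {
    tool="$1"
    shift
    if [ ! -r "$tool" ]; then
        echo "tool library not found: $tool" >&2
        return 1
    fi
    OMP_TOOL_LIBRARIES="$tool" "$@"
}

# Usage:
#   run_with_omp_tool /usr/lib/x86_64-linux-gnu/libomp_hot_regions.so \
#       /opt/intel/pti-gpu/samples/omp_gemm
```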