TensorFlow-DirectML AI Benchmark with Intel Integrated Graphics

TensorFlow-DirectML allows us to run the AI Benchmark Python benchmark on Intel Rocket Lake UHD Graphics 750 integrated graphics.

Unless you’ve been living under a rock for the past couple of years, AI and deep learning are a pretty big deal these days. So, I wanted to include at least one benchmark test to see if overclocking the IGP could improve deep learning performance.

I came across AI Benchmark: https://pypi.org/project/ai-benchmark/.


AI Benchmark Alpha is an open-source Python library for evaluating the AI performance of various hardware platforms, including CPUs, GPUs and TPUs. The benchmark relies on the TensorFlow machine learning library and provides a lightweight solution for assessing inference and training speed for key Deep Learning models.

Installing AI Benchmark turned out to be a bit more of a hassle than initially expected. First, it relies on the TensorFlow machine learning library. TensorFlow can be used with AVX-enabled CPUs or CUDA-enabled GPUs, neither of which describes our integrated graphics.

Then I came across TensorFlow-DirectML. TensorFlow-DirectML broadens the reach of TensorFlow beyond its traditional Graphics Processing Unit (GPU) support, by enabling high-performance training and inferencing of machine learning models on any Windows devices with a DirectX 12-capable GPU through DirectML, a hardware accelerated deep learning API on Windows.


While I am pretty software-illiterate, I was able to use the TensorFlow-DirectML package to get the benchmark to run on the integrated graphics. For those who want to also run it, I’ll quickly run you through the process.

  1. First, install Anaconda
  2. Then, run the Anaconda Prompt
  3. Create a new Python environment for the benchmark. Make sure to specify Python version 3.7, as only versions 3.5, 3.6, and 3.7 are supported by tensorflow-directml.
    • conda create -n aibench python=3.7
  4. Activate your newly created environment
    • conda activate aibench
  5. Download and install the ai_benchmark package
    • pip install ai_benchmark
  6. Download and install the tensorflow-directml package
    • pip install tensorflow-directml
  7. Start Python
    • python
  8. Import the AI Benchmark package
    • from ai_benchmark import AIBenchmark
  9. Specify AI Benchmark to run on the IGP
    • benchmark = AIBenchmark(use_CPU=None, verbose_level=3)
      • use_CPU=None will prevent the benchmark from running on the CPU
      • verbose_level=3 will provide us detailed information during the benchmark
  10. Start the benchmark
    • benchmark.run()
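
If you would rather not type steps 8 to 10 into the interpreter every time, they can be collected in a small script. This is only a sketch of how I would do it: the function names and the version check are my own additions and not part of ai_benchmark, and the script assumes you run it from inside the activated aibench environment with both packages installed.

```python
import sys

# Python versions this guide lists as supported by tensorflow-directml.
SUPPORTED_PYTHON = {(3, 5), (3, 6), (3, 7)}


def python_is_supported(version=None):
    """Return True if the (major, minor) version is one of the listed ones.

    Hypothetical helper: checks the running interpreter when no version
    tuple is passed in.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHON


def run_igp_benchmark(verbose_level=3):
    """Run AI Benchmark with the same settings as steps 8 to 10."""
    if not python_is_supported():
        raise RuntimeError("Use Python 3.5, 3.6 or 3.7 for tensorflow-directml")

    # Imported here so the file still loads when ai_benchmark is missing.
    from ai_benchmark import AIBenchmark

    # Same arguments as step 9 of the guide.
    benchmark = AIBenchmark(use_CPU=None, verbose_level=verbose_level)
    return benchmark.run()
```

Save it as, say, run_aibench.py, start python in the same folder, and call run_igp_benchmark() after importing it. Expect the same roughly 20-minute run as the manual route.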

The benchmark itself takes about 20 minutes to run and outputs three scores: inference score, training score, and AI-Score. It’s the last of these, the AI-Score, that we use as our performance measurement.

Do note that the AI Benchmark requires a substantial amount of memory. The integrated graphics does not have dedicated memory but rather shares system memory with the CPU. As you’ll see later in the video, when using 2 sticks of 8GB (16GB in total) we ran out of memory and were unable to complete the benchmark. So, I recommend a minimum of 32GB of system memory.

Also, you may run into an error called DXGI_ERROR_DEVICE_REMOVED while running the benchmark. This happens when there’s a timeout. Basically, the benchmark figures our integrated graphics gave up and went home. But our IGP didn’t give up … it’s just a bit slow.

To solve the issue, you can increase the timeout with the registry entry “TdrDelay”. This registry entry extends the time Windows waits for the IGP to respond before it resets the graphics driver. I used a value of 20 seconds, and this resolved my problem.

KeyPath: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers

KeyValue: TdrDelay

ValueType: REG_DWORD

ValueData: Number of seconds to delay. The default value is 2 seconds.
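
For those who prefer a script over clicking through regedit, the same entry can be written from Python with the standard winreg module. Treat this as a rough sketch rather than the one correct way to do it: the function name and the constants are my own, it only works on Windows, and it has to run from an elevated (administrator) prompt. As far as I know the new value only takes effect after a reboot.

```python
# Registry location and value name from the table above.
TDR_KEY_PATH = r"System\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUE_NAME = "TdrDelay"


def set_tdr_delay(seconds=20):
    """Write TdrDelay (in seconds) as a REG_DWORD under HKEY_LOCAL_MACHINE.

    Hypothetical helper: 20 is the value used in this guide, the Windows
    default is 2 seconds.
    """
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive whole number")

    # winreg only exists on Windows, so import it after validating input.
    import winreg

    # Open (or create) the GraphicsDrivers key with write access.
    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, TDR_KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, TDR_VALUE_NAME, 0, winreg.REG_DWORD, seconds)
```

Calling set_tdr_delay(20) writes the value I used. To go back to the stock behavior, call set_tdr_delay(2) or delete the TdrDelay value by hand.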

Oh, and did I mention already that the IGP is pretty slow? The IGP gets a score of about 1,000 points at stock. When running the AI Benchmark on our 8 Rocket Lake CPU cores using the regular TensorFlow library, we get a score of about 3,000 points. That is roughly 3x higher.

TensorFlow-DirectML in SkatterBencher Guides

We use TensorFlow-DirectML in the following SkatterBencher guides:

  • SkatterBencher #28: Intel UHD Graphics 750 Overclocked to 1750 MHz (link)