Arrow Lake NPU Overclocking
We take a closer look at tuning the performance of the Arrow Lake NPU (Neural Processing Unit), located on the SoC Tile.
Arrow Lake is Intel’s revolutionary new processor for mainstream desktop, featuring new P-cores and E-cores, disaggregated tile-based 3D Foveros packaging, an integrated NPU for AI acceleration, a next-generation uncore, DLVR power rails, and so much more.
In this blog post series, I have a closer look at Arrow Lake and explore its performance tuning and overclocking opportunities. I will cover the Compute (P-core, E-core, Graphics, NPU), Memory Subsystem (DDR, MC), and Data Fabric (Ring, NGU, D2D).
Arrow Lake NPU: Introduction
The Neural Processing Unit (NPU) is located on the SoC tile, which it shares with many other IP blocks, including but not limited to the media engine, the memory controller, and the system agent. The SoC tile is manufactured using the TSMC N6 process.
The AI Boost-branded integrated neural processor is based on the NPU 3 design, which is also featured in Meteor Lake. NPU 3 features two Neural Compute Engines (NCE), each with two SHAVE DSP processors. A single NCE is capable of delivering 4 INT8 TOPS at 1 GHz; on Arrow Lake, however, the NPU can boost up to 1600 MHz.
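As a rough sanity check on those figures, the theoretical peak INT8 throughput can be estimated from the per-NCE number. This is only a sketch: it assumes throughput scales linearly with clock frequency and that both NCEs contribute equally, and the function name is purely illustrative.

```python
def npu_peak_int8_tops(freq_mhz, nce_count=2, tops_per_nce_at_1ghz=4.0):
    """Estimate theoretical peak INT8 TOPS for the NPU.

    Assumption: throughput scales linearly with clock frequency,
    starting from 4 INT8 TOPS per NCE at 1 GHz (figure from the text).
    """
    return nce_count * tops_per_nce_at_1ghz * (freq_mhz / 1000.0)


if __name__ == "__main__":
    for mhz in (1000, 1600):
        print(f"{mhz} MHz: {npu_peak_int8_tops(mhz):.1f} INT8 TOPS (theoretical)")
```

Under these assumptions, two NCEs at the 1600 MHz boost clock land at roughly 12.8 INT8 TOPS. Real workloads will see less, since this ignores memory bandwidth and utilization.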
Arrow Lake NPU: Clocking
The clocking of the NPU is similar to other parts on the CPU: a reference clock is multiplied with a ratio to achieve the eventual operating frequency.
Reference Clock
The 100 MHz reference clock is derived internally from the SoC PLL. However, it can also be driven by an external clock generator that provides the reference clock for the SoC Tile. This clock affects nearly all the IP blocks of Arrow Lake, except for those in the Compute Tile and the PCIe/DMI links. This PLL can be linked to the CPU PLL when you run in synchronous mode, or work independently when you run in asynchronous mode.
You can configure the SOC BCLK frequency between 40 and 1000 MHz.
In the ASUS ROG BIOS, you can configure the SOC BCLK Frequency in the Ai Tweaker menu by first setting the Ai Overclock Tuner to anything other than Auto.
NPU Ratio
The reference clock is multiplied by the NPU ratio to achieve the final clock frequency. Unfortunately, we cannot adjust the NPU Ratio on Arrow Lake, though this may change on future platforms. The default NPU frequency is 1600 MHz.
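The relationship between the SoC BCLK and the NPU clock can be sketched in a few lines. Note that the ratio of 16 used here is not a documented value; it is simply inferred from the 100 MHz default reference clock and the 1600 MHz default NPU frequency. The function and constant names are illustrative.

```python
DEFAULT_SOC_BCLK_MHZ = 100.0
DEFAULT_NPU_FREQ_MHZ = 1600.0

# Assumption: the fixed NPU ratio is inferred from the defaults above,
# not read from any Intel documentation.
IMPLIED_NPU_RATIO = DEFAULT_NPU_FREQ_MHZ / DEFAULT_SOC_BCLK_MHZ

# Configurable SOC BCLK range mentioned in the text.
SOC_BCLK_MIN_MHZ = 40.0
SOC_BCLK_MAX_MHZ = 1000.0


def npu_freq_mhz(soc_bclk_mhz, ratio=IMPLIED_NPU_RATIO):
    """Return the NPU frequency for a given SoC BCLK (reference clock x ratio)."""
    if not SOC_BCLK_MIN_MHZ <= soc_bclk_mhz <= SOC_BCLK_MAX_MHZ:
        raise ValueError(f"SOC BCLK {soc_bclk_mhz} MHz is outside the configurable range")
    return soc_bclk_mhz * ratio


if __name__ == "__main__":
    for bclk in (100.0, 110.0, 120.0):
        print(f"SOC BCLK {bclk:.0f} MHz -> NPU {npu_freq_mhz(bclk):.0f} MHz")
```

The range check only reflects what the BIOS lets you enter; it says nothing about which frequencies are actually stable.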
Arrow Lake NPU: Voltage
The voltage regulation for the neural processor is similar to that of other secondary devices on the Arrow Lake processor, such as the memory controller and the network-on-chip.
VccSA MBVR
The external VccSA MBVR powers several parts of the SOC dielet, including the NPU. Unlike Compute IP, the parts of the SOC dielet are not powered using DLVR. So, power delivery is identical to previous architectures. The most relevant parts powered by the VccSA voltage rail are the neural processor, the next-generation uncore, and the memory controller.
The voltage configuration of the VccSA voltage rail is rather complicated. Since multiple IP domains share the voltage rail, the VccSA voltage is set based on the highest requested voltage from the various connected IP blocks.
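The "highest request wins" behavior of a shared rail can be illustrated with a minimal sketch. The IP block names and voltage values in the example are hypothetical placeholders chosen for illustration, not measured or documented values.

```python
def shared_rail_voltage(requests):
    """Return the voltage a shared rail settles on.

    `requests` maps an IP block name to its requested voltage in volts.
    The rail follows the highest request among the connected blocks.
    """
    if not requests:
        raise ValueError("at least one voltage request is required")
    return max(requests.values())


if __name__ == "__main__":
    # Hypothetical requests, for illustration only.
    example_requests = {
        "npu": 0.95,
        "uncore": 1.00,
        "memory_controller": 1.10,
    }
    print(f"VccSA target: {shared_rail_voltage(example_requests):.2f} V")
```

In this made-up example the memory controller sets the rail voltage, so the NPU ends up running at a higher voltage than it asked for. That is the practical consequence of sharing the rail.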
There’s no NPU-specific voltage available in the BIOS. However, in the ASUS ROG BIOS, you can set the VccSA voltage rail in the Ai Tweaker menu by configuring the CPU System Agent Voltage.
Arrow Lake NPU: Power
NPU power management is similar to that of other Uncore devices. There are multiple so-called “work points”, each defined by a certain frequency. Depending on the workload, the CPU will switch between the work points to adjust the NPU frequency.
By default, the NPU frequency idles at 733 MHz and boosts up to 1.6 GHz. There’s also a work point at 333 MHz which gets activated when the NGU Ratio is set to anything except 26X. That should get fixed in later BIOSes, however.
Arrow Lake NPU: Overclock
Due to the lack of NPU ratios, the only way to improve NPU performance is by overclocking the SoC base clock frequency. When the BCLK is configured in asynchronous mode, the only frequencies affected by the SoC BCLK are those of the Graphics and SoC tiles. The IP on the Compute tile, including the P-cores, E-cores, and Ring, is not affected.
The NPU has surprisingly large overclocking headroom. I could easily increase the SoC BCLK to 120 MHz. That increases the NPU frequency from 1600 to 1920 MHz, and it also increases the performance in the Procyon benchmark by about 20%. Increasing the memory frequency can also help with NPU performance, but the scaling is limited: when increasing the memory frequency from DDR5-4800 to DDR5-7200, the Procyon performance improved by about 3%.
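To put the two tuning knobs side by side, here is a small sketch that compares the clock increase with the observed benchmark gain. The approximate Procyon gains (about 20% and about 3%) are taken from the text above; the helper names and the "scaling efficiency" metric are my own illustrative choices.

```python
def percent_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def scaling_efficiency(perf_gain_pct, clock_gain_pct):
    """Fraction of the clock increase that shows up as benchmark gain."""
    return perf_gain_pct / clock_gain_pct


if __name__ == "__main__":
    # NPU clock: 1600 -> 1920 MHz, Procyon gain of about 20% (from the text).
    npu_clock_gain = percent_gain(1600, 1920)
    # Memory: DDR5-4800 -> DDR5-7200, Procyon gain of about 3% (from the text).
    mem_clock_gain = percent_gain(4800, 7200)

    print(f"NPU clock +{npu_clock_gain:.0f}%, "
          f"efficiency {scaling_efficiency(20.0, npu_clock_gain):.2f}")
    print(f"Memory    +{mem_clock_gain:.0f}%, "
          f"efficiency {scaling_efficiency(3.0, mem_clock_gain):.2f}")
```

The NPU clock increase translates almost one-to-one into benchmark performance, while a 50% higher memory data rate returns only a small fraction of that.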
With a bit of extra tuning, we can even get the NPU up to 2 GHz!
Dmitry
How to check stability of the NPU? What’s software?
Pieter
I only used UL’s Procyon (https://benchmarks.ul.com/procyon) to evaluate performance and stability of the NPU