SkatterBencher #59: Intel Xeon w7-2495X Overclocked to 5200 MHz

skatterbencher 59 xeon w7-2495x

We overclock the Intel Xeon w7-2495X up to 5200 MHz with the ASUS Pro WS W790-Ace motherboard and EK-Pro water cooling.

This is my first look at the new Sapphire Rapids architecture, and I didn’t yet get to take a deep dive into the CPU. So I would consider this more of a layman’s approach to Sapphire Rapids overclocking and not an extensive overclocking guide.

All right, we have lots to cover, so let’s jump straight in.

Intel Sapphire Rapids: Introduction

The Intel Xeon w7-2495X processor is part of Intel’s 4th generation Xeon Scalable processor line-up, formerly known as Sapphire Rapids-112L and Sapphire Rapids-64L.

Sapphire Rapids is the successor to, well, a variety of architectures. On the 4S/8S server side, it’s the successor to the 2020 14nm Cooper Lake. On the 1S/2S server and workstation side, it’s the successor to the 2021 10nm Ice Lake. And on the high-end desktop (HEDT) side, it’s the successor to the 2019 14nm Cascade Lake.

sapphire rapids predecessor successor

Enthusiasts like myself can think of the Sapphire Rapids W790 platform as the successor of the overclockable Cascade Lake-X and locked Cascade Lake-W processors. Perhaps the real spiritual predecessor of the unlocked Xeon W-2400 and W-3400 series is the overclockable 28-core Xeon W-3175X, launched in 2018.

Intel spoke at length about Sapphire Rapids during the 2021 Architecture Day. I won’t go over the architecture details, but it suffices to say there are some significant improvements over the Ice Lake, Cooper Lake, and Cascade Lake architectures.

The most significant improvements are the Intel 7 process technology and up to 56 Golden Cove P-cores. That makes Alder Lake the equivalent on mainstream desktop. It also features PCIe 5.0, DDR5 ECC RDIMM support, and Intel’s 3rd generation Deep Learning Boost technology. Lastly, Sapphire Rapids transitions from a single monolithic die design to a multi-tile design for increased scalability … sort of. The multi-tile die design is used for the Xeon W-3400 series. However, the Xeon W-2400 segment still features a monolithic die.

And that’s not where the difference between the W-2400 and W-3400 segment ends.

While the W-3400 series goes up to 56 P-cores, the W-2400 series only goes up to 24 P-cores. The W-3400 series supports 8-channel memory, whereas the W-2400 series only supports 4-channel memory. The W-3400 series also supports 112 PCIe 5.0 lanes, whereas the W-2400 series only supports 64 lanes.

The Sapphire Rapids Xeon W processors are further segmented according to the Xeon w3, w5, w7, and w9 brands. That’s similar to how we have Core i3 to Core i9 on the mainstream desktop. Xeon w9 is reserved exclusively for the W-3400 series, and you can only find Xeon w3 processors in the Xeon W-2400 product line. Xeon w5 and w7 are available in both series.

sapphire rapids mainstream and export

Across all Sapphire Rapids Workstation products, eight overclockable SKUs are split evenly between the W-2400 and W-3400 segments. We’ll get back to how overclocking is enabled later in this article. It suffices to say that the Xeon w7-2495X we’re overclocking in this guide is the top SKU in the W-2400 line-up.

sapphire rapids xeon 2 processor lineup

The Xeon w7-2495X has 24 P-cores with 48 threads. The base frequency is 3.1 GHz, the Turbo Boost 2.0 boost frequency is 4.6 GHz, and the Turbo Boost Max 3.0 boost frequency is 4.8 GHz. The maximum boost frequency gradually decreases from 4.8 GHz for up to 2 active cores to 3.3 GHz when all cores are active. The base TDP is 225W, and the Turbo TDP is 270W. The TjMax is 94 degrees Celsius.

sapphire rapids xeon w-2400 processor sku table

In this guide, we will cover four overclocking strategies:

  • First, we rely on ASUS MCE and Intel XMP performance boost technologies
  • Second, we use the ASUS water-cooled OC preset
  • Third, we try a static manual overclock
  • Lastly, we go for a dynamic manual overclock
xeon w7-2495x overclocking strategies

However, before we jump into overclocking, let us quickly review the hardware and benchmarks used in this guide.

Intel Xeon w7-2495X: Platform Overview

The system we’re overclocking today consists of the following hardware.

| Item | SKU | Price (USD) |
|---|---|---|
| CPU | Intel Xeon w7-2495X | 2,189 |
| Motherboard | ASUS Pro WS W790-ACE | 900 |
| CPU Cooling | EK-Pro CPU WB 4677 Ni + Acetal Prototype | 134 |
| | EK-Quantum Kinetic FLT 240 D5 | 221 |
| | EK-Quantum Surface P480M – Black | 150 |
| Fan Controller | ElmorLabs EFC-SB | 20 |
| | ElmorLabs EVC2SE | 35 |
| Memory | G.SKILL Zeta R5 DDR5-6800C34 64GB | |
| Power Supply | Cooler Master V1200 Platinum | 270 |
| Graphics Card | ASUS ROG Strix RTX 2080 TI | 880 |
| Storage | Kingston 120GB SSDNow V300 | 30 |
| Chassis | Open Benchtable V2 | 200 |

ASUS Pro WS W790-Ace

The W790-ACE is one of the two available ASUS W790 motherboards, the other being the W790E-Sage.

A primary difference between the two boards is the memory configuration: the Ace supports quad-channel memory, whereas the Sage supports up to 8-channel memory. This aligns with the W-2400 and W-3400 CPU segmentation, where the former supports quad-channel and the latter supports 8-channel memory. Note that the Ace still supports W-3400 CPUs. It’s just that they’ll run with quad-channel memory.

Another difference relevant to performance tuning is that the Ace has 12+1+1 power phases, whereas the Sage has 14+1+1.

ElmorLabs EFC-SB & EVC2

The Easy Fan Controller SkatterBencher Edition (EFC-SB) is a customized EFC resulting from a collaboration between SkatterBencher and ElmorLabs.

I explained how I use the EFC-SB in a separate article on this website. By connecting the EFC-SB to the EVC2 device, I monitor the ambient temperature (EFC), water temperature (EFC), and fan duty cycle (EFC). I include the measurements in my Prime95 stability test results.

I also use the ElmorLabs EFC-SB to map the radiator fan curve to the water temperature. Without going into too many details: I have attached an external temperature sensor from the water in the loop to the EFC-SB. Then, I use the low/high setting to map the fan curve from 25 to 40 degrees water temperature. I use this configuration for all overclocking strategies.

The main takeaway from this configuration is that it gives us a good indicator of whether the cooling solution is saturated.

efc-sb information
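The fan-curve mapping described above can be sketched in a few lines. This is only an illustration: the 25 and 40 degree water temperature endpoints come from the text, but the linear interpolation and the 30% and 100% duty cycle endpoints are assumptions, not the EFC-SB's documented behavior.

```python
def fan_duty_percent(water_temp_c, low_temp_c=25.0, high_temp_c=40.0,
                     min_duty=30.0, max_duty=100.0):
    """Map water temperature to a fan duty cycle.

    low_temp_c / high_temp_c follow the article (25 to 40 degrees Celsius).
    min_duty / max_duty and the linear ramp are assumed for illustration.
    """
    if water_temp_c <= low_temp_c:
        return min_duty
    if water_temp_c >= high_temp_c:
        return max_duty
    span = (water_temp_c - low_temp_c) / (high_temp_c - low_temp_c)
    return min_duty + span * (max_duty - min_duty)


# Water temperatures of the kind reported in the Prime95 results
for temp in (24.0, 26.7, 31.9, 40.0):
    print(f"{temp:.1f} C water -> {fan_duty_percent(temp):.0f}% duty")
```

With a curve like this, a water temperature that keeps climbing towards the upper endpoint is a hint that the radiator is running out of headroom.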

Intel Xeon w7-2495X: Benchmark Software

We use Windows 11 and the following benchmark applications to measure performance and ensure system stability.

| Benchmark | Link |
|---|---|
| SuperPI 4M | https://www.techpowerup.com/download/super-pi/ |
| Geekbench 6 | https://www.geekbench.com/ |
| Cinebench R23 | https://www.maxon.net/en/cinebench/ |
| CPU-Z | https://www.cpuid.com/softwares/cpu-z.html |
| V-Ray 5 | https://www.chaosgroup.com/vray/benchmark |
| AI-Benchmark | https://ai-benchmark.com/ |
| Y-Cruncher | http://www.numberworld.org/y-cruncher/ |
| Blender | https://opendata.blender.org/ |
| 3DMark CPU Profile | https://www.3dmark.com/ |
| 3DMark Night Raid | https://www.3dmark.com/ |
| Nero Score | https://store.steampowered.com/app/1942030/Nero_Score__PC_benchmark__performance_test/ |
| Handbrake | https://handbrake.fr/ |
| CS:GO FPS Bench | https://steamcommunity.com/sharedfiles/filedetails/?id=500334237 |
| Shadow of the Tomb Raider | https://store.steampowered.com/app/750920/Shadow_of_the_Tomb_Raider_Definitive_Edition/ |
| Final Fantasy XV | http://benchmark.finalfantasyxv.com/na/ |
| Prime 95 | https://www.mersenne.org/download/ |

Xeon w7-2495X: Stock Performance

Before starting overclocking, we must check the system performance at default settings. Note that on this motherboard, Turbo Boost 2.0 is unleashed by default. So, to check the performance at default settings, you must enter the BIOS and

  • Go to the Ai Tweaker menu
  • Set ASUS MultiCore Enhancement to Disabled – Enforce All limits

Then save and exit the BIOS.

The default Turbo Boost 2.0 parameters for the Xeon w7-2495X are as follows:

  • PL1: 225W
  • PL2: 270W
  • Tau: 67sec
  • ICCIN_MAX: 364A
  • ICCIN_VR_TDC: 135A
  • PMAX: 638W
  • VTRIP: 1.6730V
xeon w7-2495x turbo boost 2.0 parameters

Here is the benchmark performance at stock:

  • SuperPI 4M: 33.987 seconds
  • Geekbench 6 (single): 2,394 points
  • Geekbench 6 (multi): 16,891 points
  • Cinebench R23 Single: 1,662 points
  • Cinebench R23 Multi: 35,362 points
  • CPU-Z V17.01.64 Single: 734.3 points
  • CPU-Z V17.01.64 Multi: 16,418.3 points
  • V-Ray 5: 28,017 vsamples
  • AI Benchmark: 7,593 points
  • Y-Cruncher PI MT 10B: 287.458 seconds
  • Blender Monster: 262.15 fps
  • Blender Classroom: 124.81 fps
  • 3DMark Night Raid: 52,900 points
  • Nero Score: 2,808 points
  • Handbrake: 37.70 fps
  • CS:GO FPS Bench: 549.82 fps
  • Tomb Raider: 240 fps
  • Final Fantasy XV: 188.70 fps
xeon w7-2495x stock benchmark performance

Here are the 3DMark CPU Profile scores at stock

  • CPU Profile 1 Thread: 978
  • CPU Profile 2 Threads: 1,733
  • CPU Profile 4 Threads: 2,748
  • CPU Profile 8 Threads: 4,496
  • CPU Profile 16 Threads: 7,874
  • CPU Profile Max Threads: 12,567
xeon w7-2495x stock 3dmark cpu profile performance

When running Prime 95 Small FFTs with AVX2 enabled, the average CPU effective clock is 2519 MHz with 0.788 volts. The average CPU temperature is 36.0 degrees Celsius. The ambient and water temperatures are 24.0 and 26.7 degrees Celsius, respectively. The average CPU package power is 224.9 watts.

xeon w7-2495x stock prime95 small ffts avx2

When running Prime 95 Small FFTs with AVX disabled, the average CPU effective clock is 2823 MHz with 0.824 volts. The average CPU temperature is 36.0 degrees Celsius. The ambient and water temperatures are 23.3 and 26.3 degrees Celsius, respectively. The average CPU package power is 224.9 watts.

xeon w7-2495x stock prime95 small ffts non-avx

Now, let us try our first overclocking strategy.

However, before we get going, make sure to locate the Clear CMOS button.

Pressing the Clear CMOS button will reset all your BIOS settings to default which is helpful if you want to start your BIOS configuration from scratch. However, it does not delete any of the BIOS profiles previously saved. The Clear CMOS button is located on the rear I/O panel.

w790-ace cmos clear

OC Strategy #1: MCE + XMP 3.0

In our first overclocking strategy, we use ASUS MultiCore Enhancement to unleash the Turbo Boost 2.0 power limits and Intel XMP 3.0.

Intel Turbo Boost 2.0

Intel Turbo Boost 2.0 Technology allows the processor cores to run faster than the base operating frequency. Turbo Boost is available when the processor works below its rated power, temperature, and current specification limits. The ultimate advantage is opportunistic performance improvements in both multi-threaded and single-threaded workloads.

The turbo boost algorithm works according to a proprietary EWMA formula. This stands for Exponentially Weighted Moving Average.

turbo boost 2.0 ewma

There are 3 parameters to consider: PL1, PL2, and Tau.

  • Power Limit 1, or PL1, is the threshold the average power won’t exceed. Historically, this has always been set equal to Intel’s advertised TDP. PL1 should not be set higher than the thermal solution cooling limits.
  • Power Limit 2, or PL2, is the maximum power the processor can use for a limited time.
  • Tau, in seconds, is the time window for calculating the average power consumption. The CPU will reduce the CPU frequency if the average power consumed exceeds PL1.
turbo boost 2.0 parameters
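To make the interaction between PL1, PL2, and Tau more concrete, here is a toy simulation. Intel's actual formula is proprietary, so this is only a generic exponentially weighted moving average under stated assumptions: one-second steps, an idle starting point, and a simple rule that grants up to PL2 while the running average is below PL1. The function name and the workload are hypothetical; the PL1, PL2, and Tau defaults are the stock values listed earlier.

```python
import math


def simulate_turbo(demand_w, pl1=225.0, pl2=270.0, tau_s=67.0, dt_s=1.0):
    """Return the power granted at each step under an EWMA-style budget.

    Generic EWMA sketch, not Intel's proprietary algorithm.
    """
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # weight of the newest sample
    avg = 0.0                              # running average power, starts idle
    granted = []
    for want in demand_w:
        # While the average is below PL1 the CPU may burst up to PL2;
        # once the average reaches PL1 it is held to PL1.
        limit = pl2 if avg < pl1 else pl1
        power = min(want, limit)
        avg += alpha * (power - avg)
        granted.append(power)
    return granted


# Hypothetical workload that always asks for 300 W for five minutes
trace = simulate_turbo([300.0] * 300)
first_clamped = next(i for i, p in enumerate(trace) if p == 225.0)
print(f"PL2 held for about {first_clamped} s before dropping to PL1")
```

In this toy model, the burst at PL2 lasts roughly two minutes before the average catches up with PL1. That number says something about the model, not about the real CPU, but it shows why a larger Tau or a higher PL1 keeps the processor at peak power for longer.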

Turbo Boost 2.0 technology is available on Sapphire Rapids as it’s the primary driver of performance over the base frequency.

The ASUS MultiCore Enhancement option on ASUS motherboards offers an easy way to unleash the Turbo Boost power limits. Set the option to Enabled – Remove All Limits and enjoy maximum performance.

w790-ace asus multicore enhancement

Strictly speaking, adjusting the power limits is not considered overclocking, as we don’t change the CPU’s thermal, electrical, or frequency parameters. Intel provides the Turbo Boost parameters as guidance to motherboard vendors and system integrators to ensure their designs enable the base performance of the CPU. Better motherboard designs, thermal solutions, and system configurations can facilitate peak performance for longer.

Intel Extreme Memory Profile 3.0

Intel Extreme Memory Profile, or XMP, is an Intel technology that lets you automatically overclock the system memory to improve system performance. It extends the standard JEDEC specification and allows a memory vendor to program different settings onto the memory stick.

Intel Extreme Memory Profile 3.0 is the new XMP standard for DDR5 memory. It is primarily based on the XMP 2.0 standard for DDR4 but has additional functionality. While initially intended for DDR5 DIMM, XMP 3.0 is also compatible with DDR5 RDIMM.

intel extreme memory profile 3.0

The difference between DIMM and RDIMM, or Registered Dual In-line Memory Module, is that the latter has a register between the DRAM chips and the system’s memory controller. This buffer reduces the electrical load on the memory controller and thus allows stability with more memory modules. Hence, Sapphire Rapids supports quad- or eight-channel memory, whereas we only get dual-channel on mainstream desktop.

There’s a lot more to the new XMP 3.0 standard, which is outside the scope of this overclocking guide. Check out my Alder Lake launch article for more details about XMP 3.0.

XMP 3.0 works surprisingly well on this platform and feels just like it does on mainstream desktop. Unfortunately, while this memory kit has an XMP profile of up to DDR5-6800 C34, my CPU is unstable at that frequency in Y-Cruncher. So I had to lower the frequency to DDR5-6600.

BIOS Settings & Benchmark Results

Upon entering the BIOS

  • Go to the Ai Tweaker menu
  • Set Ai Overclock Tuner to XMP I
  • Set ASUS MultiCore Enhancement to Enabled – Remove All limits
  • Set DRAM Frequency to DDR5-6600MHz

Then save and exit the BIOS.

We re-ran the benchmarks and checked the performance increase compared to the default operation.

  • SuperPI 4M: +1.07%
  • Geekbench 6 (single): +0.96%
  • Geekbench 6 (multi): +5.42%
  • Cinebench R23 Single: +9.57%
  • Cinebench R23 Multi: +9.32%
  • CPU-Z V17.01.64 Single: +4.40%
  • CPU-Z V17.01.64 Multi: +0.55%
  • V-Ray 5: +7.00%
  • AI Benchmark: +5.20%
  • Y-Cruncher PI MT 10B: +13.97%
  • Blender Monster: +7.67%
  • Blender Classroom: +10.58%
  • 3DMark Night Raid: +0.27%
  • Nero Score: +5.52%
  • Handbrake: +0.69%
  • CS:GO FPS Bench: +0.84%
  • Tomb Raider: +4.17%
  • Final Fantasy XV: +5.39%
xeon w7-2495x mce + xmp benchmark performance

Here are the 3DMark CPU Profile scores

  • CPU Profile 1 Thread: +0.51%
  • CPU Profile 2 Threads: +7.56%
  • CPU Profile 4 Threads: +4.77%
  • CPU Profile 8 Threads: +1.85%
  • CPU Profile 16 Threads: +0.18%
  • CPU Profile Max Threads: +3.27%
xeon w7-2495x mce + xmp 3dmark cpu profile performance

After unleashing the Turbo Boost 2.0 power limits and enabling Extreme Memory Profile, the performance improves noticeably but not spectacularly. The most significant performance improvements are in memory-sensitive workloads, which benefit from overclocking the memory from DDR5-4800 to DDR5-6600. We see the highest performance improvement of +13.97% in Y-Cruncher.

We don’t see a more significant impact from unleashing the Turbo Boost power limits because we’re frequency limited. The standard frequency for an all-core workload is only 3.3 GHz, so the frequency won’t boost beyond that despite unleashing the power limit.

When running Prime 95 Small FFTs with AVX2 enabled, the average CPU effective clock is 2993 MHz with 0.845 volts. The average CPU temperature is 44.0 degrees Celsius. The ambient and water temperatures are 24.6 and 27.7 degrees Celsius, respectively. The average CPU package power is 305.6 watts.

xeon w7-2495x mce + xmp prime95 small ffts avx2

When running Prime 95 Small FFTs with AVX disabled, the average CPU effective clock is 3093 MHz with 0.855 volts. The average CPU temperature is 41.0 degrees Celsius. The ambient and water temperatures are 24.6 and 27.1 degrees Celsius, respectively. The average CPU package power is 265.3 watts.

xeon w7-2495x mce + xmp prime95 small ffts non-avx

OC Strategy #2: Water-Cooled OC Preset

In our second overclocking strategy, we use the Water-Cooled OC Preset available in the BIOS.

Water-Cooled OC Preset

The water-cooled OC preset is an excellent addition to the ASUS Pro WS W790 motherboards, giving Xeon customers an easy path to additional performance. The preset can be enabled with the click of a single button and drastically improves the all-core performance by changing the Turbo Boost 2.0 Ratio configuration.

asus w790 water-cooled oc preset

On this Xeon w7-2495X, for example, by enabling the preset, the all-core frequency increases by more than 1 GHz from 3.3 GHz to 4.4 GHz. Most importantly, it does that without increasing the CPU core voltage!

xeon w7-2495x asus w790 water cooled oc preset

“How is that possible?” I hear you ask.

The Turbo Boost 2.0 configuration allows any of the 24 cores to boost up to 4.6 GHz when up to 4 cores are active. This configuration also means that every core has a factory-fused voltage-frequency curve up to 4.6 GHz. In other words, Intel has defined the voltage needed to run at 4.6 GHz. By adjusting the Turbo Boost 2.0 ratio configuration, we can increase the all-core frequency from 3.3 GHz to 4.4 GHz without worrying about the appropriate voltage for each core. The only thing that now stands between running all cores at 4.6 GHz instead of 3.3 GHz is the Turbo Boost 2.0 power limits (which we unleashed with ASUS MultiCore Enhancement) and, well, our cooling solution.

xeon w7-2495x turbo boost max 3.0

BIOS Settings & Benchmark Results

Upon entering the BIOS

  • Go to the Ai Tweaker menu
  • Set Ai Overclock Tuner to XMP I
  • Set ASUS MultiCore Enhancement to Enabled – Remove All limits
  • Set CPU Core Ratio to Water-Cooled OC Preset
  • Set DRAM Frequency to DDR5-6600MHz

Then save and exit the BIOS.

We re-ran the benchmarks and checked the performance increase compared to the default operation.

  • SuperPI 4M: +0.40%
  • Geekbench 6 (single): +3.55%
  • Geekbench 6 (multi): +18.92%
  • Cinebench R23 Single: +10.77%
  • Cinebench R23 Multi: +41.72%
  • CPU-Z V17.01.64 Single: +4.74%
  • CPU-Z V17.01.64 Multi: +33.24%
  • V-Ray 5: +40.15%
  • AI Benchmark: +23.19%
  • Y-Cruncher PI MT 10B: +21.72%
  • Blender Monster: +40.29%
  • Blender Classroom: +45.43%
  • 3DMark Night Raid: +7.93%
  • Nero Score: +16.52%
  • Handbrake: +21.99%
  • CS:GO FPS Bench: +1.28%
  • Tomb Raider: +7.50%
  • Final Fantasy XV: +5.95%
xeon w7-2495x asus water-cooled oc preset benchmark performance

Here are the 3DMark CPU Profile scores

  • CPU Profile 1 Thread: +5.52%
  • CPU Profile 2 Threads: +12.87%
  • CPU Profile 4 Threads: +17.54%
  • CPU Profile 8 Threads: +17.48%
  • CPU Profile 16 Threads: +18.40%
  • CPU Profile Max Threads: +27.97%
xeon w7-2495x asus water-cooled oc preset 3dmark cpu profile performance

By enabling the Water-Cooled OC Preset, we significantly increase the all-core frequency. That greatly improves performance, as we have also unleashed the Turbo Boost 2.0 power limits. We see a maximum performance improvement of +45.43% in Blender Classroom.

When running Prime 95 Small FFTs with AVX2 enabled, the average CPU effective clock is 4373 MHz with 1.123 volts. The average CPU temperature is 93.0 degrees Celsius. The ambient and water temperatures are 25.6 and 31.9 degrees Celsius, respectively. The average CPU package power is 649.7 watts.

xeon w7-2495x asus water-cooled oc preset prime95 small ffts avx2

When running Prime 95 Small FFTs with AVX disabled, the average CPU effective clock is 4390 MHz with 1.104 volts. The average CPU temperature is 77.0 degrees Celsius. The ambient and water temperatures are 24.4 and 29.8 degrees Celsius, respectively. The average CPU package power is 483.6 watts.

xeon w7-2495x asus water-cooled oc preset prime95 small ffts non-avx

OC Strategy #3: Simple Fixed Overclock

In our third overclocking strategy, we pursue a manual overclock. I am not known for advocating for a simple all-core overclock, especially for mainstream platforms. And that’s because you tend to lose out on lots of performance headroom in single-threaded or light workloads. However, I wanted to give this approach another shot for Sapphire Rapids.

Before I show you my BIOS settings, let’s first look at the Sapphire Rapids Clocking and Voltage Topology. Please note that there’s not much public information on the topology. So most of the information I provide is inferred from my testing and the ASUS team’s help.

Sapphire Rapids Clocking Topology

The clocking of a standard Sapphire Rapids platform slightly differs from what we’re used to with mainstream platforms. While technically, Sapphire Rapids should support a 25 MHz Crystal input to the PCH and then have the PCH generate the rest of the clocks, this is not the standard method of operation and is not officially supported.

The supported clocking topology relies on a 25 MHz crystal or crystal oscillator input to an external CK440Q clock generator which then connects to one or more DB2000Q differential buffer devices. The platform supports multiple clocking topologies: balanced and unbalanced.

  • Balanced: All CPU BCLKs and PCIe reference clocks are driven by the same DB or different DBs at the same depth levels
  • Unbalanced: CPU BCLKs are driven by a DB and PCIe reference clocks by the extCLK/PCHCLK or another DB, or vice versa

The specific implementation depends on your choice of motherboard. Ideally, we would isolate the CPU BCLK from any PCIe reference clocks. However, it seems that this unbalanced architecture is currently not working very well. So you’ll likely see all motherboards adopting a balanced clocking architecture. That means if you increase the CPU BCLK, you also increase the CPU PCIe clock frequency.

Either way, the external clock generator generates multiple 100 MHz or 25 MHz clock sources. These sources can be used for:

  • 100 MHz CPU base clock frequency
  • 100 MHz CPU PCIe clock frequency
  • 100 MHz PCH PCIe clock frequency
  • 100 MHz NIC clock frequency
  • 100 MHz clock input for the PCH

The 100 MHz CPU BCLK is then multiplied with specific ratios for each of the different parts in the CPU.

Each P-core can run at its own independent frequency. The maximum CPU ratio is 120X. However, the maximum all-core ratio is limited to 52X on multi-tile die CPUs. I’ll get back to that in a minute.

The Mesh PLL ties together the last-level cache, cache box, and seemingly also the memory controller. It can run at a frequency independent of the P-cores. On the monolithic dies of the W-2400 processors, the Mesh ratio is limited to 80X. However, on the multi-tile dies of the W-3400 processors, the Mesh ratio is limited to 27X.

The memory frequency is also driven by the CPU BCLK and multiplied by a memory ratio. Unlike on mainstream desktop, the memory frequency is not tied to the memory controller frequency and can run independently. The memory ratio goes up to 88X or a frequency of up to DDR5-8800.

sapphire rapids clocking topology balanced
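As a quick illustration of how the clocks relate in a balanced topology, the sketch below multiplies the BCLK with the core and memory ratios. The function and key names are made up for this example; the relationships (ratio times BCLK, with the PCIe reference clock following the BCLK) are the ones described above.

```python
def derived_clocks(bclk_mhz, core_ratio, mem_ratio):
    """Derive clocks from the BCLK in a balanced clocking topology.

    Illustrative helper, not part of any vendor tool.
    """
    return {
        "core_mhz": bclk_mhz * core_ratio,
        # The memory ratio maps to the DDR5 data rate, e.g. 66X -> DDR5-6600
        "ddr5_rate": bclk_mhz * mem_ratio,
        # Balanced topology: the CPU PCIe reference clock follows the BCLK
        "pcie_refclk_mhz": bclk_mhz,
    }


# 48X core ratio and DDR5-6600 at the default 100 MHz BCLK
print(derived_clocks(100.0, 48, 66))

# Raising the BCLK lifts every derived clock, including the PCIe reference
print(derived_clocks(102.0, 52, 66))
```

The second call shows the trade-off of BCLK overclocking on a balanced design: the core frequency goes beyond what the 52X ratio alone allows, but the memory and PCIe clocks move along with it.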

There are a couple of noteworthy oddities with CPU core clocking on Sapphire Rapids.

  1. While the per-core maximum ratio is 120X, the Turbo Boost 2.0 ratio limit for 1-active core is 117X.
  2. On multi-tile die Sapphire Rapids variants, the Turbo Boost 2.0 all-core maximum allowed CPU ratio is 52X. Effectively, you must increase the BCLK frequency with the 52X maximum ratio to break all-core world records. This all-core ratio limit is not present on the monolithic die variants.
  3. The CPU has similar FLL OC issues as on early Alder Lake platforms. In short, a bug appears to allow a ratio to be programmed to the CPU PLL even though the actual effective frequency is lower. This may cause you to see reported frequencies much higher than reasonable. However, the CPU performance in benchmark applications isn’t affected, so the benchmark performance reflects the real effective frequency.
  4. Building on the previous point, specific CPU cores appear to have different points at which the FLL bug occurs. So for record-chasing attempts, you may try to find the cores in your CPU with the highest FLL range and only use those for benchmarking.

For the extreme overclockers out there: Sapphire Rapids CPUs typically cold bug between -90 and -120 degrees Celsius, likely due to the FIVR.

Sapphire Rapids Voltage Topology

Sapphire Rapids uses a combination of fully integrated voltage regulators (FIVR) and motherboard voltage regulators (MBVR) for power management. There are eight (8) distinct voltage inputs to a Sapphire Rapids processor. Most of these inputs feed a FIVR, which then manages the voltage provided to specific parts of the CPU. Some of these voltages can be controlled by the end user.

Unfortunately, it’s unclear which FIVRs control what end-user configurable power domains. I did my best to assemble the information, but please bear in mind that the following overview may not be entirely accurate.

VccIN is the primary power source for the CPU. It provides the input power for the FIVR, which in turn provides power to each P-core individually and the combined Mesh and last-level cache. The default voltage is 1.8V. Through Intel’s overclocking toolkit, we have access to up to 57 power domains:

  • VccCOREn provides the voltage to up to 56 individual P-cores.
  • VccMESH provides the voltage to the mesh and last-level cache.

VccINFAON provides the input power for those parts of the CPU that should always be on. INF stands for “infrastructure,” and AON stands for “always-on.” The power domains include the FIVRs needed for initializing the CPU during boot-up. The default voltage is 1.0V.

VccFA_EHV provides the input power for the PCIe 5.0, UPI I/O, and all other FIVR power domains. The default voltage is 1.0V. Through Intel’s overclocking toolkit, we have access to two power domains:

  • VccCFN provides the power for the on-die Coherent Fabric (CF), which provides the means of communication between the various components inside the die or tile. Each module on the die, whether the core, memory controller, I/O, or accelerator, contains an agent providing access to the CF. The default voltage is 0.7V.
  • VccMDFI provides the power for the Multi-Die Fabric Interconnect, which extends the Coherent Fabric across multiple dies. The default voltage is 0.5V.

VccFA_EHV_FIVRA provides the input power for the analog IO FIVR domains and the core power for the on-package HBM in Sapphire Rapids SKUs with HBM. The default voltage is 1.8V. Through Intel’s overclocking toolkit, we have access to two power domains:

  • VccIO provides the power for all IO modules on the die. The default voltage is 1.0V.
  • VccMDFIA provides the power for the analog parts of the Multi-Die Fabric Interconnect. The default voltage is 0.9V.

VccD_HV provides the power source for the DDR5 memory controllers. These voltages are not shared with the DDR5 memory. The default voltage is 1.1V. Through Intel’s overclocking toolkit, we have access to two power domains:

  • VccDDRD, possibly the memory controller core voltage, which defaults to 0.7V
  • VccDDRA, possibly the memory controller side I/O voltage, which defaults to 0.9V

VNN provides the power for the CPU GPIO and on package devices. The default voltage is 1.0V.

3V3_AUX provides power for some on-package devices such as the PIROM. The default voltage is 3.3V.

VPP_HBM provides the charge pump voltage for the on-package HBM on Sapphire Rapids CPUs with HBM. The default voltage is 2.5V.

sapphire rapids voltage topology

I want to make some additional notes regarding overclocking this Xeon w7-2495X.

  1. Since the 2495X sports the monolithic die, any multi-die fabric interconnect voltages are irrelevant.
  2. For 2495X overclocking, the only relevant voltages are those connected to the VccIN FIVR, including the P-core and Mesh voltages and, to a lesser extent, the VccCFN voltage for the coherent fabric and VccD voltage for the DDR5 memory controller.
  3. The VccIN is the only voltage rail requiring some tuning, as Sapphire Rapids can draw a lot of power. The usual tuning practice is to increase the VccIN voltage from 1.8V to 2.2-2.3V to help the VRM deal with high loads. After all, high power at a low voltage requires a high current, which is particularly stressful for the VRM.

Xeon w7-2495X Simple Fixed OC Manual Tuning

In this strategy, we’re pursuing a traditional CPU overclock, using one ratio and voltage for all CPU P-cores. The main limiting factor for this type of overclock is our worst-case scenario stability test: Prime95 Small FFTs with AVX2 enabled.

This may surprise some of you, as we’d expect the AVX-512 workload to be heavier. But as you can see in the table below, AVX2 produces a higher CPU package power with a higher CPU temperature.

xeon w7-2495x avx and power

The main limiting factor for the maximum frequency is not the core’s overclocking capabilities but the maximum voltage we can use in our worst-case scenario workload. The maximum allowed temperature for Sapphire Rapids CPUs is 94 degrees Celsius. With the water-cooled OC preset, we already reached that temperature in the Prime95 AVX2 test.

As you know, dynamic power scales roughly with the square of the operating voltage. For example, a 10% increase in voltage on this CPU increases power consumption by about 21%. Ultimately, the operating voltage is the main limiting factor for our maximum frequency.

xeon w7-2495x voltage temperature power scaling
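The 21% figure is what you get from the common first-order approximation that dynamic power is proportional to capacitance times voltage squared times frequency. The sketch below applies that approximation. It ignores leakage and temperature effects, so treat it as a rule of thumb rather than a model of this specific CPU.

```python
def relative_power(voltage_scale, frequency_scale=1.0):
    """Relative dynamic power under the approximation P ~ C * V^2 * f.

    Both arguments are scale factors relative to the starting point.
    Leakage and temperature effects are deliberately ignored.
    """
    return voltage_scale ** 2 * frequency_scale


# 10% more voltage at the same frequency
increase = (relative_power(1.10) - 1.0) * 100.0
print(f"+10% voltage -> about +{increase:.0f}% power")
```

Because the voltage term is squared while the frequency term is linear, every extra bit of voltage needed to stabilize a higher ratio costs disproportionately more power and heat.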

I found that for this CPU, the maximum voltage was around 1.15V. This was sufficient to set an all-core frequency of 4.8 GHz, equal to stock when up to 2 cores are active, and 1.5 GHz higher than stock when all 24 cores are active.

In addition to increasing the CPU core voltage, we slightly increase the VccIN voltage to 2.2V. That’s to make it easier on the VccIN VRM. After all, a higher voltage means a lower current at a given input power in Watts.
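A quick back-of-the-envelope calculation shows why the higher VccIN helps. The sketch below divides a given input power by the VccIN voltage. As an assumption for illustration, it uses the 649.7 W package power measured with the water-cooled OC preset as a stand-in for the VccIN input power. The real input power differs because of conversion losses and the other voltage rails.

```python
def input_current_a(power_w, vccin_v):
    """Current the VccIN VRM must deliver for a given input power (I = P / V)."""
    return power_w / vccin_v


# 649.7 W is used here as an approximation of the VccIN input power
for vccin in (1.8, 2.2):
    amps = input_current_a(649.7, vccin)
    print(f"VccIN {vccin:.1f} V -> about {amps:.0f} A")
```

At 1.8V, this estimate lands close to the default ICCIN_MAX of 364A, whereas 2.2V brings the current down by roughly 65A for the same power.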

BIOS Settings & Benchmark Results

Upon entering the BIOS

  • Go to the Ai Tweaker menu
  • Set Ai Overclock Tuner to XMP I
  • Set ASUS MultiCore Enhancement to Enabled – Remove All limits
  • Set CPU Core Ratio to By Core Usage
  • Enter the By Core Usage sub-menu
    • Set Turbo Ratio Limit 1 to 48
    • Set Turbo Ratio Cores 1 to 24
  • Leave the By Core Usage sub-menu
  • Set DRAM Frequency to DDR5-6600MHz
  • Set VCore 1.8V In to Manual Mode
    • Set CPU Core Voltage Override to 2.2
  • Set Global Core SVID Voltage to Manual Mode
    • Set CPU Core Voltage Override to 1.15

Then save and exit the BIOS.
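To illustrate what the By Core Usage settings do, here is a small lookup sketch. Each entry pairs a Turbo Ratio Limit with a Turbo Ratio Cores value, and the CPU uses the first entry that covers the number of active cores. The function name is hypothetical, and the second table is a simplified illustration built from the few stock data points mentioned in this article, not the real stock table, which steps down more gradually.

```python
def turbo_ratio(active_cores, table):
    """Return the ratio limit for a number of active cores.

    table is a list of (ratio_limit, max_active_cores) pairs,
    sorted by ascending core count.
    """
    for ratio_limit, max_active_cores in table:
        if active_cores <= max_active_cores:
            return ratio_limit
    raise ValueError("active core count exceeds the last table entry")


# Strategy #3: Turbo Ratio Limit 1 = 48, Turbo Ratio Cores 1 = 24
fixed_oc = [(48, 24)]

# Simplified illustration only: intermediate stock steps are omitted
simplified_stock = [(48, 2), (46, 4), (33, 24)]

for cores in (1, 4, 24):
    print(cores, "active:", turbo_ratio(cores, fixed_oc), "vs",
          turbo_ratio(cores, simplified_stock))
```

With a single entry covering all 24 cores, the lookup returns 48X regardless of load, which is exactly what makes this a fixed overclock.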

We re-ran the benchmarks and checked the performance increase compared to the default operation.

  • SuperPI 4M: +1.33%
  • Geekbench 6 (single): +4.59%
  • Geekbench 6 (multi): +19.47%
  • Cinebench R23 Single: +11.61%
  • Cinebench R23 Multi: +59.48%
  • CPU-Z V17.01.64 Single: +4.96%
  • CPU-Z V17.01.64 Multi: +45.77%
  • V-Ray 5: +51.59%
  • AI Benchmark: +18.96%
  • Y-Cruncher PI MT 10B: +22.65%
  • Blender Monster: +53.21%
  • Blender Classroom: +58.58%
  • 3DMark Night Raid: +13.29%
  • Nero Score: +20.37%
  • Handbrake: +28.67%
  • CS:GO FPS Bench: +3.25%
  • Tomb Raider: +15.83%
  • Final Fantasy XV: +8.17%
xeon w7-2495x 4.8g fixed overclock benchmark performance

Here are the 3DMark CPU Profile scores

  • CPU Profile 1 Thread: +5.11%
  • CPU Profile 2 Threads: +8.25%
  • CPU Profile 4 Threads: +15.17%
  • CPU Profile 8 Threads: +22.82%
  • CPU Profile 16 Threads: +23.53%
  • CPU Profile Max Threads: +37.79%
xeon w7-2495x 4.8g fixed overclock 3dmark cpu profile performance

Running the Xeon w7-2495X at 4.8 GHz is the equivalent of how MCE used to work: set every core to the maximum default CPU ratio. In the default specification, only 2 cores can boost to 4.8 GHz. In our overclock, every core can boost to that frequency. Furthermore, we’ve increased the frequency by a whopping 1.5 GHz in all-core workloads. So, naturally, we expect significant performance gains. We get a maximum performance improvement of +59.48% in Cinebench R23 Multi.

When running Prime 95 Small FFTs with AVX2 enabled, the average CPU effective clock is 4089 MHz with 1.154 volts. The average CPU temperature is 93.0 degrees Celsius. The ambient and water temperatures are 26.4 and 32.5 degrees Celsius, respectively. The average CPU package power is 644.7 watts.

xeon w7-2495x 4.8g fixed overclock prime95 small ffts avx2

When running Prime 95 Small FFTs with AVX disabled, the average CPU effective clock is 4794 MHz with 1.154 volts. The average CPU temperature is 91.0 degrees Celsius. The ambient and water temperatures are 25.4 and 31.1 degrees Celsius, respectively. The average CPU package power is 581.8 watts.

xeon w7-2495x 4.8g fixed overclock prime95 small ffts non-avx

OC Strategy #4: Simple Dynamic Overclock

In our final overclocking strategy, we pursue a modern, dynamic manual overclock. We must discuss Intel’s overclocking toolkit for Sapphire Rapids to explore how we can do this.

Sapphire Rapids Overclocking Toolkit

I described the history of Intel’s overclocking toolkit in a previous blog post on this website. Long story short, Intel developed and maintains a technology called the OC Mailbox, which contains the entire overclocker’s toolkit. This toolkit is not always the same for each CPU architecture, as sometimes we need different tools.

On Sapphire Rapids, the overclocking toolkit consists of the following tools:

  • Per Core ratio and voltage control
  • Mesh ratio and voltage control
  • DRAM ratio control
  • AVX2, AVX-512, and TMUL ratio offset
  • Turbo Boost 2.0 ratio and power control
  • Turbo Boost Max 3.0 ratio control
  • SVID disable
  • XMP 3.0 support
  • XTU Support
sapphire rapids unlocked processors

Notably missing from the OC toolbox are prominent features we know from mainstream desktop like Advanced Voltage Offset, better known as V/F points, and OverClocking Thermal Velocity Boost, or OCTVB.

Sapphire Rapids Turbo Boost 2.0 Ratio Configuration

We all know the Turbo Boost 2.0 technology from its impact on the power limits, but a second significant aspect of Turbo Boost 2.0 is configuring the CPU frequency based on the number of active cores.

Turbo Boost 2.0 Ratio Configuration allows us to configure the overclock for different scenarios ranging from 1 active core to all active cores. This enables us to run some cores significantly faster than others when the conditions are right. Intel provides eight (8) registers to configure the Turbo Boost 2.0 Ratio.

On mainstream platforms where the top SKU has no more than 8 P-cores, these registers are configured from 1-active P-core to 8-active P-cores. However, on platforms with core counts beyond 8 cores, we can configure each register by target Turbo Boost Ratio and the number of active cores.

For example, the standard and ASUS’ water-cooled OC preset Turbo Boost Ratio Configuration of the Xeon w7-2495X is as follows.

By Core Usage is not the same as configuring each core individually. When using By Core Usage, we determine an overclock according to the actual usage. For example, if a workload is using 4 cores, then the CPU determines by itself which cores should execute this workload and applies our set frequency to those cores.
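To make the By Core Usage idea a bit more concrete, here is a minimal Python sketch of how such a lookup could behave. It is only an illustration, not Intel’s or ASUS’ actual implementation. The ratio and core-count pairs are the ones I use for OC Strategy #4 in this guide, and I am assuming that each “Turbo Ratio Cores” value means “up to this many active cores.” The function name `ratio_for_active_cores` is made up for this example.

```python
from typing import List, Tuple

# (turbo ratio, maximum number of active cores) pairs, highest ratio first.
# Values taken from the By Core Usage settings of OC Strategy #4 in this guide.
TURBO_RATIO_TABLE: List[Tuple[int, int]] = [
    (52, 8),
    (51, 12),
    (50, 16),
    (49, 24),
]


def ratio_for_active_cores(active_cores: int,
                           table: List[Tuple[int, int]] = TURBO_RATIO_TABLE) -> int:
    """Return the turbo ratio ceiling for a given number of active cores.

    Assumption: a register applies to any active-core count up to and
    including its core-count value.
    """
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    for ratio, max_cores in table:
        if active_cores <= max_cores:
            return ratio
    # Assumption: beyond the last bucket, the lowest configured ratio applies.
    return table[-1][0]


for n in (1, 8, 9, 16, 24):
    print(f"{n:2d} active cores -> {ratio_for_active_cores(n)}X")
```

Note that the lookup only cares about how many cores are active, not which ones. Which physical cores end up running the workload is decided by the CPU and the operating system scheduler.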

Sapphire Rapids Per Core Ratio & Voltage Control

While we only recently saw the addition of per-core ratio control on mainstream desktop with Rocket Lake, on the high-end desktop, the ability to control the maximum ratio and voltage for each core has been around since Broadwell-E in 2016.

The Per Core Ratio and Voltage control options let you control the upper end of the voltage-frequency curve of each core inside your CPU. While the general rules of adaptive voltage mode still apply, this enables two crucial new avenues for CPU overclocking.

  • First, it allows users to individually overclock each core and find its maximum stable frequency.
  • Second, it allows users to set an aggressive by core usage overclock while constraining the worst cores.

Since each core has its own FIVR-regulated power rail, it’s possible to finetune each core to its maximum capability. We’ll cover how this tuning works when discussing Adaptive Voltage Mode.

Sapphire Rapids AVX2, AVX-512, and TMUL Ratio Offset

Intel first introduced the AVX negative ratio offset on Broadwell-E processors. Successive processors adopted this feature and eventually expanded it with AVX2 and AVX-512 negative offsets. New on Sapphire Rapids is the addition of the TMUL ratio offset.

TMUL stands for Tile matrix MULtiply and is an Intel Advanced Matrix Extensions (AMX) technology component. It’s designed to accelerate AI and deep learning workloads.

tmul amx

The ratio offsets help achieve maximum performance for SSE, AVX, and AMX workloads.

While in the past, the ratio offsets were triggered by detecting specific AVX instructions, since Skylake, Intel has implemented a more elegant frequency license-based approach. The four frequency levels are L0, L1, L2, and L3. Each level is associated with particular instructions ranging from lightest to heaviest. Each level can also be associated with one specific ratio offset.

L0 is associated with the lightest workload, and the frequency matches the maximum Per Core ratio limit. L1 is related to the AVX2 ratio offset, and L2 is associated with the AVX-512 ratio offset. The TMUL ratio offset adds another L3 frequency license on top of that.

As a rule, the ratio offset configured for a given frequency license must be equal to or higher than the preceding frequency license. In other words: L0 = 0 ≤ L1 ≤ L2 ≤ L3.

sapphire rapids avx2 avx-512 tmul ratio offset

Since Ice Lake, the ratio offset is applied on a per-core basis. The ratio offset is subtracted from the core-specific ratio limit but is still subject to the other Turbo Boost ratio configuration rules.
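As a rough illustration of the rules above, the following Python sketch checks the L0 = 0 ≤ L1 ≤ L2 ≤ L3 ordering and subtracts the offset from a per-core ratio limit. The class and function names are hypothetical, and the sketch ignores the other Turbo Boost ratio rules that may lower the final ratio further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioOffsets:
    """Negative ratio offsets per frequency license (hypothetical helper)."""
    avx2: int = 0    # L1 license
    avx512: int = 0  # L2 license
    tmul: int = 0    # L3 license

    def validate(self) -> None:
        # Rule from the text: L0 = 0 <= L1 <= L2 <= L3
        if not (0 <= self.avx2 <= self.avx512 <= self.tmul):
            raise ValueError("offsets must satisfy 0 <= AVX2 <= AVX-512 <= TMUL")

    def for_license(self, level: int) -> int:
        if level not in (0, 1, 2, 3):
            raise ValueError("license level must be 0, 1, 2, or 3")
        return (0, self.avx2, self.avx512, self.tmul)[level]


def effective_ratio(per_core_limit: int, offsets: RatioOffsets, level: int) -> int:
    """Per-core ratio limit minus the offset tied to the frequency license."""
    offsets.validate()
    return per_core_limit - offsets.for_license(level)


# Example: a core limited to 52X with a 4X offset for every license above L0.
offsets = RatioOffsets(avx2=4, avx512=4, tmul=4)
for level in range(4):
    print(f"L{level}: {effective_ratio(52, offsets, level)}X")
```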

I didn’t have sufficient time yet to look too deeply into the specific ratio offset behavior for Sapphire Rapids, so I’ll have to get back to this topic in future guides.

Sapphire Rapids Adaptive Voltage Mode

Like any previous Intel architecture, there are two main ways of configuring the voltage for the CPU cores: override mode and adaptive mode.

  • Override mode specifies a single static voltage across all ratios. It is mainly used for extreme overclocking where stability at high frequencies is the only consideration.
  • Adaptive mode is the standard mode of operation. In Adaptive Mode, the CPU relies on the factory-fused voltage-frequency curves to set the appropriate voltage for a given ratio. When configuring an adaptive voltage, it is mapped against the “OC Ratio,” the highest configured ratio. We’ll get back to that in a minute.

Since Sapphire Rapids uses FIVR, we can only adjust the core voltage by configuring the CPU PCU via BIOS or specialized tools like XTU.

We can specify a voltage offset for override and adaptive modes. Of course, this doesn’t make much sense for override mode – if you set 1.15V with a +50mV offset, you could just set 1.20V – but it can be helpful in adaptive mode as you can offset the entire V/F curve by up to 500mV in both directions.

sapphire rapids adaptive voltage mode

On Sapphire Rapids, you can configure the override or adaptive voltage on a Global or Per-Core level. Let’s focus on adaptive mode voltage configuration and first look at how it works for a single core.

When we set an adaptive voltage for a core, this voltage is mapped against the “OC Ratio.” The “OC Ratio” is the highest ratio configured for the CPU across all settings and cores. When you leave everything at default, the OC ratio is determined by the default maximum turbo ratio. In the case of the w7-2495X, that ratio would be 48X because of the Turbo Boost Max 3.0. The “OC Ratio” equals the highest configured ratio if you overclock.

sapphire rapids oc ratio adaptive voltage mode

Specific rules govern what adaptive voltage can be set.

A) the voltage set for a given ratio n must be higher than or equal to the voltage set for ratio n-1.

Suppose our 2495X runs 48X at 1.25V. In that case, setting the adaptive voltage, mapped to 48X, lower than 1.25V, is pointless. 48X always runs at 1.25V or higher. Usually, BIOSes may allow you to configure lower values. However, the CPU’s internal mechanisms will override your configuration if it doesn’t follow the rules.

B) the adaptive voltage configured for any ratio below the maximum default turbo ratio will be ignored.

Take the same example of the 2495X, specified to run 48X at 1.25V. If you try to configure all cores to 45X and set 1.10V, the CPU will ignore this because it has its own factory-fused target voltage for all ratios up to 48X and will use this voltage. You can only change the voltage of the OC Ratio, which, as mentioned before, on the 2495X, is 48X and up.

C) for ratios between the OC Ratio and the next highest factory-fused V/f point, the voltage is interpolated between the set adaptive voltage and the factory-fused voltage.

Returning to our example of our 2495X specified to run 48X at 1.25V, let’s say we manually configure the OC ratio to be 52X at 1.35V. The target voltage for ratios 51X, 50X, and 49X will now be interpolated between 1.25V and 1.35V.
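Here is a small Python sketch of that interpolation for the example numbers. I am assuming a straight linear interpolation between the two points; the actual curve the CPU uses may differ. The function name and its defaults are only for illustration.

```python
def interpolated_voltage(ratio: int,
                         fused_ratio: int = 48, fused_v: float = 1.25,
                         oc_ratio: int = 52, oc_v: float = 1.35) -> float:
    """Estimate the target voltage for a ratio between the highest
    factory-fused point and the OC Ratio.

    Assumption: linear interpolation between the two points.
    """
    if oc_ratio <= fused_ratio:
        raise ValueError("OC ratio must be above the fused ratio")
    if not fused_ratio <= ratio <= oc_ratio:
        raise ValueError("ratio must lie between the fused ratio and the OC ratio")
    # Rule A: the voltage at the OC ratio cannot be below the voltage of a lower ratio.
    oc_v = max(oc_v, fused_v)
    fraction = (ratio - fused_ratio) / (oc_ratio - fused_ratio)
    return fused_v + (oc_v - fused_v) * fraction


for r in range(48, 53):
    print(f"{r}X -> {interpolated_voltage(r):.3f} V")
```

Under the linear assumption, 49X, 50X, and 51X land at roughly 1.275V, 1.300V, and 1.325V.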

As I mentioned already, we can do this for each core individually. However, that would be rather painful, especially on a 56-core CPU! Fortunately, there’s also an alternative way to set a global adaptive voltage.

When we set a global adaptive voltage, it maps this voltage to the OC Ratio for each core in our CPU. So, if our OC Ratio is 52X and the global adaptive voltage is 1.35V, then every core in our CPU has a voltage frequency curve that goes up to 52X at 1.35V. That certainly makes things easier.

One last note: we can also configure a Per-Core Ratio Limit. Counter-intuitively, this ratio doesn’t act as a core-specific OC Ratio but as a means to limit what parts of the V/F curve can be used. Let’s take that same example of 52X at 1.35V. If we set the Per-Core Ratio Limit to 51X, the CPU core will boost up to 5.1 GHz at a voltage interpolated between 52X at 1.35V and 48X at 1.25V.

sapphire rapids per core ratio limit example

Xeon w7-2495X Simple Dynamic OC Manual Tuning

For this OC Strategy, I use a global Adaptive Voltage offset of 100mV. This lifts every core’s entire voltage-frequency curve by 100mV. I found this helped ensure better stability during transient loads going from single-threaded and light workloads to heavy AVX all-core workloads.

With the additional voltage headroom, I increased some cores to 5.2 GHz while others had to stay around 5.0 GHz. That’s still about 400 MHz higher than stock. Unfortunately, I also had to configure an AVX2/AVX-512/AMX ratio offset of 4X to ensure stability in those workloads.

Furthermore, I was able to increase the all-core frequency to 4.9 GHz. That means we’re now 1.6 GHz higher than the stock all-core frequency.
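For reference, the all-core uplift is simple arithmetic. The short Python sketch below works it out for both the fixed 4.8 GHz overclock and this 4.9 GHz dynamic overclock, using the 3.3 GHz stock all-core frequency mentioned in this guide. The helper name `uplift` is just for this example.

```python
STOCK_ALL_CORE_MHZ = 3300  # stock all-core frequency of the w7-2495X per this guide


def uplift(oc_mhz: int, stock_mhz: int = STOCK_ALL_CORE_MHZ):
    """Return the absolute (MHz) and relative (%) frequency gain over stock."""
    delta = oc_mhz - stock_mhz
    percent = delta / stock_mhz * 100
    return delta, percent


for label, mhz in (("fixed 4.8 GHz", 4800), ("dynamic 4.9 GHz", 4900)):
    delta, percent = uplift(mhz)
    print(f"{label}: +{delta} MHz ({percent:.1f}%)")
```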

sapphire rapids manual dynamic overclock

BIOS Settings & Benchmark Results

Upon entering the BIOS

  • Go to the Ai Tweaker menu
  • Set Ai Overclock Tuner to XMP I
  • Set ASUS MultiCore Enhancement to Enabled – Remove All limits
  • Set CPU Core Ratio to By Core Usage
  • Enter the By Core Usage sub-menu
    • Set Turbo Ratio Limit 1 to 52
    • Set Turbo Ratio Cores 1 to 8
    • Set Turbo Ratio Limit 2 to 51
    • Set Turbo Ratio Cores 2 to 12
    • Set Turbo Ratio Limit 3 to 50
    • Set Turbo Ratio Cores 3 to 16
    • Set Turbo Ratio Limit 4 to 49
    • Set Turbo Ratio Cores 4 to 24
  • Leave the By Core Usage sub-menu
  • Enter the Specific Core sub-menu
    • Set Core 0, 1, 6, 18, and 21 Specific Ratio Limit to 50
    • Set Core 2, 3, 4, 5, 9, 10, 12, 13, 14, 15, 16, 17, and 19 Specific Ratio Limit to 51
    • Set Core 7, 8, 11, 20, 22, and 23 Specific Ratio Limit to 52
  • Leave the Specific Core sub-menu
  • Set DRAM Frequency to DDR5-6600MHz
  • Enter the AVX Related Controls sub-menu
    • Set AVX2, AVX512, and TMUL Ratio Offset to per-core Ratio Limit to User Specify
    • Set AVX2, AVX512, and TMUL Ratio Offset to 4
  • Leave the AVX Related Controls sub-menu
  • Enter the DIGI+ VRM sub-menu
    • Set CPU Current Capability to 140%
  • Leave the DIGI+ VRM sub-menu
  • Set VCore 1.8V In to Manual Mode
    • Set CPU Core Voltage Override to 2.3
  • Set Global Core SVID Voltage to Adaptive Mode
    • Set Offset Mode Sign to +
    • Set Offset Voltage to 0.1

Then save and exit the BIOS.
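To visualize how these settings interact, here is a rough Python model of the frequency ceiling for each core. It is a sketch under a few assumptions of mine: the lower of the By Core Usage ratio and the Specific Core limit wins, the AVX offset is subtracted from the Specific Core limit, and power or thermal limits are ignored entirely. All names in the code are made up for this example.

```python
from collections import Counter

# By Core Usage: (ratio, up to N active cores), from the settings above.
BY_CORE_USAGE = [(52, 8), (51, 12), (50, 16), (49, 24)]

# Specific Core ratio limits, from the settings above.
SPECIFIC_CORE_LIMITS = {
    50: [0, 1, 6, 18, 21],
    51: [2, 3, 4, 5, 9, 10, 12, 13, 14, 15, 16, 17, 19],
    52: [7, 8, 11, 20, 22, 23],
}
PER_CORE_LIMIT = {core: ratio
                  for ratio, cores in SPECIFIC_CORE_LIMITS.items()
                  for core in cores}

AVX_OFFSET = 4  # AVX2, AVX-512, and TMUL ratio offset from the settings above


def usage_ratio(active_cores: int) -> int:
    """By Core Usage ceiling for a number of active cores."""
    for ratio, max_cores in BY_CORE_USAGE:
        if active_cores <= max_cores:
            return ratio
    return BY_CORE_USAGE[-1][0]


def core_ceiling(core: int, active_cores: int, avx: bool = False) -> int:
    """Assumed ceiling: the lower of the usage ratio and the per-core limit,
    with the AVX offset subtracted from the per-core limit."""
    limit = PER_CORE_LIMIT[core] - (AVX_OFFSET if avx else 0)
    return min(limit, usage_ratio(active_cores))


# Sanity check: all 24 cores have exactly one limit assigned.
assert sorted(PER_CORE_LIMIT) == list(range(24))
print(sorted(Counter(PER_CORE_LIMIT.values()).items()))

print("core 7, 1 active core:", core_ceiling(7, 1))
print("core 0, 1 active core:", core_ceiling(0, 1))
print("core 7, 24 active cores:", core_ceiling(7, 24))
print("core 0, 24 active cores, AVX:", core_ceiling(0, 24, avx=True))
```

In this model, the best cores reach 52X in light workloads, every core is capped at 49X when all 24 cores are active, and heavy AVX workloads drop the ceiling to between 46X and 48X depending on the core.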

We re-ran the benchmarks and checked the performance increase compared to the default operation.

  • SuperPI 4M: +9.00%
  • Geekbench 6 (single): +7.77%
  • Geekbench 6 (multi): +24.56%
  • Cinebench R23 Single: +14.20%
  • Cinebench R23 Multi: +44.61%
  • CPU-Z V17.01.64 Single: +15.29%
  • CPU-Z V17.01.64 Multi: +48.94%
  • V-Ray 5: +52.21%
  • AI Benchmark: +25.01%
  • Y-Cruncher PI MT 10B: +21.57%
  • Blender Monster: +52.33%
  • Blender Classroom: +53.58%
  • 3DMark Night Raid: +13.97%
  • Nero Score: +15.88%
  • Handbrake: +23.18%
  • CS:GO FPS Bench: +7.11%
  • Tomb Raider: +17.08%
  • Final Fantasy XV: +7.68%
sapphire rapids manual dynamic overclock benchmark performance

Here are the 3DMark CPU Profile scores

  • CPU Profile 1 Thread: +11.66%
  • CPU Profile 2 Threads: +16.91%
  • CPU Profile 4 Threads: +21.43%
  • CPU Profile 8 Threads: +24.00%
  • CPU Profile 16 Threads: +24.65%
  • CPU Profile Max Threads: +40.57%
sapphire rapids manual dynamic overclock 3dmark cpu profile performance

Usually, changing to a dynamic overclock means making fewer performance compromises as we can tune for light-load and high-load scenarios. On Sapphire Rapids, that’s not the case because of the limited tuning toolkit. So, while our dynamic overclock helps boost the performance in light and few-threaded workloads, we must compromise the performance in heavy or all-core workloads. Of course, the performance is still well beyond stock, but in some benchmarks, it’s not as high as with a fixed overclock. We get the highest performance improvement of +53.58% in Blender Classroom.

When running Prime 95 Small FFTs with AVX2 enabled, the average CPU effective clock is 4092 MHz with 1.162 volts. The average CPU temperature is 93.0 degrees Celsius. The ambient and water temperatures are 27.5 and 33.4 degrees Celsius, respectively. The average CPU package power is 633.8 watts.

sapphire rapids manual dynamic overclock prime95 small ffts avx2

When running Prime 95 Small FFTs with AVX disabled, the average CPU effective clock is 4469 MHz with 1.221 volts. The average CPU temperature is 93.0 degrees Celsius. The ambient and water temperatures are 26.4 and 32.4 degrees Celsius, respectively. The average CPU package power is 598.4 watts.

sapphire rapids manual dynamic overclock prime95 small ffts non-avx

Intel Xeon w7-2495X: Conclusion

All right, let us wrap this up.

Let’s first get this out of the way: I love that Intel offers no less than eight overclockable Sapphire Rapids SKUs. When I first heard about overclockable Xeons, I figured they’d have a single halo SKU like with the W-3175X. Having eight is just fantastic.

Second, the overclocking experience of Sapphire Rapids is very similar to previous high-end desktop platforms. While it certainly doesn’t feel as nimble as the mainstream desktop, it is still surprisingly easy to squeeze more performance out of the CPU if you pair it with a premium motherboard and cooling. After all, going from 3.3 GHz to 4.9 GHz in all-core workloads is a nearly 50% increase in frequency!

Lastly, there’s still lots to uncover about Sapphire Rapids overclocking. I will try to cover some topics more in-depth in future guides. The next Sapphire Rapids CPU I want to look into is one with a multi-tile die.

Anyway, that’s all for today! I want to thank my Patreon supporters for supporting my work. As usual, if you have any questions or comments, please drop them in the comment section below. 

See you next time!
