Monday, December 4, 2017

How to save energy on gaming: Experiments with lowering power target on the GeForce GTX 1080 Ti

Before I begin, I'd like to provide some relevant background information.

My bedroom has a tendency to trap heat and is usually the warmest room in the house. It also faces west, so in the summer months it can get unbearably hot unless the central AC system is running full-blast, a problem compounded by the fact that there is only one HVAC register in the room. The Astaroth desktop often draws more than 400W under full gaming load, and nearly all of that energy ends up as heat in the room, so keeping the room at a reasonable temperature during long gaming sessions can be challenging.

This led me to the question: How do I get the system to run cooler and use less energy without unacceptably degrading game performance? Read on for the answer...

Optimizing power target and clocks for maximum efficiency

Quite often, the same tools used to boost performance through overclocking can just as well be used to reduce power usage. Most NVIDIA graphics card overclocking tools allow you to change the power target. The power target is a percentage value, expressed in terms of the graphics card's nominal total board power (TBP), that specifies the maximum amount of power the card can use. (AMD graphics cards have a similar setting called power limit, which is expressed as a percentage difference from the nominal TBP.) The card will throttle down as required to keep power consumption within the power target. Overclockers will typically raise the power target to the maximum allowed by the card to permit higher clock frequencies, but we can also lower the power target, often by a large degree, to reduce power consumption and heat output.
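On NVIDIA cards, the power limit can also be adjusted from the command line with nvidia-smi, which ships with the driver. Here's a minimal sketch of that approach; note that nvidia-smi takes an absolute limit in watts rather than a percentage of TBP, and changing it requires administrator privileges:

    import subprocess

    # Show the card's default, current, and min/max enforceable power limits.
    subprocess.run(["nvidia-smi", "-q", "-d", "POWER"], check=True)

    # Set an absolute power limit of 140 watts (must be within the supported range).
    subprocess.run(["nvidia-smi", "-pl", "140"], check=True)

The GUI overclocking tools expose the same underlying limit, just expressed as a percentage.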

The reference GeForce GTX 1080 Ti has a 250W TBP. The EVGA FTW3 Elite variant of the card, as used in Astaroth, has a TBP of 280W. The allowed range for the power target on this card is between 44.6% (125W) and 117.9% (330W):


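Those endpoints follow directly from the FTW3 Elite's 280W TBP. As a quick sanity check in Python:

    # Power target percentages are relative to the card's 280W nominal TBP.
    tbp_watts = 280
    for pct in (44.6, 100.0, 117.9):
        print(f"{pct:5.1f}% of TBP = {tbp_watts * pct / 100:.0f} W")
    #  44.6% of TBP = 125 W
    # 100.0% of TBP = 280 W
    # 117.9% of TBP = 330 W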
Throughout this post, I test performance in Assassin's Creed Origins, running at 1440p on ultra settings. Dynamic Resolution is set to maintain a minimum of 30 fps with an uncapped maximum frame rate unless otherwise indicated. The actual clock frequencies attained at any given power target can vary greatly with the workload. Higher resolutions, more detailed scenes, and more complex effects can cause a graphics card to draw more power even at the same clock frequency or reported GPU utilization, which means that you may get results that are very different from what I report here.

To begin, I set the power target to the minimum of 44.6%. I get typical clock frequencies of about 1550-1650 MHz while exploring central Alexandria in Assassin's Creed Origins, well below the 1900-1950 MHz this card typically sustains at stock settings. In more complex scenes, clock frequencies can drop to 1400-1500 MHz or lower. Furthermore, frame delivery is very inconsistent, resulting in significant stuttering even on a G-SYNC display. The frame rate limiter feature can help: capping the frame rate at 60 fps makes the game run more smoothly, but the clock frequencies can still drop too low to maintain that frame rate, so some stuttering can still occur. Total power draw at the wall, as measured by a Kill A Watt P4400 meter, is about 270W. (At stock clocks, power draw can be anywhere from 350W to 400W or more depending on the scene.)

Next, I tried reducing the memory clock frequency. The FTW3 Elite card is unique in that it ships with 12 Gbps graphics memory; most versions of the GeForce GTX 1080 Ti use 11 Gbps memory. The memory consumes a significant amount of power, and the slower GPU core does not need the full memory bandwidth available, so lowering the memory speed frees up some of the power budget to let the GPU core run faster. EVGA Precision XOC (and most other GPU overclocking tools) reports the memory frequency as one-half of the effective transfer rate, and the lowest memory offset the tool allows is about -500 MHz, which means the memory can be set to run at 11 Gbps.
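In concrete terms, assuming the tool reports roughly 6000 MHz for the stock 12 Gbps configuration (the exact reported value may differ slightly):

    # Convert the memory clock reported by the overclocking tool (half the effective
    # transfer rate on this card) into an effective per-pin data rate in Gbps.
    def reported_mhz_to_gbps(reported_mhz):
        return reported_mhz * 2 / 1000

    stock_mhz = 6000      # assumed stock reported clock for 12 Gbps memory
    offset_mhz = -500     # the largest negative offset the tool allows
    print(reported_mhz_to_gbps(stock_mhz))                # ~12.0 Gbps
    print(reported_mhz_to_gbps(stock_mhz + offset_mhz))   # ~11.0 Gbps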

At this lower memory speed, power-limited GPU clocks at the same 44.6% power target went up a bit, hovering around 1600-1700 MHz, sometimes dropping down to 1500-1550 MHz. This visibly helps, but it isn't quite enough to ensure a smooth experience, with some stuttering still noticeable. Using the FPS limiter to stabilize frame delivery is still a good idea at this point.

I then experimented with slightly higher power targets, surmising that the GPU was operating below its optimal voltage and power level. Increasing the power target to 50% or 140W, just 15 watts higher than before, smoothed out most of the remaining stutter, and the GPU ran at around 1700-1800 MHz in the same Alexandria region, with occasional excursions down to about 1600-1650 MHz. Power consumption is now closer to 285W.

Quantitative efficiency

At the minimum power target of 125W with the memory speed reduced, the GPU averaged about 1500-1550 MHz in more intense scenes. To calculate the peak single-precision throughput of an NVIDIA GPU in FLOPS, multiply the GPU clock frequency by the number of CUDA cores, times two (each core can retire one fused multiply-add per clock, which counts as two floating-point operations). At 1500 MHz, with 3584 CUDA cores on the GeForce GTX 1080 Ti, we get 10.75 TFLOPS, or 86 GFLOPS per watt at 125W. At 140W and 1600 MHz, we get 11.47 TFLOPS, or 82 GFLOPS per watt.

In comparison, a GeForce GTX 1080 (non-Ti) with an aftermarket cooler can typically sustain up to 2000 MHz at 180W and has 2560 CUDA cores, achieving 10.24 TFLOPS. This translates to 57 GFLOPS per watt. The GTX 1080 could certainly do better than that with a lower power target, but unfortunately, I don't have one to test.
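For anyone who wants to reproduce the arithmetic, here is a short Python snippet that simply restates the operating points above (the clock and power figures are the observed or typical values quoted in this post, not guarantees):

    # Peak single-precision throughput: clock (Hz) x CUDA cores x 2 FLOPs per core per clock (FMA).
    def peak_tflops(clock_mhz, cuda_cores):
        return clock_mhz * 1e6 * cuda_cores * 2 / 1e12

    operating_points = [
        # (label, clock in MHz, CUDA cores, board power in W)
        ("GTX 1080 Ti @ 125W power target", 1500, 3584, 125),
        ("GTX 1080 Ti @ 140W power target", 1600, 3584, 140),
        ("GTX 1080 @ typical stock boost",  2000, 2560, 180),
    ]

    for label, mhz, cores, watts in operating_points:
        tflops = peak_tflops(mhz, cores)
        print(f"{label}: {tflops:.2f} TFLOPS, {tflops * 1000 / watts:.0f} GFLOPS/W")
    # GTX 1080 Ti @ 125W power target: 10.75 TFLOPS, 86 GFLOPS/W
    # GTX 1080 Ti @ 140W power target: 11.47 TFLOPS, 82 GFLOPS/W
    # GTX 1080 @ typical stock boost: 10.24 TFLOPS, 57 GFLOPS/W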

Silicon tends to require disproportionate increases in power as clock speed rises. Higher clock frequencies require higher voltage so that transistors can switch fast enough to remain stable, and dynamic power draw increases linearly with clock frequency but quadratically with voltage, so a processor's efficiency drops dramatically at higher clocks. For massively parallel workloads like graphics rendering, this means that more cores are almost always more efficient than faster cores. This experiment shows that a high-end graphics card with more shader cores can be turned down to deliver performance comparable to a lesser card at substantially higher efficiency.
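To illustrate that scaling, here is a toy calculation using the usual dynamic-power approximation (power proportional to frequency times voltage squared); the voltages are made-up but plausible figures, purely for illustration:

    # Dynamic power scales roughly as capacitance x voltage^2 x frequency.
    # The voltage figures here are illustrative assumptions, not measured values.
    def relative_power(freq_mhz, voltage):
        return freq_mhz * voltage ** 2

    low  = relative_power(1600, 0.85)   # hypothetical reduced-power operating point
    high = relative_power(1950, 1.05)   # hypothetical stock boost operating point

    print(f"Clock increase: {1950 / 1600 - 1:.0%}")   # ~22% higher clock
    print(f"Power increase: {high / low - 1:.0%}")    # ~86% higher power

Under these assumptions, a roughly 22% clock increase costs nearly twice the dynamic power, which is consistent with the wall-power measurements earlier in this post.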
