4K with ray tracing done properly, and DLSS 3's AI frame generation tech has great potential. It’s just a shame the price is so high.
Nvidia’s latest flagship gaming GPU is finally in our hands, bringing the green team’s latest Ada Lovelace architecture with it. The GeForce RTX 4090 isn’t cheap, though, launching with a price of $1699 US / £1699 GBP. There’s no getting away from the fact that this is a very expensive GPU, and if you’re going to spend that sort of money on a graphics card, then it really needs to deliver on its promise.
In this case, the RTX 4090 needs to be the fastest gaming GPU ever, by a long way. It needs to make 4K gaming in demanding games with ray tracing a real possibility. After all, not even the GeForce RTX 3090 Ti could play Cyberpunk 2077 properly using the Medium ray tracing preset at 4K, even with DLSS on Balanced. Can the RTX 4090 finally make such a goal achievable? The answer is a definite yes.
At Custom PC, we’ve been reviewing the latest gaming GPUs since 2003, and we run a number of grueling benchmarks in order to gauge performance. Our game tests include measuring the frame rate in Cyberpunk 2077, Doom Eternal, and Metro Exodus, all with and without ray tracing, and we also test with Assassin’s Creed Valhalla. For more information, see our How we test page.
The GeForce RTX 4090 is the first GPU out of the door to use Nvidia’s new Ada Lovelace architecture. In this case it’s based on the A1 revision of the AD102 GPU, a 608mm² die containing 76.3 billion transistors, and fabricated on TSMC’s latest 4nm manufacturing process.
Sometimes it’s difficult to visualize these numbers when they get so big, but that’s basically 48 billion more transistors than the top-end GA102 GPU from the Ampere lineup – this is a really sophisticated piece of silicon. Not only that, but the RTX 4090 isn’t even a fully-enabled chip – there’s room for at least one more top-end GPU to come out at a later date, based on the same AD102 chip.
Let’s start with the basic structure of the GPU, which sees Nvidia using a similar block layout to before, with the various processors combined into streaming multiprocessor blocks (SMs), two of which make up a texture processing cluster (TPC).
The TPCs are then grouped into graphics processing clusters (GPCs). Fully-enabled GPCs contain six TPCs, but Nvidia can also disable them so that some GPCs only have five, for example. A fully-enabled AD102 GPU contains 12 GPCs, each containing six TPCs (which in turn contain two SMs), giving you 18,432 CUDA cores.
The RTX 4090 only has 11 GPCs enabled, though, and two of them only have five TPCs rather than six. That means the RTX 4090 has 16,384 basic CUDA cores, so there’s still a good bit of headroom for improvement from a fully-enabled GPU in the future.
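The core counts quoted above follow from multiplying through the GPC, TPC and SM hierarchy. The short Python sketch below reproduces them. Note that the figure of 128 CUDA cores per SM isn’t quoted in this review; it’s inferred from the totals (18,432 cores across 144 SMs), and the function name is purely illustrative.

```python
# CUDA cores per SM: inferred from the article's totals (18,432 cores / 144 SMs),
# not quoted directly in the review.
CORES_PER_SM = 128
SMS_PER_TPC = 2

def cuda_cores(tpcs_per_gpc):
    """Total CUDA cores for a chip, given the TPC count of each enabled GPC."""
    total_tpcs = sum(tpcs_per_gpc)
    return total_tpcs * SMS_PER_TPC * CORES_PER_SM

full_ad102 = cuda_cores([6] * 12)          # 12 GPCs with six TPCs each
rtx_4090 = cuda_cores([6] * 9 + [5] * 2)   # 11 GPCs, two of them cut to five TPCs

print(full_ad102, rtx_4090)
```

The same function also gives the number of cores left on the table: the difference between the two results is 2,048 CUDA cores.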
The AD102 GPU also shares a huge 96MB pool of L2 cache between all the GPCs, compared with just 6MB in the GA102. We’ve seen AMD adopt a similar approach of laying on as much cache as possible with its RDNA2 GPUs, using its Infinity Cache system. In the case of AMD, the headroom provided by this massive increase in the amount of high-speed, quick-access memory effectively negated the need for the main memory interface to get wider than 256 bits.
That’s not an issue for the RTX 4090 anyway. Not only does it have loads of cache, but it also has a 384-bit memory interface. Couple this with a whopping 24GB total of 1325MHz (21.2GHz effective) GDDR6X memory and you end up with a total memory bandwidth that surpasses the 1TB/sec barrier, at 1,018GB/sec.
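The bandwidth figure can be checked with a quick calculation: multiply the width of the memory interface by the effective data rate per pin, then divide by eight to convert bits to bytes. This is a minimal sketch using the numbers quoted above, and the function name is our own.

```python
def memory_bandwidth_gb_per_sec(bus_width_bits, effective_rate_gbps):
    """Peak memory bandwidth in GB/sec: bits moved per second across the bus, divided by 8."""
    return bus_width_bits * effective_rate_gbps / 8

# 384-bit interface with 21.2GHz effective GDDR6X, as quoted in the review
bandwidth = memory_bandwidth_gb_per_sec(384, 21.2)
print(round(bandwidth))
```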
RTX 4090 ray tracing
Nvidia has also taken this opportunity to revise the design of its RT cores, one of which is found in each SM. You would get 144 3rd-gen RT cores in a fully-enabled AD102 chip, and you get 128 of them in the RTX 4090. Nvidia has focused on several areas in order to accelerate ray tracing performance.
The first is ray-triangle intersection testing throughput. As its name suggests, this is the work of calculating the effect of a ray intersecting with a triangle, which is far more computationally expensive than handling a ray that’s behind, parallel to or in front of a triangle. Nvidia claims that the 3rd-generation RT core found in Ada Lovelace doubles the throughput of this work compared with the 2nd-generation RT cores found in Ampere GPUs.
Next up is the Opacity Micromap Engine, which is designed to alpha-test geometry on the new RT cores. Alpha testing is often used to cut down on the number of triangles needed to describe translucent objects or those with complex shapes (Nvidia uses the example of a leaf or a flame), relying instead on the alpha channel in the object’s texture to capture the shape.
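To give a feel for what a single ray-triangle test involves, here’s a software sketch of the widely used Möller-Trumbore algorithm in Python. This is only an illustration of the kind of maths involved. Nvidia doesn’t say which method its RT cores implement in hardware, and the function names here are our own.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance along the ray to the hit point, or None for a miss."""
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:             # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det    # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, edge1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, q) * inv_det
    return t if t > eps else None  # a negative t means the triangle is behind the ray

# A triangle sitting five units in front of the ray origin
triangle = ((0, 0, 5), (1, 0, 5), (0, 1, 5))
print(ray_hits_triangle((0.25, 0.25, 0), (0, 0, 1), *triangle))   # ray pointing at it
print(ray_hits_triangle((0.25, 0.25, 0), (0, 0, -1), *triangle))  # ray pointing away
```

Note how the parallel and behind cases bail out early, while a genuine hit has to run every step, which matches the point about intersections being the expensive case.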
However, this puts strain on the CUDA cores, and Nvidia says that the ability to move this process to the RT cores means ‘developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them.’
Another big jump with ray tracing is what Nvidia calls Shader Execution Reordering (SER), a feature that the company’s CEO Jensen Huang described as being as revolutionary as out-of-order processing on CPUs. The idea behind it is that it tidies up the often chaotic mess that occurs with ray tracing, when various different threads are executing shaders, or code paths within shaders, and all these threads are accessing various memory resources all over the place.
SER is basically a new scheduler that takes all this shading work and reorders it on the fly into a more uniform pipeline, improving efficiency and directing work to the nearest resources.
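A toy model helps to show why reordering matters. Nvidia GPUs execute threads in lockstep groups of 32, known as warps, so a warp whose threads need several different shaders has to run each of them in turn. The Python sketch below counts those shader passes before and after grouping similar work together. It’s a simplified illustration rather than a description of how SER works internally, and the workload (1,024 ray hits spread across eight shaders) is made up.

```python
import random

WARP_SIZE = 32  # threads that execute in lockstep on Nvidia GPUs

def shader_passes(shader_ids):
    """Count shader passes when each warp must run every distinct shader it contains."""
    passes = 0
    for start in range(0, len(shader_ids), WARP_SIZE):
        warp = shader_ids[start:start + WARP_SIZE]
        passes += len(set(warp))
    return passes

random.seed(0)
# Hypothetical workload: 1,024 ray hits scattered randomly across 8 hit shaders
hits = [random.randrange(8) for _ in range(1024)]

unsorted_passes = shader_passes(hits)
reordered_passes = shader_passes(sorted(hits))  # group identical shaders together first

print(unsorted_passes, reordered_passes)
```

In this model, the reordered workload needs at most one extra pass for each boundary between shader groups, whereas the scattered version forces nearly every warp to run nearly every shader.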
RTX 4090 DLSS 3
Finally, the other big new introduction is Nvidia’s latest form of DLSS, which is a very different beast to the previous versions. We’ll explore this in more depth next month, but the main takeaway here is that DLSS 3 isn’t just using AI to upscale the resolution anymore. The new string to its bow is frame generation, which relies on AI to predict how some future frames will look and generates them according to this information, rather than fully rendering them, massively improving performance.
The process is based on several data inputs, including depth and motion vector information from the game engine to track geometry. There’s also Nvidia’s new Optical Flow Accelerator system, which takes a look at a pair of sequential frames in a game and works out an ‘optical flow field’ between the two frames – how fast the pixels are moving and the direction in which they’re traveling from one frame to the other.
According to Nvidia, the two-pronged attack of resolution scaling and frame generation effectively means that DLSS 3 reconstructs seven eighths of the pixels displayed on your screen. The resolution scaling effectively transforms a 1,920 x 1,080 scene into a 4K scene in one frame, meaning only a quarter of that frame is produced with a standard render, and then frame generation can create the next frame entirely on its own.
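The seven-eighths figure is easy to verify. The sketch below, using our own function name, works out what fraction of displayed pixels comes from a standard render when a 1,920 x 1,080 frame is upscaled to 4K and every rendered frame is followed by one generated frame.

```python
from fractions import Fraction

def rendered_fraction(render_res, output_res, generated_per_rendered=1):
    """Fraction of displayed pixels produced by conventional rendering."""
    render_pixels = render_res[0] * render_res[1]
    output_pixels = output_res[0] * output_res[1]
    upscaled_share = Fraction(render_pixels, output_pixels)
    # Each rendered frame is followed by this many fully generated frames
    return upscaled_share / (1 + generated_per_rendered)

rendered = rendered_fraction((1920, 1080), (3840, 2160))
reconstructed = 1 - rendered
print(rendered, reconstructed)
```

A quarter of one frame out of every two works out at one eighth of all pixels, leaving seven eighths to be reconstructed by DLSS 3.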
RTX 4090 performance
Let’s be frank here, the GeForce RTX 4090 is a phenomenally fast gaming GPU. Let’s start off with 4K performance, as that’s what you really want from a graphics card at this price. At this resolution, it simply battered the competition and its predecessors.
Assassin’s Creed Valhalla RTX 4090 frame rate
The RTX 4090’s 108fps average frame rate in Assassin’s Creed Valhalla is way in front of the 75fps from the Sapphire Nitro+ Pure AMD Radeon RX 6950 XT, and this game was previously a stronghold for AMD. One of the benefits of Nvidia’s latest driver update is massively improved performance in this game, and Nvidia is now the clear winner by a long way.
Doom Eternal RTX 4090 frame rate
Play a less demanding game such as Doom Eternal and the frame rate just rockets up. The RTX 4090 clocked up an average frame rate of 414fps in this game (at 4K), while the RTX 3090 Ti could ‘only’ average 256fps. When 4K monitors start to be equipped with super-fast refresh rates, this GPU will stand you in good stead. Even adding ray tracing to the mix couldn’t stop it, with the RTX 4090 managing a 276fps average and 179fps 99th percentile frame rate – it looked amazing on our 144Hz AOC U28G2XU 4K test monitor.
Cyberpunk 2077 RTX 4090 frame rate
Cyberpunk 2077 was also a pushover for the RTX 4090. The Medium ray tracing settings have previously been a struggle at 4K, even on the mighty RTX 3090 Ti with Balanced DLSS enabled, but the RTX 4090 not only makes this setting smoothly playable, but also hits our frame rate target (60fps average; 45fps 99th percentile) with Ultra ray tracing and DLSS set to Quality.
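For anyone wanting to apply the same 60fps average and 45fps 99th percentile target to their own frame time captures, here’s a rough Python sketch. The percentile method is an assumption on our part (we take the frame time that 99 per cent of frames match or beat and convert it to a frame rate), the function names are illustrative and the sample data is made up.

```python
def frame_rate_summary(frame_times_ms):
    """Return (average fps, 99th percentile fps) for a list of frame times in milliseconds."""
    average_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    ordered = sorted(frame_times_ms)
    # Assumed definition: the frame time that 99 per cent of frames match or beat
    index = min(len(ordered) - 1, (99 * len(ordered)) // 100)
    percentile_fps = 1000 / ordered[index]
    return average_fps, percentile_fps

def meets_target(frame_times_ms, min_average=60, min_percentile=45):
    """Check a capture against the average and 99th percentile frame rate targets."""
    average_fps, percentile_fps = frame_rate_summary(frame_times_ms)
    return average_fps >= min_average and percentile_fps >= min_percentile

# Made-up capture: 98 frames at 15ms plus two slower 20ms frames
sample = [15.0] * 98 + [20.0] * 2
print(frame_rate_summary(sample), meets_target(sample))
```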
Nvidia also gave us access to a pre-release build of this game so we could test DLSS 3, and the results were incredible. We were able to max out all the ray tracing settings at 4K, and the game then averaged a massive 118fps, with an 89fps 99th percentile result. It looked stunning with all the ray-traced lighting eye candy in action, and it was super smooth in play. You would never guess it was based on 1,920 x 1,080 frame data – it looks really sharp.
The only slight downside we spotted was a few perspective artefacts, such as shimmering as you drive over textured road surfaces quickly, but in all honesty you don’t really notice this in action – it’s a price that’s well worth paying for the enormous uplift in visual quality and frame rate.
Metro Exodus RTX 4090 frame rate
The enormous power of the RTX 4090 makes Metro Exodus an easy game to run. Its average frame rate of 130fps at 4K is just phenomenal, being 47fps ahead of the RTX 3090 Ti. Amazingly, it still maintained a 96fps average frame rate at this resolution with high ray tracing enabled, and that’s without any help from DLSS too.
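To put the 4K results without ray tracing into perspective, the quick Python sketch below converts the frame rates quoted in this review into percentage leads. The RTX 3090 Ti’s Metro Exodus result is derived from the 47fps gap mentioned above, and the function name is our own.

```python
def uplift_percent(new_fps, old_fps):
    """Percentage by which new_fps exceeds old_fps."""
    return (new_fps / old_fps - 1) * 100

# (RTX 4090 average, comparison card average) at 4K, taken from the review text
results = {
    "Assassin's Creed Valhalla (vs RX 6950 XT)": (108, 75),
    "Doom Eternal (vs RTX 3090 Ti)": (414, 256),
    "Metro Exodus (vs RTX 3090 Ti)": (130, 130 - 47),  # review quotes a 47fps lead
}

for game, (new_fps, old_fps) in results.items():
    print(f"{game}: {uplift_percent(new_fps, old_fps):.0f}% faster")
```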
It goes without saying that this card is also really quick at 2,560 x 1,440, although you don’t gain much of a performance benefit from moving down to 1,920 x 1,080, as you start to hit the CPU limit in a lot of games here. Buying this card for gaming at sub-4K resolutions is a false economy, though – it’s really only worth considering for gaming at high resolutions.
RTX 4090 power draw
Aside from the high price, the other downside to the RTX 4090 is its power draw. The Founders Edition we tested uses the new 16-pin connector and comes with a cable to split this into four 8-pin PCI-E power sockets.
The power draw of our Ryzen 9 5900X test system was also highly variable with the RTX 4090 installed, and it really started eating watts when gaming at 4K with ray tracing enabled. Using our standard Metro Exodus 2,560 x 1,440 power draw test, our system maxed out at 535W with the RTX 4090 installed, but this went right up to 619W at 4K.
On the plus side, this is less power than our system drew with the RTX 3090 Ti installed, but the RTX 4090 is still clearly a thirsty customer. You’ll want at least an 850W PSU if you want to run this GPU.
RTX 4090 Founders Edition cooler
We also have to take our hats off to Nvidia for the design of its Founders Edition card. It’s a massive, three-slot brick, but most of that space is occupied by heatsink fins and large fans. Assuming you use it in a case with a standard airflow layout, the flow-through system works well. The GPU temperature peaked at 67.5°C, with a 77.6°C peak hot spot temperature, and the fan noise never became annoyingly loud. The GPU was also consistently able to boost to 2745MHz in our tests – a good 225MHz higher than the quoted boost clock.
RTX 4090 pros and cons
Pros
- Amazing 4K ray tracing performance
- DLSS 3 has great potential
- Fastest GPU ever
Cons
- Ridiculously expensive
- High power draw
Nvidia GeForce RTX 4090 specs
The GeForce RTX 4090 specs list is:
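| Specification | GeForce RTX 4090 |
|---|---|
| Stream processors / CUDA cores | 16,384 |
| Max boost clock | 2520 MHz |
| Memory | 24 GB GDDR6X |
| Memory clock | 1325 MHz (21.2 GHz effective) |
| Interface | 16x PCIe 4 |
| Power connectors | 1 x 16-pin / 4 x 8-pin |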
RTX 4090 price
The price of the GeForce RTX 4090 is extremely high, as it’s Nvidia’s flagship gaming GPU, but it’s also incredibly fast.
Price: Expect to pay $1699 USD / £1699 GBP
RTX 4090 review conclusion
The GeForce RTX 4090 shows how 4K gaming should be done. It annihilates the competition and its predecessors, making settings that were previously impossible achievable. We never thought we’d be discussing 240fps+ frame rates at 4K with ray tracing, but the RTX 4090 has brought us to that world.
The new tech it brings to the table is also mightily impressive, with DLSS 3 making demanding games such as Cyberpunk 2077 run at over 100fps with settings that look incredible. This tech is already lined up to be supported in several big titles, including Microsoft Flight Simulator and A Plague Tale: Requiem, and if it ends up being incorporated into many games then this will be a game-changer for Nvidia, as it really opens up high frame rates in demanding games with ray tracing.
This flagship card isn’t for everyone, of course. Few people will be able to afford one, but Nvidia has shown that it’s once again the king of the castle at the very top end. If you have plenty of cash in the bank and want to play games at 4K, this is undoubtedly the card to get. If you’re looking for a graphics card that isn’t so enormously expensive, then check out our best graphics card guide.