Months of rumors, leaks, and various discussions online have led to today.
Nvidia faces an interesting challenge this generation. For the first time in years, AMD is offering a competitive top-to-bottom lineup (or will be once the big Navi RDNA2 GPUs start shipping this fall, at least per the rumors), and that same RDNA2 intellectual property is the GPU technology powering both of the upcoming next-generation game consoles. Nvidia is also coming off a generation that was impressive in some ways (real-time raytracing, integer math improvements, DLSS 2.0, and power-efficient GPUs relative to prior generations) but disappointing in others (low uptake on raytracing, the awful early implementation of DLSS, and only a small generational improvement in rasterization performance compared to the leap from the GeForce 9xx cards to the 10xx series).
Given all of that, this has been one of the more anticipated GPU reveal events in recent memory. Hype was high that Nvidia would unveil a very high-performance raytracing GPU offering meaningful gains over the Turing RTX cards: one that would not only bring raytraced performance up to a playable standard, but also deliver the kind of increase in raster performance that would make people sitting on a Pascal 10-series card (like me and my 1080 Ti!) want to upgrade.
And, well…pending benchmarks, I think they’ve done it.
Firstly, the core details: three products were announced with staggered launches from September into October. First up is the RTX 3080, a card offering a claimed 2x performance over the 2080, with cards starting at $699 and available on 9/17/2020. Next up was the 3070, offering performance just slightly above the 2080 Ti at a much lower price of $499 when it launches in October. Lastly, the announcement I was most curious about: as heavily rumored, they've introduced a 3090, a firmly GeForce-branded card that Nvidia says takes over some of the role traditionally held by the Titan line. The 3090 launches on 9/24/2020 for $1,499, an eyewatering price, but included in that is the rumored 24 GB of RAM. However, Nvidia was very gun-shy on performance specs here, claiming only that it can manage 8K gameplay at 30 frames per second, without disclosing what application that was measured in, whether raytracing was enabled, or whether DLSS was involved.
The core architectural details are pretty impressive, as expected. The process node shrink is to Samsung's 8nm as rumored, though Nvidia claims this is a custom process Samsung spun up for them. No additional details were offered beyond a claimed 1.9x performance-per-watt improvement over the last-generation Turing cards. That is a sizable gain from a node that enthusiast circles dismissed as only a small shrink when it was first rumored. It also marks the first time Samsung has manufactured Nvidia's high-end GPUs; in the Pascal era, Samsung did make low-end parts like the 1050 Ti for Nvidia, but never the top-end chip. Despite the efficiency claim, Nvidia's performance graph also ominously showed the Ampere GPU performance curve extending up to 320w, which tells me the rumors of a 400w 3090 are probably accurate!
In actual under-the-hood changes, Nvidia is claiming some huge improvements. Their CUDA shader cores have 2x the FP32 throughput, which, coupled with other improvements, leads Nvidia to claim a 2.7x leap in shader performance. That should mean that even with a relatively modest increase in core counts, Ampere should easily match and exceed the Turing cards, which is backed up by the fact that, unlike Turing relative to Pascal, all of the GPUs announced today meet or exceed the performance of the top-end 2080 Ti, at least according to Nvidia. A new generation of RT cores for raytracing offers a claimed 1.7x the performance, and the Tensor cores, which handle the AI denoising done on raytraced scenes as well as other compute workloads including some of Nvidia's software features, are claimed to offer a 2.7x uplift. Without more context, it is hard to say what share of these improvements comes from simply putting more hardware on the Ampere chips versus efficiency and throughput improvements to the architecture itself. For memory, both the 3080 and 3090 are paired with new GDDR6X memory, with the 3090 at 19.5 Gbps and the 3080 at a flat 19 Gbps. Meanwhile, the 3070 is on standard GDDR6 at 16 Gbps, and the safe assumption for the rest of the lineup is that cards further down the stack, like a potential 3060, would also use regular GDDR6, potentially dropping to the 14 Gbps used by the top-end Turing cards.
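As an aside, those per-pin speeds translate into total memory bandwidth once you multiply by the bus width. A quick sketch of the math, with the caveat that the 256-, 320-, and 384-bit bus widths here come from the published spec listings rather than the keynote itself:

```python
# Total memory bandwidth = per-pin speed (Gbps) * bus width (bits) / 8 bits per byte
def mem_bandwidth_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    return gbps_per_pin * bus_width_bits / 8  # GB/s

# Per-pin speeds are from the announcement; bus widths are assumed from spec listings
for name, speed, bus in [("RTX 3070", 16.0, 256),
                         ("RTX 3080", 19.0, 320),
                         ("RTX 3090", 19.5, 384)]:
    print(f"{name}: {mem_bandwidth_gbs(speed, bus):.0f} GB/s")
```

That works out to roughly 512, 760, and 936 GB/s respectively, so the 3090's GDDR6X pairing nearly triples the bandwidth of a 1080 Ti-era card.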
One thing Nvidia surprised with is the CUDA core counts of the new lineup. Rumors I had seen as late as yesterday still had the 3090 sitting low (okay, "low") at 5,248 CUDA cores. However, the 3070 alone features 5,888 CUDA cores, with the 3080 at 8,704 and the 3090 at 10,496?! The 2080 Ti had 4,352 by comparison, and even the Titan RTX only ("only") had 4,608. That represents a massive increase in shader resources, which will translate to direct gains in raster performance in all 3D games and should ensure faster raytracing as well. RT core and Tensor core counts are up too, but those numbers matter less for pure gaming performance unless you are using raytracing, which, for most readers here, won't be the case until WoW gets updated to patch 9.0.
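As a rough sanity check on what those core counts imply, peak FP32 throughput is simply cores times two operations per clock (one fused multiply-add) times clock speed. The boost clocks below are approximate figures from the published spec listings, not the keynote, so treat them as assumptions:

```python
# Peak FP32 throughput = CUDA cores * 2 ops/clock (one FMA) * boost clock (GHz), in TFLOPS
def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    return cuda_cores * 2 * boost_ghz / 1000

# Core counts from the announcement; ~1.7 GHz boost clocks are assumed from spec listings
cards = {
    "RTX 2080 Ti": (4352, 1.545),
    "RTX 3070": (5888, 1.73),
    "RTX 3080": (8704, 1.71),
    "RTX 3090": (10496, 1.70),
}
for name, (cores, clock) in cards.items():
    print(f"{name}: ~{peak_fp32_tflops(cores, clock):.1f} TFLOPS")
```

By this crude measure the 3080 lands around 30 TFLOPS against the 2080 Ti's ~13, which is how a "2x over the 2080" claim can be plausible even before architectural efficiency gains are counted. Real game performance never scales linearly with TFLOPS, of course, so benchmarks will tell the true story.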
With all of those details established, all that is left is to wait for partner board announcements, which have started to come out. My personal favorite add-in board partner for Nvidia cards is EVGA: they offer useful software tools, don't pressure reviewers the way MSI has been known to, and offer the best range of Nvidia cards, including hybrid water-cooled models at the top end and custom waterblocks for their cards via their Hydro Copper lineup. However, when I went to their RTX 30xx series site today, I was greeted with this…
However, their lowest-end 3090 card, the XC3, has an understated shroud, and while the FTW3's RGB-lit shroud is nice (because you can sync it via an RGB header to match your system), the card has unavoidable red accents! They do show both a Hybrid and a Kingpin model with closed-loop liquid coolers attached, which would be my preferred choice, so thankfully both of those are present and look nice and understated.
Other board partners have a variety of models, some images of which I've included here, with a range of aesthetic touches. Many of them feature similar cooling designs without the back-side fan, instead using a front-side fan that pushes air through a heatsink section left open, with no PCB behind it to obstruct airflow.
Something I find very interesting is that pretty much none of the 3090 or 3080 cards (unless water-cooled) has fewer than three fans. The 3070 is supposed to be far ahead on reduced power usage, coming in at 220w, so if you have a cramped case or don't have the clearance for these frankly ridiculously huge cards, the 3070 seems like the winner (especially given its price/performance ratio).
However, the presentation also included a series of really impressive feature discussions, all relating to software and tasks that can be accelerated by a processor as fast as a GPU.

The first was Nvidia Reflex, software designed to reduce input lag and improve response times via driver updates; coupled with this, Nvidia is working with G-Sync display partners to roll out 360 Hz esports displays. The first of them, an Asus monitor, has already been seen in the wild with pretty good reviews.

Then there was Nvidia Broadcast, a software suite designed to use the RTX hardware for AI tasks like background removal, virtual background effects, and webcam auto-framing, alongside the existing RTX Voice noise-removal technology to smooth out your microphone in a noisy environment.

Nvidia Omniverse is a machinima technology that allows body tracking via webcam to be mapped onto a 3D model for use in content creation, and then lets you add special effects and the like to the scene. The demo used Mount & Blade 2 as the model source, which was pretty cool. What remains to be seen is how this will be supported: could you theoretically make WoW or FFXIV machinima with the games' built-in models, or would it require developer work?

Lastly, Nvidia highlighted their deep learning efforts, which for gaming come largely via DLSS. DLSS aims to make higher resolutions playable by rendering each frame at a lower resolution, then using a deep-learning-derived sampling methodology, trained against "ground truth" high-resolution images, to upscale without a large loss in fidelity. While the first generation of this, introduced with the Turing cards at launch, was not great, they've since introduced a 2.0 version that couples the supercomputer-trained algorithm with a temporal anti-aliasing implementation to deliver better-looking, higher-performance results, and while I'm not a particularly huge fan of resolution scaling for games, the end result is impressive.
Ampere will continue to support these technologies.
However, the software advancement that most appealed to me is RTX IO. The next-generation consoles have a legitimate advantage over nearly every gaming computer in one way: storage throughput. Both Microsoft and Sony are using custom NVMe SSDs (custom controller logic in Microsoft's case, a fully custom solution in Sony's) that have drastically better gaming performance than a standard NVMe SSD. They accomplish this in a couple of ways. Both use aggressive compression to make the most of the space and to move more data per unit of time, and they do so without burdening the CPU by using custom controller logic that handles the compression and decompression. Sony additionally uses a special methodology for caching file locations and mapping the drive that massively improves random IOPS performance.

RTX IO brings at least a portion of this technology to the PC by using the RTX hardware in the GPU to perform real-time decompression, and it is being developed in collaboration with Microsoft to work with the upcoming DirectStorage API, which aims to bring some of the software improvements made for Xbox Series X storage to the Windows PC environment. The thing I think is really smart about this is that with Ampere cards in particular supporting PCI-E Gen 4, you could use half the lanes on Gen 4 to get the same speed as a full x16 PCI-E Gen 3 link, and then have 8 lanes of PCI-E Gen 4 to the graphics card that could be used to link to storage, allowing for the equivalent bandwidth of two PCI-E Gen 4 NVMe SSDs. If managed well, this could offer some incredible performance improvements; Nvidia is claiming it can deliver a read bandwidth of around 14 GB/s! Further, if this allows for compression to happen, then an NVMe drive could make sense as a bulk game storage device on PC instead of being reserved for your preferred titles, depending on how the compression ratio works out.
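The lane math behind that idea is easy to check: PCI-E Gen 4 doubles the per-lane transfer rate of Gen 3 (16 GT/s vs 8 GT/s, both with 128b/130b encoding), so an x8 Gen 4 link matches an x16 Gen 3 link. A minimal sketch of the arithmetic:

```python
# Usable bandwidth per PCI-E lane in GB/s (one direction), after the
# 128b/130b encoding overhead used by Gen 3 and Gen 4:
# Gen 3: 8 GT/s, Gen 4: 16 GT/s; divide by 8 bits per byte
GBPS_PER_LANE = {3: 8 * 128 / 130 / 8, 4: 16 * 128 / 130 / 8}

def link_bandwidth(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCI-E link, in GB/s."""
    return GBPS_PER_LANE[gen] * lanes

print(f"Gen 3 x16: {link_bandwidth(3, 16):.1f} GB/s")  # ~15.8
print(f"Gen 4 x8:  {link_bandwidth(4, 8):.1f} GB/s")   # ~15.8, identical
print(f"Gen 4 x16: {link_bandwidth(4, 16):.1f} GB/s")  # ~31.5
```

Notably, even Nvidia's headline ~14 GB/s figure, presumably compressed data coming off a roughly 7 GB/s Gen 4 NVMe drive, would fit within an x8 Gen 4 link with room to spare.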
While games are a long way out from needing 24 GB of VRAM, if RTX IO can use GPU memory to store the data and stream it to system RAM as needed, then this would be a fantastic excuse to push people to the RTX 3090 cards (or the rumored double-VRAM versions that AIB partners are said to offer for the 3080 and 3070, with 20 GB and 16 GB of VRAM respectively).
Overall, though? This seems like a really promising lineup that will be worth the wait, and for those who didn't jump on Turing cards over the last two years, holding off seems like it will prove, in retrospect, to have been the prudent move. For me personally? Provided everything falls into line where I want it to, I'm hoping to grab a 3090 (preferably the EVGA Hybrid model) and sit on it for a long time. AMD still has big Navi in the wings, said to be launching before the rumored early-November console launches, so it may be worth waiting to see, but some of the rumors I've seen suggest that, while those cards will be impressive, Nvidia is likely to keep the high-end performance crown.
Either way, having more competition is good and if that is what it took for Nvidia to commit to such a huge generational leap, akin to what we used to get regularly? Well, I’m sufficiently happy with that.