Pascal GTX 1080 Async Compute Explored

Async Compute Async Shaders AMD

Last week, a report came out suggesting that Pascal may include improved asynchronous compute support. Nvidia also claimed asynchronous compute support with Maxwell, but that proved to be a less than optimal solution. From the leaked GTX 1080 slide deck, we’re now able to glean a few more details about what Nvidia means by async compute support and how it has been improved with Pascal.

Async compute basically means working on a graphics workload at the same time as a compute one, making the most of the GPU’s resources. This can happen at a number of levels, either across the whole GPU or within each SM/CU. Maxwell only worked at the GPU level, assigning each SM to either a graphics or a compute task. Scheduling was handled in software, and in order to add or switch tasks, the previous task had to finish or be preempted. Furthermore, Maxwell only supported static partitioning: graphics and compute tasks scheduled at the same time both had to finish, and resources could not be dynamically reallocated if one task finished first. This left GCN in the lead when it came to async compute.
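The cost of static partitioning is easy to see in a toy model. The sketch below is purely illustrative (the SM counts and work sizes are made-up numbers, not real hardware figures): the SM pool is split once, up front, and SMs whose task drains early simply sit idle.

```python
# Toy model of Maxwell-style static partitioning (illustrative only):
# the SM pool is divided between a graphics and a compute task before
# launch, and the split cannot change until both tasks complete.

def static_partition_time(total_sms, gfx_work, cmp_work, gfx_sms):
    """Steps to finish both tasks; each SM retires 1 work unit per step."""
    cmp_sms = total_sms - gfx_sms
    gfx_time = -(-gfx_work // gfx_sms)   # ceiling division
    cmp_time = -(-cmp_work // cmp_sms)
    # SMs on the finished task idle until the slower task also completes.
    return max(gfx_time, cmp_time)

# 16 SMs split evenly; graphics needs 96 work units, compute only 16.
print(static_partition_time(16, gfx_work=96, cmp_work=16, gfx_sms=8))  # 12
```

In this example graphics takes 12 steps while compute finishes after 2, so half the GPU idles for 10 of the 12 steps.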

Nvidia Pascal GTX 1080 Dynamic Load Balancing async

Pascal brings a number of changes. First off, the promised improved preemption has arrived: Pascal offers more fine-grained control, able to switch to a new task at pixel or even instruction boundaries, which should allow for better and likely faster preemption. The next change is dynamic load balancing, which lets Pascal reallocate the resources dedicated to either graphics or compute on the fly. At the GPU level, this means that once a graphics or compute task finishes, the newly idle SMs can join those working on the remaining task, speeding up completion. This should allow for much better async compute performance compared to Maxwell.
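A rough sketch of what dynamic load balancing buys, again with made-up, illustrative numbers: the hypothetical simulation below reassigns SMs from a drained task to the remaining one instead of letting them idle.

```python
# Toy model of Pascal-style dynamic load balancing (illustrative only):
# when one task runs out of work, its SMs are immediately handed to the
# other task rather than idling until both tasks are done.

def dynamic_balance_time(total_sms, gfx_work, cmp_work, gfx_sms):
    """Steps to finish both tasks; freed SMs join the other task's pool."""
    cmp_sms = total_sms - gfx_sms
    steps = 0
    while gfx_work > 0 or cmp_work > 0:
        steps += 1
        gfx_work = max(0, gfx_work - gfx_sms)
        cmp_work = max(0, cmp_work - cmp_sms)
        if cmp_work == 0:        # compute drained: give its SMs to graphics
            gfx_sms, cmp_sms = total_sms, 0
        elif gfx_work == 0:      # graphics drained: give its SMs to compute
            gfx_sms, cmp_sms = 0, total_sms
    return steps

# Same workload a static 8/8 split would face: 16 SMs, 96 + 16 work units.
print(dynamic_balance_time(16, 96, 16, gfx_sms=8))  # 7 steps instead of 12
```

Once the small compute task drains after two steps, all 16 SMs chew through the remaining graphics work, finishing in 7 steps rather than the 12 a fixed split would take.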

Nvidia Pascal GTX 1080 Preemption Async

Even with all of these additions, Pascal still won’t quite match GCN. GCN can run async compute at the SM/CU level, meaning each CU can work on both graphics and compute at the same time, allowing even better efficiency. Nonetheless, Pascal finally offers a hardware-scheduled async compute solution, bringing Nvidia closer to AMD. Either way, with both Nvidia and AMD working on async compute, developers are more likely to take notice and make sure our GPUs are fully utilized.
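The difference can be sketched one level further, with hypothetical utilization figures purely for illustration: a graphics shader rarely keeps a CU 100% busy, and CU-level async lets compute work fill those idle cycles, so a small compute task finishes inside the graphics task’s shadow.

```python
import math

# Toy model of GCN-style CU-level concurrency (illustrative only): every CU
# spends a fraction of each step on graphics and fills its remaining idle
# cycles ("bubbles") with compute work instead of wasting them.

def cu_level_times(total_cus, gfx_work, cmp_work, gfx_busy=0.7):
    """Return (graphics_steps, compute_steps) when compute rides in the
    bubbles left by a graphics task keeping CUs gfx_busy fraction busy."""
    gfx_steps = math.ceil(gfx_work / (total_cus * gfx_busy))
    cmp_steps = math.ceil(cmp_work / (total_cus * (1 - gfx_busy)))
    return gfx_steps, cmp_steps

# 16 CUs, graphics 96 units at 70% occupancy, compute 16 units in the gaps.
print(cu_level_times(16, 96, 16))  # (9, 4): compute costs no extra time
```

The compute task completes at step 4, well before graphics finishes at step 9, without taking any CUs away from the graphics task at all.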

Samuel Wan

Samuel joined eTeknix in 2015 after becoming engrossed in technology and PC hardware. With his passion for gaming and hardware, tech writing was the logical step to share the latest news with the world. When he’s not busy dreaming about the latest hardware, he enjoys gaming, music, camping and reading.


20 Comments

  1. So they still don’t have it. At least we will not see Nvidia cards running slower with that feature enabled, which is good considering that Nvidia can be very per$ua$ive when asking developers not to use a feature, or to just use an older OpenGL version when a game (call me Doom) runs on a competitor’s cards.

    1. but muh nvidia are perfect u dirty fanboi. lol, they won’t do dat mean thing, amd are just butthurt becuase they slo

      and yes, sarcasm

    2. Carmack has already stated that the OpenGL 4.3 build that was tested for AMD had extensions from the 4.4 and 4.5 versions and that there was absolutely NO performance difference had they gone fully 4.5.
      We’re also talking beta, and the build will eventually properly support Vulkan, so that should leverage async compute advantages for GCN relative to Maxwell. Regardless, they’ve shown the GTX 1080 getting between 120 and 200 FPS at 1080p, max settings, so the game can run great at 60 FPS on much lower hardware.
      Even a GTX 750 Ti should run the game quite nicely and still look great at lower settings.

      1. A few years ago Ubisoft was saying that DX10.1 was buggy and wasn’t offering any performance gains. AMD cards were the only cards offering DX10.1 back then. This isn’t the first time I am reading excuses like these.

  2. Until we get real benchmarks with fully working DX12 drivers for a larger number of titles we can’t be too certain of anything.
    Even if AMD’s ASync Compute is better we don’t know how much, or if NVidia’s approach is better for other scenarios.
    In the end most discussions are academic with BENCHMARKS being the important metric. FPS vs dollar, and the overall software ecosystem determine what the intelligent consumer will choose.
    I’m choosing NVidia, but I really want AMD to succeed. They’ve been investing heavily in their drivers over the last year, though not much in terms of minimizing DX11 CPU overhead. Freesync is now cheaper and similar to GSYNC, though with the caveat that the sync range needs to be 2.5X or greater (i.e. 30Hz to 75Hz). A large number of Freesync monitors don’t work as well as GSYNC (30Hz to 60Hz, or 40Hz to 75Hz ranges); GSYNC has no such issue.
    NVidia has put a large investment into VR support. Hopefully AMD offers similar FPS and other improvements to what NVidia offers (up to 60% FPS improvement in VR). It’s unclear whether NVidia required new hardware to accomplish this or whether the software would benefit the GTX 900 series (GTX 970 and better is VR ready).
    AMD may offer a VR-ready GPU (Polaris 11) for as little as $150 USD! I’m guessing based on the fact that the cheapest VR-ready GPU currently is a factory-overclocked R9 380X, which starts at $200 USD. Presumably AMD wants something that just meets the VR minimum spec for the cheapest price possible so they can put “VR Ready” as a selling point (of course the HMD is comparatively a lot more expensive right now).

    1. Unfortunately most folks are confusing Explicit Multi-Adapter with Asynchronous Shader Pipelines and Async Compute Engines.
      NVidia does not support async compute in hardware. The best that they can do is emulate it in software or drivers, and that is not working out too well for them.

  3. *Sigh* I was talking to a couple of AMD fanboys who were saying that when asynchronous compute is enabled on all DX12 games, AMD will destroy Nvidia.

    This will not happen and has not happened, and here’s why, assuming we all know how asynchronous computation works (if not, Google it). It basically means that more of the resources can be shared between graphics and compute loads in real time.

    If you imagine a grid of 32 blocks, each block is a generic compute engine capable of running either a graphics or a compute task. Now cross out 16 of those blocks; the non-crossed blocks represent working engines in an async-off scenario. Now imagine the same 32-block grid, but it’s an Nvidia core, and cross out about 6, with the remaining non-crossed engines being the compute load. Notice there are more threads or engines working in this scenario; this is because Pascal and Maxwell are basically tapped out, running at 80-90% efficiency. GCN is not running anywhere near as efficiently, hence the huge claimed performance gains when DX12 was tested.

    This only brings them on par with a tapped-out Maxwell card; meanwhile, Nvidia are making advancements and innovations at a hardware level, even if that means they only gain small increases in performance over previous generations.

    Once it levels out AMD’s performance increases will slow to the same rate as Nvidia when all the software potential of GCN is tapped. Which funnily enough is not all that much.

    1. “Which funnily enough is not all that much.”
      Perhaps you would care to elucidate? Or are you just throwing words?

      1. He’s just throwing words… AMD’s potential?
        gtx1060 overclocked couldn’t even beat rx470 reference in Doom Vulkan… shows how bad AMD potential is… kappa

    2. “It basically means that more of the resources can be shared between graphics and compute loads in real time.”
      That part is true, but it’s more like

      It basically means that more of MULTIPLE resources and WORKLOADS can be shared between graphics and compute loads in real time

      1. “This will not happen and had not happened and here’s why, assuming we all know how asynchronous computation works if not Google it. It basically means that more of the resources can be shared between graphics and compute loads in real time.”
        Actually the moron has it wrong. What he describes is called Explicit Multi-Adapter. That is also a function of DX12 that shares available GPU resources. It is NOT a function of async compute, which is AMD hardware IP.
        Async compute shader pipelines allow the CPU’s parallel cores to send data to the GPU when the data is ready, not serially.
        Explicit Multi-Adapter is a huge upgrade from the concept of Crossfire or SLI. Multi-GPU setups now contribute almost a 100% increase in performance.
        This is why, running DX12, two RX 480 8GB cards can outperform one GTX 1080 8GB card:
        the two RX 480 cards together contribute about 10.8 TFLOPS of performance vs about 9 TFLOPS for the GTX 1080.
        The two RX 480 cards also have 16GB of VRAM vs 8GB for the GTX 1080.
        Explicit Multi-Adapter does not require ANY coding by the game developer. That’s why AMD and NVidia cards will now work together if they are mounted on the same motherboard.
        This is why NVidia disabled 3-4 way SLI. Rather than spending money on expensive cards, consumers can simply add another inexpensive card and achieve about a 91% increase in performance over the old card if they are the same.
        That’s why it makes sense to buy 3 RX 480 cards and completely shatter GTX 1080 performance.
        This scares the hell out of NVidia and has all the fanboy media hacks trying to trash it.
        See for yourself. Just buy 2 RX 480 cards and you will get monster performance for about $460!

    3. So why does Radeon perform so well in Vulkan? We are talking about games here, aren’t we?

      “Once it levels out AMD’s performance increases will slow to the same rate as Nvidia when all the software potential of GCN is tapped.” – till then, Nvidia is way behind AMD in implementing async.

  4. So OK, let’s say this is just AMD catching up because they had such poor GPU utilization efficiency before this. That will at least make Nvidia look bad, as they keep selling 1070/1080 cards for upwards of $400 and $700 with AMD offering 2/3rds the performance at less than 1/3rd the price… I prefer Nvidia myself (not even sure why anymore), but there is no denying that AMD is miles ahead on price/performance.

    The 1080/1070s are selling like crazy, and I suspect it’s because a lot of people are expecting full DX12 support. Nvidia could be facing a lot of backlash in a few months when it becomes clear what the limitations of Pascal are.

    That said, I don’t think many people are aware that Async Compute is not a feature of DX12 at all. It’s merely the access to lower-level hardware that enables devs to use Async Compute. Async Compute was there all along; DX12 did not add it but simply exposed it. So essentially, Async Compute is just something that AMD cards can do in DX12 because devs have access to lower-level hardware features. OK, GREAT. So instead of trying to compare that exact feature on Nvidia GPUs, why not talk about what other, perhaps equally beneficial, hardware features get exposed by DX12 for Nvidia GPUs? Is it just that Nvidia cards were already so efficient that they just don’t have anything left to squeeze out from better low-level access? If that is the case, wouldn’t that mean that AMD is actually way ahead in GPU performance, seeing as they have been getting by with cards that are essentially nowhere near max efficiency?

    1. No, raw power > async, and it always will be. 1070/1080 users don’t need to care about async at all; the raw power of these cards is insane, and they will always be on top of the benchmarks… for the next 2 years.

      1. Raw power is something where AMD has almost always been better than Nvidia in every single generation. Looking at the performance numbers and the theoretical TFLOPS produced by cards AMD has had the upper hand a lot of times. But due to their lackluster DX11 drivers a lot of it was “lost” in the overhead. So no, that is not all that matters. Believing something is that simple indicates you have much to learn about hardware and software development.

      2. In VR, async compute is very important… Nvidia is going to struggle if they do not adapt to it. Sure, they are the Apple of GPUs right now, with tons of fanboys around them, but that does not make them better. And anyway, Nvidia has never had the raw performance advantage… AMD cards are more powerful, while Nvidia has had superior drivers and optimization. If you look back at older gens, every single older gen of AMD is beating its Nvidia counterpart.

      3. Wrong, that’s not true at all. Async allows you to do more tasks concurrently, so a slower GPU with async could potentially be faster at rendering than a faster GPU that can only process one thing at a time. For DX12 hardware, async compute is part of the spec, and NVIDIA have yet to do that, as they do all the async in software via the driver, which can add more overhead to the whole thing.
