  1. #841
    So after reading this article, http://wccftech.com/amd-rx-480-1500m...ool-voltage-2/, I'm not entirely sure whether this claim should be taken with a grain of salt or not.
    The wise wolf whose pride is her wisdom isn't so sharp as drunk.

  2. #842
    Quote Originally Posted by kail View Post
    So after reading this article, I'm not entirely sure whether this claim should be taken with a grain of salt or not.
    Well, Nvidia increased their clocks by 50% when going from 28 nm to 16 nm FinFET. So why couldn't AMD reach the same?

  3. #843
    Warchief Zenny's Avatar
    10+ Year Old Account
    Join Date
    Oct 2011
    Location
    South Africa
    Posts
    2,171
    Quote Originally Posted by Dukenukemx View Post
    Async Compute is still broken. Any game that uses it will decrease frame rates on the 1080. Also, Radeon GCN cards were already able to use DX12 like 3 years ago. What technology leap is needed? Also HBM memory.
    Async Compute is not broken; the GeForce 1080 does get performance benefits from both DX12 and async, just not to the extent of Radeon cards, likely due to having much better DX11 performance. Radeon cards have not been fully DX12 compliant either; none of the previous cards supported DX12_1, and support for that has only been added in Polaris.
    Compared to Maxwell cards, it doesn't overclock by as high a percentage. Nobody knows anything about AMD's Polaris card.
    Your average aftermarket 980 got around 30% over the base clock; the average aftermarket 1080 gets around 25%-30%. Massive difference there.
    We know the 1080 is the fastest, but when you turn on DX12 it runs slower compared to DX11, just like Maxwell. It's Maxwell 2.0.
    But it doesn't.
    Quote Originally Posted by Remilia View Post
    Architecture wise, personally the more interesting ones I've seen are the ALU level boost/power down, 'multi-threaded' ALU patents and the hardware primitive discard from Polaris. Not sure what's going to happen with the former two, we'll see on that but the latter one is going to make some scenes extremely lopsided to Polaris because of the nature of it.
    "hardware primitive discard" is AMD finally meeting the DX12_1 spec, that Nvidia has had for ages. It allows Polaris to do Conservative Rasterization.

  4. #844
    Fluffy Kitten Remilia's Avatar
    10+ Year Old Account
    Join Date
    Apr 2011
    Location
    Avatar: Momoco
    Posts
    15,160
    Quote Originally Posted by Zenny View Post
    "hardware primitive discard" is AMD finally meeting the DX12_1 spec, that Nvidia has had for ages. It allows Polaris to do Conservative Rasterization.
    That's not what hardware primitive discard is. Conservative rasterization is something that Maxwell brought; it wasn't there before. Primitive discard is essentially discarding unseen geometry, especially tessellated geometry. For example, the tessellated ocean water underneath the ground in Crysis 2 will never enter the rendering pipeline. That means if you stare at a wall with a load of people behind that wall, the GPU only renders the wall. Provided the software side does not do anything of that sort (which is slow and can lead to pop-in), the objects behind the wall will still go through the rendering pipeline even though they're never seen; essentially the GPU can do less work while producing the same result, so to speak. No other hardware architecture has this.

    and now I go on a 14 hour or whatever flight, yay.
    Last edited by Remilia; 2016-06-20 at 07:01 AM.

  5. #845
    Quote Originally Posted by Remilia View Post
    If DX12 only just removed the CPU bottleneck then the performance delta between the 970 and 390, for example, wouldn't be that big, but for GCN that's not particularly true due to the fact that in DX11 it's extremely underutilized by the serial nature of DX11, whereas GCN is parallel in nature. DX12 allows for better parallel processing for graphics and compute, which is where GCN is able to be utilized better.
    What serial nature? It was entirely AMD's choice to not implement deferred context (DC) support in the driver. DX11 parallelises well enough, even with Microsoft's emulation. Not as well as 12, but well enough to issue plenty of draw calls. Exactly what part of DX11.1 can't be parallelised on NVidia hardware? Even on AMD it's 90% fine. Besides, the draw calls issue has nothing to do with parallelisation, but with the overhead of recording the state and arranging all barriers/transitions.
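    For reference on the deferred-context point above, a minimal sketch (illustrative only, not from any poster; RecordDraws() is a hypothetical helper) of how D3D11 draw recording can be split across worker threads and replayed on the immediate context. Whether the driver executes such command lists natively or Microsoft emulates them is reported by CheckFeatureSupport(D3D11_FEATURE_THREADING, ...).
    Code:
    // Minimal sketch, not from any poster: splitting draw recording across threads
    // with D3D11 deferred contexts. RecordDraws() is a hypothetical helper.
    #include <d3d11.h>
    #include <thread>
    #include <vector>

    void SubmitParallel(ID3D11Device* device, ID3D11DeviceContext* immediateCtx, int workerCount)
    {
        std::vector<ID3D11CommandList*> commandLists(workerCount, nullptr);
        std::vector<std::thread> workers;

        for (int i = 0; i < workerCount; ++i)
        {
            workers.emplace_back([&, i]
            {
                ID3D11DeviceContext* deferredCtx = nullptr;
                device->CreateDeferredContext(0, &deferredCtx);

                // Record state changes and draw calls for this thread's slice of the scene.
                // RecordDraws(deferredCtx, i);   // hypothetical helper

                // Bake the recorded work into a command list; nothing touches the GPU yet.
                deferredCtx->FinishCommandList(FALSE, &commandLists[i]);
                deferredCtx->Release();
            });
        }
        for (auto& t : workers) t.join();

        // Only this playback step is serialized on the immediate context.
        for (ID3D11CommandList* cl : commandLists)
        {
            immediateCtx->ExecuteCommandList(cl, FALSE);
            cl->Release();
        }
    }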

    Quote Originally Posted by Remilia View Post
    That means if you stare at a wall with a load of people behind that wall, the GPU only renders the wall. Provided the software side does not do anything of that sort (which is slow and can lead to pop-in), the objects behind the wall will still go through the rendering pipeline even though they're never seen; essentially the GPU can do less work while producing the same result, so to speak. No other hardware architecture has this.
    Doing per-object visibility culling is a lot better on the CPU side with efficient data structures rather than wasting precious GPU resources on brute-forcing the problem. The only exception I can think of is the case of tessellation, when only part of the tessellated object is not visible for whatever reason. So you can save a part of your tessellator resources by not tessellating everything. Either way, tessellation resources are abundant when compared to shading resources (or filling the gbuffer), and luckily early-z was introduced about a decade ago (maybe more); with that you can save all the really expensive resources.
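    As an illustration of the CPU-side per-object culling described above, a minimal sketch (assumptions: frustum plane normals point inward and each object carries a bounding sphere) of a sphere-vs-frustum test run before any draw call is issued.
    Code:
    // Minimal sketch, illustrative only: per-object visibility culling on the CPU
    // with bounding spheres tested against the six view-frustum planes.
    struct Plane  { float nx, ny, nz, d; };   // n.x*x + n.y*y + n.z*z + d = 0, normal points inward
    struct Sphere { float x, y, z, radius; };

    // Keep an object only if its bounding sphere is not entirely behind any plane.
    bool IsVisible(const Sphere& s, const Plane (&frustum)[6])
    {
        for (const Plane& p : frustum)
        {
            float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
            if (dist < -s.radius)
                return false;                 // completely outside this plane -> culled
        }
        return true;                          // potentially visible -> submit its draw call
    }

    // Usage idea: only objects that pass IsVisible() ever reach the graphics API.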

  6. #846
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by dadev View Post
    What serial nature? It was entirely AMD's choice to not implement DC support in the driver. DX11 parallelises well enough, even with Microsoft's emulation. Not as well as 12, but well enough to issue plenty of draw calls. Exactly what part of DX11.1 can't be parallelised on NVidia hardware? Even on AMD it's 90% fine. Besides, the draw calls issue has nothing to do with parallelisation, but with the overhead of recording the state and arranging all barriers/transitions.
    A draw call can only be issued from a single CPU core; this is what DX11 cannot spread across multiple cores and what DX12/Vulkan/Mantle can. Imagine how just this feature would benefit WoW in Mythic raiding environments.
    The point here is that everyone WANTS draw calls to be parallelized, as well as driver overhead reduced, so we, as consumers/professionals, can actually 100% utilize the hardware that we buy to do the tasks that we want/need to do.
    Also, nVidia's hardware for parallelization is actually based on pre-emption, so it predicts and acts... which can work fine, as witnessed by prior cards.
    But when more horsepower is required for whatever reason, you need multiple data and render pipelines handling things with an actual hardware scheduler.

    nVidia's hardware lacks this hardware scheduler and has 31 render pipelines with only 1 tiny data pipeline vs. the R9 390X's 64, which are split 32/32 into data/render pipelines.
    So nVidia doesn't truly parallelize and will actually leave GPU resources untapped; less so than AMD's DX11 utilization, but when DX12 is involved the full brunt of AMD's hardware is called into action vs. the same amount that nVidia had on DX11.

    nVidia's lack of this ability is also exactly why the Oculus Rift devs stated that nVidia's VR is "potentially catastrophic".

    Quote Originally Posted by dadev View Post
    Doing per-object visibility culling is a lot better on the CPU side with efficient data structures rather than wasting precious GPU resources on brute-forcing the problem. The only exception I can think of is the case of tessellation, when only part of the tessellated object is not visible for whatever reason. So you can save a part of your tessellator resources by not tessellating everything. Either way, tessellation resources are abundant when compared to shading resources (or filling the gbuffer), and luckily early-z was introduced about a decade ago (maybe more); with that you can save all the really expensive resources.
    Actually no; if you let the culling occur on the CPU it means the CPU has to analyze the component that needs to be drawn, determine whether it's visible and then discard it.
    Whereas the GPU is already being asked to do that work and, while doing it, checks whether it's actually required.
    If not, it drops it entirely, gives the far less weighted rendering time to other tasks (again, parallelization comes in here) and offers the lighter frame to the user instead.
    Precious time and resources are saved and CPU cycles aren't wasted, as it takes more effort for a CPU to do this separately when a GPU has to do it naturally anyway.
    The part you're missing in this is that nVidia goes overboard with said tessellation to cripple performance on competitor cards. Crysis was a perfect example, as you literally had an ocean beneath you being rendered whilst you were in the middle of a city... why would the ocean require rendering?
    Or that a simple concrete block was more complex than most actual moving character models.

    Old techniques no longer fully apply and that's why they are developed/renewed.

    Whilst some of your post isn't wrong, it does seem woefully out of date with the current state of affairs.

  7. #847
    Quote Originally Posted by Evildeffy View Post
    A draw call can only be issued from a single CPU core; this is what DX11 cannot spread across multiple cores and what DX12/Vulkan/Mantle can. Imagine how just this feature would benefit WoW in Mythic raiding environments.
    What does that even mean? I can issue draw calls from multiple cores and lo and behold, they all happen at the same time with the NVidia driver. This has nothing to do with the actual hardware. It's just recording of command lists, the same thing that you do with DX12, only at the programmer level instead of the driver level. AMD opted to not implement it in the driver; it was their choice and 100% their fault. What more do you want?

    The point here is that everyone WANTS draw calls to be parallelized as well as driver overhead reduction so we, as consumers/professionals, can actually 100% utilize the hardware that we buy to do the tasks that we want/need to do.
    And they can! What proof do you have that this doesn't happen? It doesn't happen fully on AMD, but that's their own fault.

    Also nVidia's hardware for parallelization is actually based on pre-emption, so it predicts and acts ..
    It has nothing to do with the actual hardware! Only the driver. If the driver doesn't support this then Microsoft emulates it. If you have a DX12 driver, but no DX11 driver which supports command lists, then pardon me, but it's your damn fault and laziness.

    So nVidia doesn't truly parallelize and will actually leave GPU resources untapped, less than AMD's DX11 utilization but when DX12 is involved the full brunt of AMD's hardware is called into action vs. the same amount that nVidia had on DX11.
    Why won't you prove the first part with a balanced sample and a gpuview trace? What is this? AMD marketing machine at work?

    nVidia's lack of this ability to do so is also exactly why the Oculus Rift devs quoted that nVidia's VR is "potentially catastrophic".
    What does this have to do with the lack of support in AMD's driver for drawing from multiple threads? In fact, I find that far more "potentially catastrophic": what will all the devs that won't move to DX12 do? Just because AMD doesn't want to add one feature to the driver, everyone has to suffer? Nice tactics.

    Actually no
    But actually yes!

    if you let the culling occur on the CPU it means the CPU has to analyze the component that needs to be drawn, determine whether it's visible and then discard it. Whereas the GPU is already being asked to do that work and, while doing it, checks whether it's actually required.
    The GPU does it in a far worse way if the only question you want to ask is "can I see it at all?".

    Precious time and resources are saved and CPU cycles aren't wasted as it takes more effort for a CPU to do so separately when a GPU is having to do it naturally anyway.
    I have strong suspicions that you don't know much about computational geometry. Natural way, right...

    Whilst some of your post isn't wrong it does seem woefully out of date with the current state of affairs.
    "Out of date", right, we did have a good laugh from your post here at the office.
    Last edited by dadev; 2016-06-21 at 02:47 PM.

  8. #848
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by dadev View Post
    What does that even mean? I can issue draw calls from multiple cores and lo and behold, they all happen at the same time with the NVidia driver. This has nothing to do with the actual hardware. It's just recording of command lists, the same thing that you do with DX12, only at the programmer level instead of the driver level. AMD opted to not implement it in the driver; it was their choice and 100% their fault. What more do you want?
    Actually no, you cannot.
    Not in any game opting to use DX11 rather than another API like Mantle; it is hardcoded into the DX11 subroutines.
    What it CAN do is called deferred contexts, but it remains a slave to the primary CPU core, thus if you have a CPU-bound situation, like WoW in this case, you are SoL.
    If you read the primary advantages of DX12 and what they can do, it is advertised that everything is now natively parallel-capable and that draw calls and other commands are no longer executed serially.
    They adhere to the standards set in the Microsoft API in that case, even though AMD has deeper support of DX than nVidia ever had.
    Draw Calls are on a single CPU thread only in DX11 and below; unsure of OpenGL, but I know Draw Calls can be made on any CPU core with DX12/Mantle/Vulkan/Metal.
    Or are you aware of something that Microsoft's, nVidia's and AMD's dev teams are not, and have found the miracle parallelization that no-one else could?

    Quote Originally Posted by dadev View Post
    And they can! What proof do you have that this doesn't happen? It doesn't happen fully on AMD, but that's their own fault.
    Nitrous' Star Swarm engine is a perfect example.
    http://www.eurogamer.net/articles/di...-a-gamechanger
    A link with more info: yes, AMD's DX11 driver was/is less efficient (the Crimson drivers changed the landscape a lot), no-one denied that, but nVidia still adheres to the limits of DX11's API so that outcome does not

    Quote Originally Posted by dadev View Post
    It has nothing to do with the actual hardware! Only the driver. If the driver doesn't support this then Microsoft emulates it. If you have DX12 driver, but no DX11 driver which supports command lists then pardon me but it's your damn fault and laziness.
    Wrong, the card is based upon pre-emption.
    If you even look at nVidia's own slides and tech shows, they tell you that their cards can now do Pixel Pre-emption.
    Pre-emption, any way you look at it, is prediction and acting upon that. The actual GFX card lacks the hardware to allow this, thus going into driver emulation.
    Driver emulation adds latency and will never be as efficient as hardware dedicated to it; this is why nVidia's VR is called "potentially catastrophic".
    Or do you know more than the Oculus Rift devs as well?

    Quote Originally Posted by dadev View Post
    Why won't you prove the first part with a balanced sample and a gpuview trace? What is this? AMD marketing machine at work?
    Compare it to HyperThreading if you will.
    So let's say you go with an i5-6600K @ 4GHz and an i7-6700K @ 4GHz; they are physically identical CPUs except that for every core a mask is added to allow the CPU's resources to be utilized fully. If this weren't the case then there would be zero difference between running with or without HyperThreading in, say, HandBrake.
    The proof is easily viewed in any number of technologies, and no, I am not a brand fanboy; I pick to run whatever is strongest at the time.
    Though someone can link you AdoredTV's explanation of the same thing if they know which one it is specifically.

    Quote Originally Posted by dadev View Post
    What does this have to do with the lack support in AMD driver for drawing from multiple threads? In fact, I find that far more "potentially catastrophic", what will all the devs that won't move to DX12 do? Just because AMD don't want to add one feature to the driver everyone has to suffer? Nice tactics.
    Actually, every dev has been screaming for low-level APIs for years to get to where DX12/Vulkan is now, born from Mantle.
    It is a free performance gain and latency-reducing feature; however, the devs are free to choose whether they want to use any performance addition or not.
    Most will not pass up the opportunity to get free performance. Then again, we can also view nVidia's DX12 performance, which is again based upon pre-emption and STILL lacks the hardware to schedule and run parallelization tasks through anything but software.

    Quote Originally Posted by dadev View Post
    But actually yes! GPU does it in a far worse way if the only question you want to ask "can I see it at all?".
    Really?
    Describe to me the process of the CPU handling that vs. the GPU handling that, which can drop the object being drawn midway and just show the surface of said object.
    Which has more steps whilst maintaining draw calls etc., which need an Interrupt command to stop the CPU in its tracks, analyze, process and choose to continue afterwards?

    Quote Originally Posted by dadev View Post
    I have strong suspicions that you don't know much about computational geometry. Natural way, right...
    And I have suspicions about you not knowing much regarding the actual CPU process cycle of how and what.

    Quote Originally Posted by dadev View Post
    "Out of date", right, we did have a good laugh from your post here at the office.
    I'm sure many could say the same about your post.

  9. #849
    Quote Originally Posted by Evildeffy View Post
    Actually no you cannot.
    Not in any game opting to use DX11 rather than another API like Mantle; it is hardcoded into the DX11 subroutines.
    Absolutely incorrect.

    What it CAN do is called deferred contexts, but it remains a slave to the primary CPU core, thus if you have a CPU-bound situation, like WoW in this case, you are SoL.
    It is not a slave in any way. You have literally no idea what you're talking about. It's slightly a slave with the AMD driver, because they don't support it.

    If you read the primary advantages of DX12 and what they can do it is advertised that everything is now parallel capable natively and that draw call and other commands are no longer executed serially.
    Can do the same with DX11. DX12 advantages are elsewhere.

    They adhere to the standards set in the Microsoft API in that case even though AMD has deeper support of DX than nVidia ever had.
    Right. Again no idea what you're talking about.

    Draw Calls are on a single CPU thread only in DX11 and below
    99% wrong again. This gets tiresome. The "below" part is correct.

    unsure of OpenGL
    And this is where you're unsure???

    Nitrous' Star Swarm engine is a perfect example.
    http://www.eurogamer.net/articles/di...-a-gamechanger
    A link with more info: yes, AMD's DX11 driver was/is less efficient (the Crimson drivers changed the landscape a lot), no-one denied that, but nVidia still adheres to the limits of DX11's API so that outcome does not
    So let me get this straight. You link me an article that confirms what I was saying? The game engine scales with multithreading on DX11. Clearly badly, because the engine wasn't optimized (the performance losses on AMD show that). Why are you even arguing? Multithreading on DX11 can easily double the amount of draws you can submit, NVidia or AMD. If they lose on AMD it means they did it wrong. Yes, AMD gains less (the whole point here), but they still gain enough, definitely not lose.

    Wrong, the card is based upon pre-emption.
    If you even look at nVidia's own slides and tech shows they tell you that their cards can now do Pixel Pre-emption.
    Pre-emption, any way you look at it, is prediction and acting upon that. The actual GFX card lacks the hardware to allow this, thus going into driver emulation.
    Driver emulation adds latency and will never be as efficient as hardware dedicated to it; this is why nVidia's VR is called "potentially catastrophic".
    Or do you know more than the Oculus Rift devs as well?
    I don't care about pre-emption! Why do you keep shoving it in? I know what it is, I know what it does. It's not what I wrote about before. I said that you can multithread with DX11, so why bring in pre-emption?

    Compare it to HyperThreading if you will.
    So let's say you go with an i5-6600K @ 4GHz and an i7-6700K @ 4GHz; they are physically identical CPUs except that for every core a mask is added to allow the CPU's resources to be utilized fully. If this weren't the case then there would be zero difference between running with or without HyperThreading in, say, HandBrake.
    The proof is easily viewed in any number of technologies and no I am not a brand fanboy, I pick to run whatever is strongest at the time.
    Though someone can link you the AdoredTV's explanation of the same thing if they know which one it is specifically.
    Please, please tell me: what does this have to do with a gpuview trace? If you don't know what that is, why not ask? If you do, why not just prove it?

    Actually every dev has been screaming for low level APIs for years to get where DX12/Vulkan is now, born from Mantle.
    It is a free performance gain and latency reducing feature however the devs are free to choose if they want to use any performance addition or not.
    Most will not leave the opportunity alone to get free performance, then again we can also view nVidia's DX12 performance which is again based upon pre-emption and STILL misses the hardware to schedule and run parallelization tasks through anything but software.
    The devs that were asking for it wanted it in order to have easier ports from xbone and to bypass awful AMD DX11 drivers. Duh...

    Really?
    Describe to me the process of CPU handling that vs. GPU handling that which can drop the object being drawn midway and just show the surface of said object.
    What? I said per-object culling, not per-triangle culling, not per-pixel culling. Per-object culling. Either the object is entirely dropped or it isn't. No middle ground. And excuse me, but I will not indulge in explaining it to you. You can start by reading BKOS. If you don't know what that means then it's hopeless, and it would be the same as someone without any knowledge of physics asking "explain the field equations to me please". Not gonna happen.

    As have I suspicions about you not knowing much regarding the actual CPU process cycle of how and what.
    But I do. Not in design, I was never a big fan of VLSI or the whole EE thing, but I know modern CPU architecture well enough.

    I'm sure many can say the identical to your post.
    Not the ones that matter.

  10. #850
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by dadev View Post
    Absolutely incorrect.
    Yes, OK, so every developer on this planet is wrong except for you; OK, glad we got that straightened out.

    Quote Originally Posted by dadev View Post
    It is not slave in any way. You have literally no idea what you're talking about. It's slightly a slave with AMD driver, because they don't support it.
    Nope, apparently I do not, but you do, even though other game developers worldwide also do not know.
    Are you by any chance a genius who has seen things other AAA-devs have not?

    Quote Originally Posted by dadev View Post
    Can do the same with DX11. DX12 advantages are elsewhere.
    Really? I guess all those years of devs around the world screaming for this very option were just a figment of everyone's imagination.

    Quote Originally Posted by dadev View Post
    Right. Again no idea what you're talking about.
    DirectX is an API (Application Programming Interface) which requires certain hardware functions to be compatible with it. As most of the world runs on MS' Windows, it uses DirectX (also known as D3D) to propel graphics in that environment.
    The terms are dictated by Microsoft and not by graphics designers, which is the very reason why Mantle was born.

    Quote Originally Posted by dadev View Post
    99% wrong again. Gets tiresome. The "below" part is correct.
    It must be getting tiresome indeed, since somehow you manage to break DX11 into doing things it cannot do; you MUST be some superhuman.

    Quote Originally Posted by dadev View Post
    And this is where you unsure???
    Yes, as OpenGL is a different API with a different set of rules.
    What it can and cannot do is not something I'm familiar with, as most PC games do not use OpenGL and have not for a long time.
    DOOM (2016) is an exception and it will switch to Vulkan soon enough.
    Nothing illogical about this statement.

    Quote Originally Posted by dadev View Post
    So let me get this straight. You link me an article that confirms what I was saying? The game engine scales with multithreading on DX11. Clearly badly, because the engine wasn't optimized (the performance losses on AMD show that). Why are you even arguing? Multithreading on DX11 can easily double the amount of draws you can submit, NVidia or AMD. If they lose on AMD it means they did it wrong. Yes, AMD gains less (the whole point here), but they still gain enough, definitely not lose.
    No, again, there's a limit to Draw Calls and they are performed on a single main thread with DX11; there are other jobs which can be assigned to other cores, but not Draw Calls.
    Tell me, if Draw Calls can be done concurrently on DX11, why is it that all DX11 MMORPGs are CPU-bound in performance, with all the reasons pointing to Draw Call limitations? Lazy developers, right?

    Quote Originally Posted by dadev View Post
    I don't care about pre-emption! Why do you keep shoving it in? I know what it is, I know what it does. It's not what I wrote about before. I said that you can multithread with DX11, so why bring in pre-emption?
    Because you state that parallelization is nothing but driver-side, which I'm correcting you on, as the quote suggests.
    I bring it up because that very point intersects exactly with why DX12 on nVidia cards brings no performance gains to the table and why it does on AMD.

    Quote Originally Posted by dadev View Post
    Please please tell me what does this have to do with gpuview trace? If you don't know what that is why not ask? If you do, why not just prove it?
    Because you're not even looking at the basics; GPUView still looks at the DirectX API, thus you're still a level above where you need to be.
    Just because something is running @ 100% does not mean you're utilizing all of its resources. In my example, the entire HyperThreading concept was made to maximise all the resources a CPU has available on the same core with simultaneous tasks; this is what DX12 looks to employ properly, which nVidia cannot do, pure and simple.

    Quote Originally Posted by dadev View Post
    The devs that were asking for it wanted it in order to have easier ports from xbone and to bypass awful AMD DX11 drivers. Duh...
    Wow... really? Isn't that a hilarious thing, considering that the devs were screaming for it long before then and, funnily enough, the consoles have AMD GPUs.
    Irony at its best, huh? Dear Lord, I cannot believe you actually posted this.
    I'm hoping it was sarcasm or something and I'm simply unable to read it.

    Quote Originally Posted by dadev View Post
    What? I said per-object culling, not per-triangle culling, not per-pixel culling. Per-object culling. Either the object is entirely dropped or not. No middle ground. And excuse me, but I will not indulge in explaining it to you. You can start by reading BKOS. If you don't know what that means then it's hopeless and it would be the same as someone without any knowledge in physics asking "explain me the field equations please". Not gonna happen.
    Oh funsies, what were we talking about again? Oh right, "Primitive Discard Acceleration", which is meant to cull exactly what I said to reduce performance loss, and which, funnily enough, YOU brought up. So now you want to change it to entire-object culling instead of the feature you commented on, where you even stated "per-object visibility culling".
    Your own statement (post #851) shows exactly that you were talking about what others, and I, have mentioned, and now you're grabbing the entire object.
    Nope, that doesn't work that way.

    Quote Originally Posted by dadev View Post
    But I do. Not in design, never was a big fan of VLSI or the whole EE thing, but I know the modern CPU architecture well enough.
    I have my doubts about that but going into this is an exercise in futility.

    Quote Originally Posted by dadev View Post
    Not the ones that matter.
    Which to you would probably be just yourself.
    Enjoy the rest of your day.

    I shall regard you as the genius who can do things that AAA-developers are unable to do, a near deity of graphics development(!).
    Your word is gospel(!).

  11. #851
    Titan draykorinee's Avatar
    10+ Year Old Account
    Join Date
    Jun 2011
    Location
    Ciderland, arrgh.
    Posts
    13,275
    Wall of text arguments, great.

  12. #852
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by draykorinee View Post
    Wall of text arguments, great.
    You have a #TeamPopcorn signature, I'd think you'd love this sort of thing!

  13. #853
    Titan draykorinee's Avatar
    10+ Year Old Account
    Join Date
    Jun 2011
    Location
    Ciderland, arrgh.
    Posts
    13,275
    Quote Originally Posted by Evildeffy View Post
    You have a #TeamPopcorn signature, I'd think you'd love this sort of thing!
    Lol, it hasn't descended into name calling or outright abuse yet; you're both being far too calm and measured.

  14. #854
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by draykorinee View Post
    Lol, it hasn't descended into name calling or outright abuse yet; you're both being far too calm and measured.
    Awww... well for me it won't go further than this so no #TeamPopcorn for you then!

  15. #855
    Quote Originally Posted by Evildeffy View Post
    Yes OK, so every developer is wrong on this planet except for you, ok glad we got that straightened out.
    Every developer? What are you even talking about?
    Here, just jump into anecdotes: https://developer.nvidia.com/sites/d...edContexts.pdf

    Nope, apparently I do not, but you do, even though other game developers worldwide also do not know.
    Are you by any chance a genius who has seen things other AAA-devs have not?
    What other AAA-devs are you talking about? Please show me those mysterious devs who did a dedicated DX11 engine (one studio even went the extra mile to make it for DX9) and said that you cannot parallelize draw submits. This is pure bullshit.

    Really? I guess all those years screaming for this very option by devs around the world was just a figment of everyone's imagination.
    As I said before, they were asking for this to make ports easier. There are other benefits to DX12, such as state recording and barrier transitions/dependency resolution being entirely up to the developer, which was the main clutter in DX11 drivers. In simple terms, draw call cost in DX12 is an order of magnitude lower than in DX11, and this has nothing to do with multithreading.
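    To illustrate the "recording at the programmer level" point, a minimal D3D12 sketch (illustrative only, error handling omitted; RecordChunk() is a hypothetical helper): each thread records into its own command allocator and list, and the application decides when the whole batch is submitted.
    Code:
    // Minimal sketch, illustrative only (error handling omitted): per-thread D3D12
    // command recording. RecordChunk() is a hypothetical helper.
    #include <d3d12.h>
    #include <wrl/client.h>
    #include <thread>
    #include <vector>
    using Microsoft::WRL::ComPtr;

    void RecordAndSubmit(ID3D12Device* device, ID3D12CommandQueue* queue, int workerCount)
    {
        std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
        std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);
        std::vector<std::thread> workers;

        for (int i = 0; i < workerCount; ++i)
        {
            workers.emplace_back([&, i]
            {
                device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                               IID_PPV_ARGS(&allocators[i]));
                device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                          allocators[i].Get(), nullptr,
                                          IID_PPV_ARGS(&lists[i]));

                // Record barriers, state and draws for this thread's chunk of the frame.
                // RecordChunk(lists[i].Get(), i);   // hypothetical helper

                lists[i]->Close();                   // ready for submission
            });
        }
        for (auto& t : workers) t.join();

        // One submission for all recorded lists; the per-draw cost was paid at record time.
        std::vector<ID3D12CommandList*> raw;
        for (auto& l : lists) raw.push_back(l.Get());
        queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
    }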

    Must be getting tiresome indeed since somehow you manage to break DX11 to do things it cannot do, you MUST be some super human.
    You're the one doing the breaking!

    Yes as OpenGL is a different API with different set of rules.
    What it can and cannot do is not something I'm familiar with, as most PC games do not use OpenGL and have not for a long time.
    Excuse my bluntness, but if you don't know how OpenGL works then what the hell is your opinion on DX worth? Any DX! Why does it matter what games do and don't do? Go read https://www.opengl.org/registry/doc/glspec45.core.pdf
    It is completely illogical, because you argue about DX11 yet have no idea how OpenGL operates. It's a standard that has existed for more than 20 years, and anyone who has touched 3D graphics in that time knows a thing or two about OpenGL.

    No, again, there's a limit to Draw Calls and they are performed on a single main thread with DX11, there are other jobs which can be assigned to other cores but not Draw Calls.
    Tell me if Draw Calls can be done concurrently on DX11 why is it that all DX11 MMORPGs are CPU-bound in performance with all the reasons pointing into Draw Call limitations? Lazy developers right?
    First, why are you asking me? Ask them! Secondly, engine design for DX9 and/or OpenGL limits multi-threading. It really is a no-brainer.

    Because you state that parallelization is nothing but driver-side, which I'm correcting you on as the quote suggests.
    I bring it up because that very point intersects exactly with why DX12 on nVidia cards bring no performance gains to the table and why AMD does.
    Can it be because NVidia's DX11 driver is already optimized? What a shocker!
    If you're talking about async, then sure, if it's Maxwell. But that has nothing to do with the driver.

    Because you're not even looking at the basics, GPUView still looks at the DirectX API thus you're still a level above where you need to be.
    GPUView is exactly where you need to be. Look at the HW queue (as opposed to the SW queue) and see for yourself. C'mon, are you for real? Have you ever seen a trace? HW graphics queue at 99.9%? You're utilizing the gpu, gg. If you have a dedicated compute queue, then try to fill it too, but not as much as you can, because even on AMD you will lose performance at some point.

    Just because something is running @ 100% does not mean you're utilizing all of its resources. In my example, the entire HyperThreading concept was made to maximise all the resources a CPU has available on the same core with simultaneous tasks; this is what DX12 looks to employ properly, which nVidia cannot do, pure and simple.
    Neither does HT. HT has the option to utilize more CPU ports if they are available. If there are none then HT will do nothing. Furthermore, HT is entirely automatic; you cannot control it in any way except by disabling it. DX12 looks to close the gap between the programmer and the hardware by moving driver responsibilities to the engine programmer; it has nothing in common with HT.

    Wow... really? Isn't that a hilarious thing, considering that the devs were screaming for it long before then and, funnily enough, the consoles have AMD GPUs.
    Irony at its best, huh? Dear Lord, I cannot believe you actually posted this.
    I'm hoping it was sarcasm or something and I'm simply unable to read it.
    What irony? Do you think that we were coding in some high level API for PS3? lol!

    Oh funsies, what were we talking about again? Oh right "Primitive Discard Acceleration" which is meant to cull exactly what I said to reduce performance loss which funnily enough YOU brought up. So now you want to change it to entire object culling instead of the feature you commented on and even stated "per-object visibility culling". Your own statement (post #851) shows exactly that you were talking about what others, and I, have mentioned and now you're grabbing the entire object.
    Nope, that doesn't work that way.
    Look at my post and see what I wrote: per-object visibility culling. It can mean only one thing. And I gave an exception in the case of tessellation, because you might want to split your tessellated mesh.
    I'm not responsible for your reading disabilities. If you want to laugh, go ahead, but the joke is on you.

  16. #856
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by dadev View Post
    Every developer? What are you even talking about?
    Here, just jump into anecdotes: https://developer.nvidia.com/sites/d...edContexts.pdf
    http://www.pcper.com/reviews/Editori...What-Can-It-Do
    The first bits say enough right there.

    Quote Originally Posted by dadev View Post
    What other AAA-devs are you talking about? Please show me those mysterious devs who did a dedicated DX11 engine (one studio even went the extra miles to make it for DX9) and said that you cannot parallelize draw submits. This is pure bullshit.
    Does Blizzard Entertainment count? If so see WoW.

    Quote Originally Posted by dadev View Post
    As I said before, they were asking for this to make ports easier. There are other benefits to DX12, such as state recording and barrier transitions/dependency resolution being entirely up to the developer, which was the main clutter in DX11 drivers. In simple terms, draw call cost in DX12 is an order of magnitude lower than in DX11, and this has nothing to do with multithreading.
    Cool, I'll just jot down "DX12 was primarily made to make ports easier" and ignore the whole "why was Mantle created in the first place"; I'm sure it was just for those purposes.
    Of course DX12 has multiple advantages and not just 1 or 2; if the only reasons were driver overhead reduction and draw calls being natively multi-threaded, then DX12 would never have existed.

    Quote Originally Posted by dadev View Post
    You're the one doing the breaking!
    Cool, can I get a large heavy blunt object, preferably made out of titanium, to break more things?!

    Quote Originally Posted by dadev View Post
    Excuse my bluntness, but if you don't know how OpenGL works then what the hell is your opinion on DX worth? Any DX! Why does it matter what games do and don't do? Go read https://www.opengl.org/registry/doc/glspec45.core.pdf
    It is completely illogical, because you argue about DX11 yet have no idea how OpenGL operates. It's a standard that has existed for more than 20 years, and anyone who has touched 3D graphics in that time knows a thing or two about OpenGL.
    Yes, for those who develop in them they do; I do not, and I've made no secret of this and stated as much.
    Have I ever, at any point, called myself a developer?
    OpenGL, even though updated through the years, has not seen popular use on any Windows machine, whereas DX11 has about what, 99% of the PC market in hand?
    OpenGL 4.5, which is the latest iteration and is according to a lot of devs a mess of epic proportions, is not what it was long ago, as it also started as IRIS GL.
    It is very logical for more people NOT to know the specifics of OpenGL than of DirectX; there is also a reason why older versions of OpenGL are referred to as Legacy OpenGL and why they're not recommended for use. OpenGL did evolve over time, but it's a mess even by the admission of many devs.

    Quote Originally Posted by dadev View Post
    First, why are you asking me? Ask them! Secondly, engine design for DX9 and/or OpenGL limits multi-threading. It really is a no-brainer.
    And yet World of Warcraft is fully DX11 compliant... how about that.

    Quote Originally Posted by dadev View Post
    Can it be because of NVidia's DX11 driver is already optimized? The shocker!
    If you're talking about async, then sure if it's Maxwell. But that has nothing to do with the driver.
    Did I not state that nVidia's drivers and hardware are better optimised than AMD's in DX11? That does not change the fact that it is not just driver related.
    Hell, if you want to go there you can go to Ashes of the Singularity specifically and see their explanation of why multi-threading is inherently worse on nVidia hardware; and yes, async is the prime example of that, but it was not the original point.

    Quote Originally Posted by dadev View Post
    GPUView is exactly where you need to be. Look at the HW queue (as opposed to the SW queue) and see for yourself. C'mon, are you for real? Have you ever seen a trace? HW graphics queue at 99.9%? You're utilizing the gpu, gg. If you have a dedicated compute queue, then try to fill it too, but not as much as you can, because even on AMD you will lose performance at some point.
    I return to the statement that just because your GPU is working at 100% does not mean all resources are utilized. GPUView is still controlled through the DirectX API and thus has its limitations.

    Quote Originally Posted by dadev View Post
    Neither does HT. HT has the option to utilize more CPU ports if they are available. If there are none then HT will do nothing. Furthermore, HT is entirely automatic; you cannot control it in any way except by disabling it. DX12 looks to close the gap between the programmer and the hardware by moving driver responsibilities to the engine programmer; it has nothing in common with HT.
    It partly does, as software needs to know what CAN be HyperThreaded and what not, meaning there's a logic definition behind it. It's just a lot simpler than GPU parallelization: where the GPU has 2000 to 4000 "cores", the CPU has "only" up to 24 right now, and a CPU is a serial device with cores being an attempt to parallelize it... barely.

    Quote Originally Posted by dadev View Post
    What irony? Do you think that we were coding in some high level API for PS3? lol!
    The hilarious part about it was the fact that you stated, and apparently meant it, when you said:
    Quote Originally Posted by dadev
    The devs that were asking for it wanted it in order to have easier ports from xbone and to bypass awful AMD DX11 drivers. Duh...
    I'm sure it had nothing to do with having access to the "metal resources" of any GPU.
    If you want to get technical, Sony's libGCM is the first commonly used low-level API; that doesn't mean they JUST use that in consoles.

    Quote Originally Posted by dadev View Post
    Look at my post, and see what I wrote. Per-object visibility culling. It can mean only one thing. And I gave an exception in case of tessellation because you might wan't to split your tessellated mesh.
    I'm not responsible for your reading disabilities. If you want to laugh go ahead, but the joke is on you.
    Did you or did you not respond to the "Primitive Discard Accelerator" post? If so, how does that magically transform the subject?
    Before you want to accuse someone of "reading disabilities" you might want to look into comprehension of topics first.

  17. #857
    Quote Originally Posted by Evildeffy View Post
    http://www.pcper.com/reviews/Editori...What-Can-It-Do
    First bits, says enough right there.
    And what does it say? That you couldn't parallelize on DX10 and earlier? That DX11 is fully parallelizable? That the gains in DX12 are from reduced overhead in the driver and have little to nothing to do with multithreading? Wait, I'm almost sure I wrote that before! It's as if I did!
    Looking at the second chart (because AMD...): going from one core to 2 cores, DX11 nets a 2.08 scale factor, DX12 nets 1.7. For four cores DX11 gains 3.46, DX12 is on 2.97. Going on, 8 cores: DX11 gains x4.5, DX12 gains x3.7. It's like magic! DX11 scales better than DX12! Even better than AMD DX12 (or Mantle). What's going on?!

    Does Blizzard Entertainment count? If so see WoW.
    Are you trolling me? WoW has an engine which was built around DX9 and OpenGL. The DX11 backend was coded in much later.

    Cool, I'll just jot down "DX12 was primarily made to make ports easier" and ignore the whole "why was Mantle created in the first place"; I'm sure it was just for those purposes.
    Of course DX12 has multiple advantages and not just 1 or 2; if the only reasons were driver overhead reduction and draw calls being natively multi-threaded, then DX12 would never have existed.
    Why is it so hard to understand? Multithreading has nothing to do with it. The only real advantage DX12 brings in terms of performance (besides feature level 12 stuff) is async and low overhead, because driver responsibilities moved to the engine programmer.

    Yes for those who develop in them they do, I do not, I've made no secret out of this and stated this.
    Have I ever, at any point, called myself a developer?
    No, but not even once did you say "I'm not sure how this works" or "I'm not sure why this is made this way". So what am I left to assume? In contrast, I've been doing this for almost 20 years (since Glide); I've seen APIs/technologies/paradigms shift, come and go. I do have experience.

    OpenGL, even though updated through the years, has not seen popular use on any Windows machine where-as DX11 has about what? 99% of the PC market in hands?
    OpenGL 4.5, which is the latest iteration and is according to a lot of devs a mess of epic proportions, is not what it was long ago as it also started as IRIS GL.
    But OpenGL 4.5 is good! Alas too little, too late. Still no multithreading support and no good debugging tools, but otherwise it's a good API. Shame it was about 7 years late to the party.

    It is very logical for more people NOT to know the specifics of OpenGL than they do of DirectX, there is also a reason why older versions of OpenGL is referred to as Legacy OpenGL and why it's not recommended to use. OpenGL did evolve over time but it's a mess even by admission of many devs.
    It's not logical! The OpenGL spec is there for everyone to read in a conscious manner, whereas a DX11 programmer spec is non-existent and MSDN is a mess. In what way are the specifics of DX more clear? The marketing machines at play do not make it more accessible or clear; most of the time they achieve the opposite while throwing out nice buzzwords.

    And yet World of Warcraft is fully DX11 compliant... how about that.
    A compliant engine is not a dedicated engine. In the general case an engine built around DX9 can be ported to DX11 without too much fuss, but the move backwards will likely be complicated, especially if you opted to use the "advanced" features. It's like porting DX11 stuff to DX12: games do it successfully (even if the port sucks), but the port backwards can turn out to be quite complicated.

    Did I not state that nVidia's drivers and hardware are better optimised than AMD's in DX11? That does not change the fact that it is not just driver related.
    Hell if you want to go there you can go to Ashes of the Singularity specifically and see their explanation of why multi-threading is inherently worse on nVidia hardware and yes Asynch is the prime example for that but not the original point.
    But the chart from your first post doesn't support that claim! Both NVidia and AMD scale the same in DX12 in a multithreaded environment. You're arguing with your own links! Furthermore, Oxide definitely know what they're doing, but the game (and the engine) was funded by AMD. They optimized everything for AMD. I'm not saying this is bad; in fact it's great that they can show how far you can take async with real gains (the heavier techniques are unusable for non-top-down-view games, but it's important nonetheless). But what I am saying is that you cannot take such statements at face value.
    When I'm coding for PC I will optimize for all vendors as much as I can without screwing the others too much. That includes Intel. What they probably meant is that their vision of their engine is better multithreaded for AMD rather than for NVidia.

    I return to the statement that just because your GPU is working at 100% does not mean all resources are utilized. GPUView is still controlled through the DirectX API and thus has its limitations.
    gpuview is not controlled by the DX api. The logger collects events straight from the kernel; gpuview is just a viewer for said events. In essence it bypasses DX. What next? The Windows kernel interferes? Right, let's ship games with device drivers which utilize the gpu directly. Also, btw, you will never ever see 100% in gpuview; best cases can be like 99.5% or something, which is akin to good CPU utilization, you can never take over everything unless you're a device driver.

    It partly does, as software needs to know what CAN be HyperThreaded and what not, meaning there's a logic definition behind it. It's just a lot simpler than GPU parallelization: where the GPU has 2000 to 4000 "cores", the CPU has "only" up to 24 right now, and a CPU is a serial device with cores being an attempt to parallelize it... barely.
    How can you control HT other than by not creating enough threads? There's literally no way to control HT other than not using it at all. But we're deviating; HT has nothing to do with DX12, and they're not solving the same problem. HT automagically utilizes unused ports or pipelines with the other thread on the same core. The closest thing to that that you have in DX12 is async, with which you try (a huge emphasis on try) to utilize ALU in the compute queue while your bandwidth-heavy stuff is busy in the 3d queue. The resemblance between this and HT is incredibly far-fetched.
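    For what filling a compute queue next to the 3d queue looks like in D3D12, a minimal sketch (illustrative only; assumes the compute command list is already recorded and closed): a second queue of type COMPUTE plus a fence so the graphics queue waits for it on the GPU side.
    Code:
    // Minimal sketch, illustrative only: a dedicated compute queue next to the
    // graphics (direct) queue, with a fence so the graphics queue waits GPU-side.
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    void KickAsyncCompute(ID3D12Device* device,
                          ID3D12CommandQueue* graphicsQueue,
                          ID3D12CommandList* computeWork)   // assumed already recorded and closed
    {
        // Work on this queue may overlap with the 3d queue if hardware/scheduling allows.
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        ComPtr<ID3D12CommandQueue> computeQueue;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

        ComPtr<ID3D12Fence> fence;
        device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

        // Submit the compute work, then make the graphics queue wait for it on the GPU;
        // the CPU is not blocked here.
        computeQueue->ExecuteCommandLists(1, &computeWork);
        computeQueue->Signal(fence.Get(), 1);
        graphicsQueue->Wait(fence.Get(), 1);
    }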

    The hilarious part about it was the fact you stated and apparently meant it when you said...
    I'm sure it had nothing to do with having access to the "metal resources" of any GPU.
    Then explain to me, what is the difference between that and removing the crappy driver from the equation?
    On consoles everything is low level (relatively), which allows some nifty tricks (and oh boy, some tricks on PS3 are dirty) and low overhead. For instance, artists design some scene and it requires X draw calls. No problem on the Xbox 360. Then comes DX11 on Windows: you can't do X but you can do X/10, suck it up! How am I going to make this work on Windows? Multithreading DX11 helps (even assuming I'll refactor the whole god damn engine), but not enough, because AMD is in the way. What am I supposed to do then? Lo and behold, with DX12/Mantle/Vulkan I can do X draw calls because of the low overhead. Problem solved. I bypassed the crappy driver so I can do the same amount of draw calls that I can on a console. On consoles there is also other stuff, like what is called heaps in DX12, but this is almost irrelevant for PC development, except for the low end.


    Did you or did you not respond to the "Primitive Discard Accelerator" post? If so how does that magically transform subject?
    I did, but now that I know that you don't know, let me explain it this way. A primitive discard accelerator on its own is useless. Useless. That's right, useless! Three times! Why? Because I can cull geometry more efficiently on the CPU per object, and if the per-object test passes then I'm just going to count on sorting, back-face culling and early-z. I will sort the geometry anyway, so early-z will always work wonders for me. Early-z has been in the hardware for a very, very long time. It's nothing new and nothing to get excited about. Furthermore, for most objects the parts that you don't see (part of the object geometry) are going to be culled away because they're back-facing triangles, which is, again, something hardware has done for decades.
    Now what is left? Either a few triangles from normal concave geometry, which is entirely irrelevant in terms of performance, or breaking up tessellated objects. So essentially only tessellation is a contender. But there is a question that arises in this case. Why introduce a whole culler (probably a complicated one) and not just improve the tessellator if they feel that this is a problem? The number of tessellated objects in a normal scene is not high; on top of that, all modern uses of tessellation include good LOD tuning which will prevent over-tessellation. So what's left? Best case, when most of the tessellated object is obscured and you align the camera to see a small part of it, they save a few 100k domain shader invocations which would fail after rasterization anyway because of early-z? Boo-f-hoo, big deal. Great job on solving that corner case! Really, a pebble among rocks.
    So back to square one: what will it be used for? Per-object culling? Better on the CPU.
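    For reference, the back-face test mentioned above as a minimal sketch (illustrative only; assumes counter-clockwise front faces and screen-space vertex positions).
    Code:
    // Minimal sketch, illustrative only: the back-face test fixed-function hardware
    // performs. With counter-clockwise front faces, a triangle whose signed
    // screen-space area is <= 0 faces away from the camera and is rejected before shading.
    struct Vec2 { float x, y; };   // vertex position after projection to screen space

    bool IsFrontFacing(const Vec2& a, const Vec2& b, const Vec2& c)
    {
        // Twice the signed area of triangle (a, b, c); the sign encodes winding order.
        float signedArea = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        return signedArea > 0.0f;  // <= 0 -> back-facing (or degenerate), culled
    }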

  18. #858
    The Lightbringer Evildeffy's Avatar
    15+ Year Old Account
    Join Date
    Jan 2009
    Location
    Nieuwegein, Netherlands
    Posts
    3,772
    Quote Originally Posted by dadev View Post
    <Part 01>
    OK, factors; instead of that let's look at amounts.
    nVidia best case scenario, octa-core: 2.75 million draw calls a second under DX11.
    AMD best case scenario, octa-core: 1.05 million draw calls a second under DX11.

    Now let's go to DX12:
    nVidia best case scenario, octa-core: 19.12 million draw calls a second under DX12, roughly a sevenfold increase.
    AMD best case scenario, octa-core: 15.67 million draw calls a second under DX12, roughly a fifteenfold increase.

    You're standing there willing to tell me that simply by reducing overhead the draw call count can be increased by a factor of 7 and 15 respectively? (rounded up for ease)
    Driver overhead reduction is not that large, and if it were it would double or triple our FPS in games if we just held to the factor of 7, not counting 15.
    Also as is quoted in the very link I posted:
    Quote Originally Posted by PC Perspective
    While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.

    Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
    This is something I said earlier: one primary thread and the rest are slaves. Is the information above from PC Perspective incorrect?
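    A quick arithmetic check of the figures above (a hypothetical throwaway program; the inputs are just the numbers cited in this post).
    Code:
    // Hypothetical throwaway check of the numbers cited above (8-core best case,
    // draw calls per second).
    #include <cstdio>

    int main()
    {
        const double nvDx11 = 2.75e6,  nvDx12 = 19.12e6;
        const double amdDx11 = 1.05e6, amdDx12 = 15.67e6;

        // DX12 / DX11 ratio -> the "factor 7" and "factor 15" being argued about.
        std::printf("NVidia: %.1fx   AMD: %.1fx\n", nvDx12 / nvDx11, amdDx12 / amdDx11);

        // Average wall-clock time per submitted draw call, in nanoseconds.
        std::printf("ns per draw, DX11: NV %.0f  AMD %.0f\n", 1e9 / nvDx11, 1e9 / amdDx11);
        std::printf("ns per draw, DX12: NV %.0f  AMD %.0f\n", 1e9 / nvDx12, 1e9 / amdDx12);
        // Prints roughly 7.0x / 14.9x, and ~364/952 ns vs ~52/64 ns per call.
        return 0;
    }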

    Quote Originally Posted by dadev View Post
    <Part 02>
    Correct, but when it was implemented WoW's engine was entirely overhauled; it wasn't just patched in, it was included with a new expansion, I don't remember if it was Cataclysm or MoP.
    It's not just a tacked-on patch.

    Quote Originally Posted by dadev View Post
    <Part 03>
    There are more minor parts, but yes, those are the main performance increases. I'll ignore the draw call limitation for now since it's obvious we'll never agree on that part.

    Quote Originally Posted by dadev View Post
    <Part 04>
    Funny, since I'm at close to the same number of years, 22 in my case, in which I've seen those same APIs come and go with all sorts of promises.
    Promising something is one thing, living up to it another. This of course also goes for DX11's Deferred Contexts.

    Quote Originally Posted by dadev View Post
    <Part 05>
    And here is id Software with DOOM saying otherwise, whom do I believe? You or them?

    Quote Originally Posted by dadev View Post
    <Part 06>
    General information available around the web, like PC Perspective.
    So then whom do I believe? The company that's a multi-billion dollar conglomerate or some random person on the web?
    See my point with this? There are also development kits and MSDN available for devs who understand the functions, just like OpenGL.
    Though OpenGL is more freely accessible, of course, no denying that... but it's still a mess of epic proportions.

    Quote Originally Posted by dadev View Post
    <Part 07>
    That may be so, but Blizzard aren't fools.
    If you look at the original engine and what it is now, it is clear it's not something like OpenGL; some parts have been completely re-written for it.
    Honestly, if they simply allowed for the driver overhead reduction and draw calls on other threads in WoW, so many issues would be solved it's incredible.

    Quote Originally Posted by dadev View Post
    <Part 08>
    MMOs are another area where this could be extremely advantageous, but Oxide is not funded by AMD.
    nVidia themselves have also stated that they aren't... also, nVidia was supposed to bring out an Async Compute enabled driver 9 months ago; we still don't have it.
    Though saying their vision of multi-threading is better for AMD than nVidia is what they meant... no, not really.
    The dev himself said that nVidia has even spent more time and sent more devs to assist with their optimising than AMD did; is he lying?

    Quote Originally Posted by dadev View Post
    <Part 09>
    Quote Originally Posted by MSDN
    This shows you how to measure some of the most important performance time measurements for a DirectX app using the XPerf and GPUView tools that ship as part of the Windows Performance Toolkit. This is not a comprehensive guide for understanding the tools, rather their specific applicability for analyzing DirectX app performance.
    OK then, so is MSDN lying? I'm going by what's described here; is this information correct or not?

    Quote Originally Posted by dadev View Post
    <Part 10>
    So then tell me... is the following incorrect?
    More specifically right underneath the "Our Take, DirectX 12 Asynchronous Compute : What It Is And Why It Matters" headline in that post.

    Quote Originally Posted by dadev View Post
    <Part 11>
    There are multiple reasons why, one being actually being able to fully utilise the hardware we buy on the PC.
    Rather than just saying "Oh, it was done because developers only wanted to get rid of AMD's drivers!", there are a plethora of features as to why.
    Since you're a developer (by your own admission), how complex is DX11's multi-threading incorporation vs. DX12's multi-threading?
    As far as I'm reading it's orders of magnitude easier and far more effective, as it's native rather than jumping through hoops and praying.
    But there's another question that still stands: are you telling me that JUST by removing driver overhead you can gain a factor of 7 to 15 in Draw Calls?
    I have my doubts about that order of magnitude.

    Quote Originally Posted by dadev View Post
    <Part 12>
    It is something that developers, most notably GameWorks developers, include in AAA titles to tessellate and render objects unnecessarily in order to destroy competitor performance.
    Crysis was one such culprit where (as said earlier) you'd see the shore and sea miles away whilst you were in the middle of the city, but that sea would still be rendered underneath the city, impacting performance not by a small margin as you state but by a great deal.
    It will be used not for entire objects but for objects that do not need such vast amounts of resources. I will rephrase it, because you and I understand "object visibility culling" differently from each other even though the term itself indicates the function.
    I will call it partial object visibility culling for you; is that more clear or do you prefer another term entirely?
    The entire reason it was implemented (as said earlier, again) is to combat the tactics nVidia uses with game developers, so on its own it's not useless.
    It would be if nVidia didn't resort to underhanded tactics to sabotage their competitors.
    So back to square one, since we were talking about this technology and not something else: it's better on graphics, as it's being rendered by the graphics card and it can better decide what to render and what to discard of said object, much like a sea that does not need to be rendered underneath a city, draining performance when it's not required.
    Doing this on the CPU would require an Interrupt Request to validate and eliminate said objects, not counting the amount of bugs this might bring to an engine.

  19. #859
    9 days, still no 1060 announcement; looks like I'm going to the red team.

  20. #860
    Fluffy Kitten Remilia's Avatar
    10+ Year Old Account
    Join Date
    Apr 2011
    Location
    Avatar: Momoco
    Posts
    15,160
    Since there's a gigantic wall: @dadev, how do you know it's faster on the CPU than on the GPU exactly, since we've never actually had something like this? And plus, 64x tessellated hair (looking at you, Witcher 3) is really dumb, so yes, it'd be nice if we could remove some of that shit.
