  1. #101
    Quote Originally Posted by Noctifer616 View Post
    It wouldn't be any different today. The issue isn't with PhysX, it's with DirectX and OpenGL and their lack of Asynchronous Compute.
    Couldn't Nvidia have optimized PhysX over the 4 generations between the 280 GTX and the 780 GTX? 4 generations is a long time.

    Quote Originally Posted by Noctifer616 View Post
    Star Swarm doesn't have amazing graphics either, yet it manages to bottleneck modern CPUs.
    Just because one game manages a draw call bottleneck doesn't mean every other game is the same.

    Quote Originally Posted by Noctifer616 View Post
    Why do you think are the Blizzard devs working on reducing the visual clutter by 70% in WoD raids? Because if you run into a CPU bottleneck lowering the graphical settings isn't going to help one bit.
    Clearly these are draw call bottlenecks and nothing else?

  2. #102
    Quote Originally Posted by yurano View Post
    Couldn't Nvidia have optimized PhysX over the 4 generations between the 280 GTX and the 780 GTX? 4 generations is a long time.
    Sigh. It has nothing to do with the PhysX code. Your graphics card has graphics units and compute units, and PhysX runs on the compute units. The problem is the queue: in DirectX and OpenGL you can queue up either graphics tasks or compute tasks, but not both at the same time. That means your graphics card either does compute while graphics waits, or does graphics while compute waits.

    That's why PhysX and TressFX affect your graphics performance so much. When you have a second card dedicated to compute, one card does graphics and the other does compute. This basically simulates Asynchronous Compute, but with two cards (one holds the graphics queue, the other the compute queue) rather than one. DirectX 12 and Mantle achieve the same effect on a single card: there is both a compute queue and a graphics queue, so there is no idle time when both kinds of tasks run at the same time.
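
    The claimed benefit can be sketched with a toy timing model (all millisecond figures below are invented for illustration, not measurements):

    ```python
    # Toy model: a serialized single queue (DX11/OpenGL style) versus two
    # overlapping queues (DX12/Mantle style, or graphics on one card and
    # compute/PhysX on a second card). Numbers are purely illustrative.

    def frame_time_serial(gfx_ms, compute_ms):
        # One queue: compute waits for graphics (or vice versa).
        return gfx_ms + compute_ms

    def frame_time_async(gfx_ms, compute_ms):
        # Two queues: both run concurrently; the frame ends when the
        # slower of the two workloads finishes.
        return max(gfx_ms, compute_ms)

    gfx, physx = 12.0, 4.0  # hypothetical per-frame costs in ms
    print(frame_time_serial(gfx, physx))  # 16.0 ms -> ~62 FPS
    print(frame_time_async(gfx, physx))   # 12.0 ms -> ~83 FPS
    ```

    Under this (simplified) model the compute work is nearly free once it overlaps the graphics work, which is the argument being made for async compute.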

    Quote Originally Posted by yurano View Post
    Just because one game manages a draw call bottleneck doesn't mean every other game is the same.
    Every game can run into it. Take the most optimized game there is, lower all the graphics settings and the resolution, and watch how you run into a CPU bottleneck, because the GPU wants to push a massive amount of FPS and the CPU can't keep up with the pace. In that case the number of draw calls per frame is the same, but you have a massive number of frames per second. That's how you avoid CPU bottlenecks with high-end GPUs: you raise the resolution and the graphical fidelity. That increases the work the GPU has to do, so it pushes out fewer FPS; the draw calls per frame stay the same, but there are fewer frames per second for the CPU to handle.
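
    The argument above boils down to taking the minimum of two ceilings. A minimal sketch, with invented numbers (the per-call cost and frame times are assumptions, not benchmarks):

    ```python
    # Toy model: the CPU caps FPS through draw-call submission cost, the
    # GPU caps FPS through per-frame render cost. Raising resolution or
    # fidelity slows the GPU, so fewer frames per second reach the CPU.

    def achievable_fps(draw_calls, cpu_us_per_call, gpu_frame_ms):
        cpu_fps = 1_000_000 / (draw_calls * cpu_us_per_call)  # CPU-bound ceiling
        gpu_fps = 1_000 / gpu_frame_ms                        # GPU-bound ceiling
        return min(cpu_fps, gpu_fps)

    # Same 2000 draw calls per frame at a hypothetical 10 us each:
    low  = achievable_fps(2000, 10, 5)   # fast GPU frame -> CPU-bound at 50 FPS
    high = achievable_fps(2000, 10, 25)  # slow GPU frame -> GPU-bound at 40 FPS
    print(low, high)  # 50.0 40.0
    ```

    At low settings the CPU ceiling binds; at high settings the GPU ceiling binds even though the draw calls per frame never changed.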

    Quote Originally Posted by yurano View Post
    Clearly these are draw call bottlenecks and nothing else?
    Of course they aren't. But if your GPU is limited by the CPU and the CPU is not at 100%, or even close to it, there is room to improve.

  3. #103
    Quote Originally Posted by Noctifer616 View Post
    Sigh. It has nothing to do with PhysX code. Your graphics cards have graphics units and compute units. PhysX runs on the compute units. The problem is the Queue. In DirectX and OpenGL, you can either queue up graphics task or compute task, you can't queue both at the same time. That means that your graphics card does either compute, and graphics wait, or it does graphics and then compute waits.
    That's a massive amount of misinformation. PhysX takes advantage of parallel processing on CUDA cores (shaders). Shaders are also integral to general visual performance. Further, your claim that GPUs have some sort of single-lane queue is completely false. You would have a point if we were discussing single-core CPUs, but we're not. Your typical DX pipeline requests image data, then post-processing, compute, and a multitude of other stuff. Offloading PhysX compute to another card doesn't magically change that pipeline, and these "queues" you keep referencing would still exist regardless.
    i7-4770k - GTX 780 Ti - 16GB DDR3 Ripjaws - (2) HyperX 120s / Vertex 3 120
    ASRock Extreme3 - Sennheiser Momentums - Xonar DG - EVGA Supernova 650G - Corsair H80i

    build pics

  4. #104
    Quote Originally Posted by Zeara View Post
    That is such bullshit!
    IIRC Nvidia dropped prices on the 7xx series when AMD released the R9s. A 780 would have been $150-200 (don't recall exactly, but it was substantial) more than a 290. Then mining happened....

    But you are correct that you can't generalize that AMD is cheaper; they trade blows. It varies by card and generation.

    @LummyBear Most of the current-gen AMD cards have been 3-4x the Nvidia prices because of mining: 290s @ $1000, 280x @ $650. Ridiculous. Thankfully AMD has got that shit under control now and you can find cards at their actual MSRP.
    Last edited by TaintedOne; 2014-06-08 at 04:14 AM.
    | Intel i5-4670k | Asus Z87-Pro | Xigmatek Dark Knight | Kingston HyperX Fury White 16GB | Sapphire R9 270x | Crucial MX300 750GB | WD 500GB Black | WD 1TB Blue | Cooler Master Haf-X | Corsair AX1200 | Dell 2412m | Ducky Shine 3 | Logitech G13 | Sennheiser HD598 | Mionix Naos 8200 |

  5. #105
    Quote Originally Posted by glo View Post
    That's a massive amount of misinformation. PhysX takes advantage of parallel processing on CUDA (shaders). Shaders are also integral to general visual performance.
    Yes, I was wrong. Compute units and graphics units are one and the same thing.

    Quote Originally Posted by glo View Post
    Further, your statement that GPUs have some sort of single lane queue is also completely false.
    I didn't say that. It's not possible to overlap compute and graphics tasks in OpenGL and DirectX. It works on consoles (the PS4 uses it), but it doesn't work on PC.

    http://www.slideshare.net/DICEStudio...&from_search=1

    Slide 41 to 43.

    Quote from the OpenGL 5 candidate feature list:

    A good example of this is shadow map rendering. It is bound by fixed-function hardware (ROPs and primitive engines) and uses a very small amount of ALUs (a simple vertex shader) and a very small amount of bandwidth (compressed depth buffer output only, and it reads size-optimized vertices that don't have UVs or tangents). This means that all the TMUs and the huge majority of the ALUs and bandwidth just idle around while shadows get rendered. If you, for example, execute your compute-shader-based lighting simultaneously with shadow map rendering, you get it practically for free. The funny thing is that if this becomes common, we will see games that throttle more than Furmark, since current GPU cooling designs just haven't been designed for constant near-100% GPU usage (all units doing productive work all the time).
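
    The quoted shadow-map scenario can be sketched as a unit-utilization check. The utilization fractions and timings below are made-up illustrative values, not figures from the quote:

    ```python
    # Toy sketch: the shadow pass saturates the ROPs but leaves the ALUs
    # mostly idle, so a compute-shader lighting pass (ALU-heavy, no ROP
    # use) can overlap it almost for free. All numbers are invented.

    shadow_pass = {"rop": 0.95, "alu": 0.10, "ms": 3.0}
    lighting_cs = {"rop": 0.00, "alu": 0.85, "ms": 2.5}

    serial_ms = shadow_pass["ms"] + lighting_cs["ms"]  # run back to back

    # Overlap is feasible if no unit type exceeds 100% combined load:
    can_overlap = all(shadow_pass[u] + lighting_cs[u] <= 1.0
                      for u in ("rop", "alu"))
    overlap_ms = max(shadow_pass["ms"], lighting_cs["ms"]) if can_overlap else serial_ms

    print(can_overlap, serial_ms, overlap_ms)  # True 5.5 3.0
    ```

    The "throttling more than Furmark" remark follows from the same model: overlapping pushes every unit type toward full utilization at once.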
    Last edited by Noctifer616; 2014-06-08 at 08:42 AM.
