  1. #1

    GTX 970 Users need to see this ASAP.

    Apparently there is a problem with GTX 970s in regards to the amount of usable VRAM. Gonna copy/paste from this reddit post.

    TL;DR : The last 0.5GB chunk of VRAM runs at a much slower speed and can cause stuttering when it is used, though it does not occur all the time.


    !UPDATE #3! Got a nice response from Nvidia this time. Gonna remove the old quote and keep it here and paste the new info.


    Also here is a very nice summary from AnandTech.
    Thanks to Saithes for posting it.



    A few secrets about GTX 970

    Yes, that last 0.5GB of memory on your GeForce GTX 970 does run slower than the first 3.5GB. More interesting than that fact is the reason why it does, and why the result is better than you might have otherwise expected. Last night we got a chance to talk with NVIDIA's Senior VP of GPU Engineering, Jonah Alben, on this specific concern and got a detailed explanation of why gamers are seeing what they are seeing, along with new disclosures on the architecture of the GM204 version of Maxwell.

    For those looking for a little background, you should read over my story from this weekend that looks at NVIDIA's first response to the claims that the GeForce GTX 970 cards currently selling were only properly utilizing 3.5GB of the 4GB frame buffer. While it definitely helped answer some questions, it raised plenty more, which is why we requested a talk with Alben, even on a Sunday.

    Believe it or not, every issue discussed in any forum about the GTX 970 memory issue is going to be explained by this diagram. Along the top you will see 13 enabled SMMs, each with 128 CUDA cores, for a total of 1,664 as expected. (Three grayed out SMMs represent those disabled from a full GM204 / GTX 980.) The most important part here is the memory system though, connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache / ROP units along with it. However, the 32-bit memory controller segment remains.

    You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.

    *To those wondering how peak bandwidth would remain at 224 GB/s despite the division of memory controllers on the GTX 970, Alben stated that it can reach that speed only when memory is being accessed in both pools.
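    To sanity-check that figure, here's the arithmetic as a quick sketch (the 7 Gbps effective GDDR5 rate and 32-bit segment width are the GTX 970's standard published specs, not taken from this post):

```python
# Rough arithmetic behind the 224 GB/s peak figure quoted above.
# Assumes the GTX 970's standard 7 Gbps effective GDDR5 data rate and
# eight 32-bit memory controller segments (a 256-bit bus in total).

GDDR5_GBPS = 7                 # effective per-pin data rate, Gbps
SEG_WIDTH_BITS = 32            # width of one memory controller segment

per_segment = GDDR5_GBPS * SEG_WIDTH_BITS / 8   # 28.0 GB/s per segment

print(per_segment * 7)   # 196.0 GB/s -- the 3.5GB pool on its own
print(per_segment * 8)   # 224.0 GB/s -- only when both pools are in flight
```

So the headline 224 GB/s requires the eighth segment (the slow pool's) to be moving data at the same time as the other seven.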

    Second to that, it turns out the disabled SMMs have nothing to do with the performance issues experienced or the memory system complications.

    In a GTX 980, each block of L2 / ROPs directly communicates through a 32-bit portion of the GM204 memory interface and then to a 512MB section of on-board memory. When designing the GTX 970, NVIDIA used a new capability of Maxwell to implement the system in an improved fashion that would not have been possible with Kepler or previous architectures. Maxwell’s configurability allowed NVIDIA to disable a portion of the L2 cache and ROP units while using a “buddy interface” to continue to light up and use all of the memory controller segments. Now, the SMMs use a single L2 interface to communicate with both banks of DRAM (on the far right), which does create a new concern.

    A quick note about the GTX 980 here: it uses a 1KB memory access stride to walk across the memory bus from left to right, able to hit all 4GB in this capacity. But the GTX 970, with its altered design, has to do things differently. If you walked across the memory interface in the exact same way, over the same 4GB capacity, the 7th crossbar port would tend to always get twice as many requests as the other ports (because it has two memories attached). In the short term that could be OK due to queuing in the memory path. But in the long term, if the 7th port is fully busy and getting twice as many requests as the other ports, then the other six must be only half busy to match the 2:1 ratio. So the overall bandwidth would be roughly half of peak. This would cause dramatic underutilization and would prevent optimal performance and efficiency for the GPU.
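    That "roughly half of peak" claim can be checked with a toy model (purely illustrative; this is not a model of NVIDIA's actual request scheduler):

```python
# Toy model of the imbalance described above: a naive 1KB stride across
# all 4GB sends 2 of every 8 requests to the 7th crossbar port (which
# has two DRAMs attached) and 1 to each of the other six. If every port
# retires one request per cycle, each pass is gated by the busiest port.

requests_per_port = [1, 1, 1, 1, 1, 1, 2]   # per 8-request stride pass

cycles_per_pass = max(requests_per_port)    # 2: the 7th port needs two cycles
slots_available = len(requests_per_port) * cycles_per_pass   # 14 slots
utilization = sum(requests_per_port) / slots_available       # 8 used

print(f"{utilization:.3f}")   # 0.571 -- roughly half of peak, as stated
```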

    To avert this, NVIDIA divided the memory into two pools, a 3.5GB pool which maps to seven of the DRAMs and a 0.5GB pool which maps to the eighth DRAM. The larger, primary pool is given priority and is then accessed in the expected 1-2-3-4-5-6-7-1-2-3-4-5-6-7 pattern, with equal request rates on each crossbar port, so bandwidth is balanced and can be maximized. And since the vast majority of gaming situations occur well under the 3.5GB memory size this determination makes perfect sense. It is those instances where memory above 3.5GB needs to be accessed where things get more interesting.
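    The pool split described above can be sketched as a simple address-to-port mapping. The 3.5GB/0.5GB split and the 1KB stride come from the article; the mapping function itself is a guess at the general idea, not NVIDIA's actual hardware mapping:

```python
# Sketch of the two-pool address mapping described above (illustrative).

STRIDE = 1024                   # 1KB access stride
PRIMARY_POOL = 3584 * 1024**2   # 3.5GB striped across crossbar ports 0-6

def port_for_address(addr):
    if addr < PRIMARY_POOL:
        return (addr // STRIDE) % 7   # balanced 1-2-3-4-5-6-7 pattern
    return 6                          # 0.5GB pool: the 7th port only

# Consecutive 1KB lines in the primary pool spread evenly over 7 ports...
counts = [0] * 7
for line in range(7 * 4):
    counts[port_for_address(line * STRIDE)] += 1
print(counts)                   # [4, 4, 4, 4, 4, 4, 4]

# ...while anything in the upper 0.5GB lands on one port (1/7th bandwidth).
print(port_for_address(PRIMARY_POOL + 123456))   # 6
```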

    Let's be blunt here: access to the 0.5GB of memory, on its own and in a vacuum, would occur at 1/7th of the speed of the 3.5GB pool of memory. If you look at the Nai benchmarks floating around, this is what you are seeing.

    But the net result for gaming scenarios is much less dramatic than that, so why is that the case? It comes down to the way that memory is allocated by the operating system for applications and games. As memory is requested by a game, the operating system will allocate portions of it depending on many factors: the exact data space the game asked for, what the OS has available and what its allocation heuristics deem appropriate at the time. Not all memory is accessed in the same way, even for PC games.

    If a game has allocated 3GB of graphics memory it might be using only 500MB on a regular basis, with much of the rest only there for periodic, on-demand use. Things like compressed textures that are not as time sensitive as other material require much less bandwidth and can be moved around to other memory locations with less performance penalty. Not all allocated graphics memory is the same, and inevitably there are large sections of this storage that are reserved but rarely used at any given point in time.

    All gaming systems today already have multiple pools of graphics memory – what exists on the GPU and what the system memory has to offer via the PCI Express bus. With the GTX 970 and its 3.5GB/0.5GB division, the OS now has three pools of memory to access and to utilize. Yes, the 0.5GB of memory in the second pool on the GTX 970 cards is slower than the 3.5GB of memory but it is at least 4x as fast as the memory speed available through PCI Express and system memory. The goal for NVIDIA then is that the operating system would utilize the 3.5GB of memory capacity first, then access the 0.5GB and then finally move to the system memory if necessary.
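    The tiering NVIDIA describes, fastest pool first and then spill downward, can be sketched as a toy allocator. The VRAM capacities come from the article; the system RAM budget is hypothetical, and the logic is an illustration of the OS-heuristic idea, not actual driver code:

```python
# Toy allocator for the three-tier hierarchy described above: fill the
# fast 3.5GB pool first, then the slow 0.5GB pool, then spill to system
# RAM over PCI Express.

POOLS = [                      # (name, capacity in MB), fastest first
    ("vram_fast", 3584),
    ("vram_slow", 512),
    ("system_ram", 8192),      # hypothetical system RAM budget
]

def allocate(request_mb, used):
    """Place a request in the fastest pools with room, spilling downward."""
    placed = []
    for name, cap in POOLS:
        free = cap - used.get(name, 0)
        take = min(free, request_mb)
        if take > 0:
            used[name] = used.get(name, 0) + take
            placed.append((name, take))
            request_mb -= take
        if request_mb == 0:
            break
    return placed

used = {}
print(allocate(3000, used))   # [('vram_fast', 3000)]
print(allocate(1000, used))   # [('vram_fast', 584), ('vram_slow', 416)]
```

The second request doesn't fit in the fast pool, so its tail lands in the slow 0.5GB pool before anything touches system RAM, which is the behavior NVIDIA says it is aiming for.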

    The question then is, what is the real-world performance penalty of the GTX 970’s dual memory pool configuration? Though Alben didn’t have a specific number he wanted to discuss he encouraged us to continue doing our own testing to find cases where you can test games requesting less than 3.5GB of memory and then between 3.5GB and 4.0GB. By comparing the results on the GTX 980 and the GTX 970 in these specific scenarios you should be able to gauge the impact that the slower pool of memory has on the total memory configuration and gaming experience. The problem and risk is that this performance difference essentially depends on the heuristics of the OS and its ability to balance the pools effectively, putting data that needs to be used less frequently or in a less latency-dependent fashion in the 0.5GB portion.

    NVIDIA’s performance labs continue to work away at finding examples of this occurring and the consensus seems to be something in the 4-6% range. A GTX 970 without this memory pool division would run 4-6% faster than the GTX 970s selling today in high memory utilization scenarios. Obviously this is something we can’t accurately test though – we don’t have the ability to run a GTX 970 without a disabled L2/ROP cluster like NVIDIA can. All we can do is compare the difference in performance between a reference GTX 980 and a reference GTX 970 and measure the differences as best we can, and that is our goal for this week.

    Accessing that 500MB of memory on its own is slower. Accessing that 500MB as part of the 4GB total slows things down by 4-6%, at least according to NVIDIA. So now the difficult question: did NVIDIA lie to us?

    At the very least, the company did not fully disclose the missing L2 and ROP partition on the GTX 970, even if it was due to internal miscommunication. The question “should the GTX 970 be called a 3.5GB card?” is more of a philosophical debate. There is 4GB of physical memory on the card and you can definitely access all 4GB of it when the game and operating system determine it is necessary. But 1/8th of that memory can only be accessed in a slower manner than the other 7/8ths, even if that 1/8th is 4x faster than system memory over PCI Express. NVIDIA claims that the architecture is working exactly as intended and that with competent OS heuristics the performance difference should be negligible in real-world gaming scenarios.

    The configurability of the Maxwell architecture allowed NVIDIA to make this choice. Had the GeForce GTX 970 been built on the Kepler architecture, the company would have had to disable the entire L2/MC block on the right hand side, resulting in a 192-bit memory bus and a 3GB frame buffer. GM204 allows NVIDIA to expand that to a 256-bit 3.5GB/0.5GB memory configuration and offers performance advantages, obviously.

    Alternatively to calling this a 4GB card, NVIDIA might have branded it as 3.5GB with the addition of 500MB of “cache” or “buffer” – something that designates its difference in implementation, its slower performance but also its advantages over not having it at all.

    Let’s be clear – the performance of the GTX 970 is what the performance is. This information is incredibly interesting and warrants some debate, but at the end of the day, my recommendations for the GTX 970 really won’t change at all. It still offers incredible performance for your dollar and is able to run at 4K in my experience and testing. Yes, there might in fact be specific instances where performance drops are more severe because of this memory hierarchy design, but I don’t think it changes the outlook for the card as a whole.

    Some other trailing notes. There should be no difference in performance or memory configuration results from one implementation of the GTX 970 to another. If your GTX 970 exhibits an issue (or does not), then your friend's card, and his friend's, should behave the same way. The details about the memory issue also show us that a pending GeForce GTX 960 Ti, if it exists, will not necessarily have this complication. Imagine a GM204 GPU with a 192-bit memory bus, 3GB of GDDR5 and fewer enabled SMMs and you likely have a product you’ll see in 2015. (Interestingly, you have basically just described the GTX 970M mobile variant exactly.)

    This is not the first time that NVIDIA has used interesting memory techniques to adjust the performance characteristics of a card. The GTX 550 Ti and the GTX 660 Ti both used unbalanced memory configurations, allowing a GPU with a 192-bit memory bus to access 2GB. This also required some specific balancing on NVIDIA's side to make sure that the 64-bit portion of that GPU's memory controller, with double the memory of the other two, didn't weigh memory throughput down in the 1.5GB to 2.0GB range. NVIDIA succeeded there, and the GTX 660 Ti was one of the company's most successful products of the generation.

    It would be interesting to see if future architectures that implement this kind of design could use drivers to better handle the heuristics of memory allocation. Surely NVIDIA’s driver should know which assets could be placed in the slower pool of memory without affecting gaming performance better than Windows does. I would imagine that this configurable architecture design will continue into the future, and it’s possible it could be improved enough to allow NVIDIA to expand the pool sizes, improving efficiency even more without affecting performance.

    For users attempting to measure the impact of this issue, be aware that in some cases the software you are using to report in-use graphics memory could be wrong. Some applications are only aware of the first "pool" of memory and may only ever show up to 3.5GB in use for a game. Other applications, including MSI Afterburner as an example, do properly report total memory usage of up to 4GB. Because of the unique allocation of memory in the system, the OS, driver and monitoring application may not always be on the same page. Many users, like bootski over at NeoGAF, have done a good job of compiling examples where the memory issue occurs, so look around for the right tools to test your own GTX 970. (Side note: we are going to try to do some of our own testing this afternoon.)

    NVIDIA has come clean; all that remains is the response from consumers to take hold. For those of you that read this and remain affronted by NVIDIA calling the GeForce GTX 970 a 4GB card without equivocation: I get it. But I also respectfully disagree. Should NVIDIA have been more upfront about the changes this GPU brought compared to the GTX 980? Absolutely and emphatically. But does this change the stance or position of the GTX 970 in the world of discrete PC graphics? I don’t think it does.

    OLD INFO

    !Update! : Might not be hardware-related after all so that is some good news.

    !UPDATE #2! Response from Nvidia.

    Nvidia Forums post & reply from moderator
    Last edited by tielknight; 2015-02-11 at 03:05 PM.
    If you must insist on using a non-sanctioned sitting apparatus, please consider the tensile strength
    of the materials present in the object in question in comparison to your own mass volumetric density.

    In other words, stop breaking shit with your fat ass.

  2. #2
    That response...... "we got your money gl&hf"

    Stay below 3.5GB VRAM, I guess ^^

    I don't know what Nvidia is going to do with these cards now. I would have a hard time recommending this for 'WoW-only' builds because it's overkill and pricier than other cards that will do the job just fine. I can't recommend it for those wanting full high/ultra on a wider array of games because of the stuttering issues when going over 3.5GB of VRAM, and for people wanting 4K gaming the 970 is out for the same reason.

    People wanting to SLI 970s are certainly better off buying a 980 now instead, no?

    What kind of niche market is there for the 970 now?
    Last edited by TaintedOne; 2015-01-24 at 08:59 PM.
    | Intel i5-4670k | Asus Z87-Pro | Xigmatek Dark Knight | Kingston HyperX Fury White 16GB | Sapphire R9 270x | Crucial MX300 750GB | WD 500GB Black | WD 1TB Blue | Cooler Master Haf-X | Corsair AX1200 | Dell 2412m | Ducky Shine 3 | Logitech G13 | Sennheiser HD598 | Mionix Naos 8200 |

  3. #3
    Wries
    The issue is massively overblown. As a GTX 970 user you don't have to change a single thing about your usage routine. It worked well yesterday and it does so today as well. Whether the last 0.5GB of memory is even unusable is debatable, and Nvidia claims it is not the case.

    This "issue" has been run into at extremely exotic settings and in custom-written benchmark runs. Then afterwards every single stutter in any game has been blamed on it without much grounds.

    Just wait and see if Nvidia explains this further. As far as I'm aware this kind of memory segmenting has been present on a lot of cards before, where the bus doesn't quite match the memory setup. The 970 experiences a relatively mild effect of what, for example, the GTX 550 and the 660 suffered from way more (memory not entirely mapped to the memory bus, that is).

    That's not to say Nvidia wasn't wrong in not straight up explaining this from the start, given the enthusiast segment is interested in knowing things like this.

  4. #4
    Gaidax
    Quote Originally Posted by Wries View Post
    The issue is massively overblown. As a GTX 970 user you don't have to change a single thing about your usage routine. It worked well yesterday and it does so today as well. Whether the last 0.5GB of memory is even unusable is debatable, and Nvidia claims it is not the case.

    This "issue" has been run into at extremely exotic settings and in custom-written benchmark runs. Then afterwards every single stutter in any game has been blamed on it without much grounds.

    Just wait and see if Nvidia explains this further. As far as I'm aware this kind of memory segmenting has been present on a lot of cards before, where the bus doesn't quite match the memory setup. The 970 experiences a relatively mild effect of what, for example, the GTX 550 and the 660 suffered from way more (memory not entirely mapped to the memory bus, that is).

    That's not to say Nvidia wasn't wrong in not straight up explaining this from the start, given the enthusiast segment is interested in knowing things like this.
    While you are right, it is still quite an issue. Plenty of people bought two of those for SLI in hopes of driving higher resolutions at a relatively acceptable price, and having 500MB less memory there is a big deal.

    - - - Updated - - -

    Quote Originally Posted by TaintedOne View Post
    What kind of niche market is there for the 970 now ?
    It is a more popular card than the 980, as a matter of fact; it is much cheaper but only a bit less powerful. It is also very popular for SLI systems: you can have two of those for just a bit more than one 980.

  5. #5
    so 970 is a bad idea for 4k right?

  6. #6
    Gaidax
    Quote Originally Posted by Alantor View Post
    so 970 is a bad idea for 4k right?
    Any single card is a bad idea for 4k right now... unless of course you enjoy playing at 30FPS.

  7. #7
    I can't say I've had many problems running any games at ultra, though I only run 1080p so I can't comment on higher resolutions. But it still kinda sucks, because I was planning on getting another one in the future for 4K.
    ||i5 3570k @ 4.4GHz||H100 push/pull||AsRock Z77 Extreme4||16Gb G.Skill Ripjaws 1600MHz||Gigabyte Windforce GTX 970|| Coolermaster Storm Trooper||Corsair TX850 Enthusiast Series||Samsung 840 Pro 128gb(boot drive)||1TB WD HDD, 2x 3TB WD HDD, 2TB WD HDD||

    Bdk Nagrand / Astae Nagrand
    Pokemon X FC: 4656-7679-2545/Trainer Name: Keno

  8. #8
    Wries
    Quote Originally Posted by Gaidax View Post
    While you are right, it is still quite an issue, plenty of people bought 2 of those for SLI in hopes of driving higher resolutions for a relatively acceptable price and having 500 MB memory less there is a big deal.
    Thing is, the last 500MB isn't unusable. Only a few users have reported stuttering in certain titles and settings. Nothing really says it isn't because the settings they ran were putting an unusual load on the GPU, which may need a driver fix unrelated to the memory segmenting.

    If I was a 970 owner I'd just sit tight and wait until after the weekend to see some better articles and tests from tech-journalists investigating the issue, as well as more statements from Nvidia.

  9. #9
    Saithes
    This actually cropped up when someone made a CUDA-based benchmark to test the memory. Only some people actually see this issue, and from what it seems on OCN, people are mistaking frame drops for their 970 just not being able to provide decent FPS at insane settings. Nvidia and AMD have also been doing asymmetrical memory configurations for years now (GeForce 8s, 9s, GTX 400s, 500s, 600s).

    According to Nvidia, that last 512MB is usable in games. It is also a small enough amount of memory that the lower memory bandwidth on that last section shouldn't cause any noticeably degraded performance.


    Both of the GTX 970s I had were able to go above 3.5GB, which happened quite frequently in SoM, Skyrim, Arkham Origins and many other games, without issues.

    Example: Ryse at 3900MB on 2 GTX 970s in SLI
    Last edited by Saithes; 2015-01-25 at 06:18 PM.
    Intel Core i7 5820K @ 4.2GHz | Asus X99 Deluxe Motherboard | 16GB Crucial DDR4 2133 | MSI GTX 980 4G GAMING | Corsair HX750 Gold | 500GB Samsung 840 EVO

  10. #10
    Quote Originally Posted by Saithes View Post
    This actually cropped up when someone made a CUDA-based benchmark to test the memory. Only some people actually see this issue, and from what it seems on OCN, people are mistaking frame drops for their 970 just not being able to provide decent FPS at insane settings. Nvidia and AMD have also been doing asymmetrical memory configurations for years now (GeForce 8s, 9s, GTX 400s, 500s, 600s).

    According to Nvidia, that last 512MB is usable in games. It is also a small enough amount of memory that the lower memory bandwidth on that last section shouldn't cause any noticeably degraded performance.

    Both of the GTX 970s I had were able to go above 3.5GB, which happened quite frequently in SoM, Skyrim, Arkham Origins and many other games, without issues.

    Example: Ryse at 3900MB on 2 GTX 970s in SLI
    The CUDA benchmark was written in response to the initial reports of frame-rate drops at high memory usage to test the bandwidth available to the memory modules, not the other way around.

  11. #11
    Deleted
    Quote Originally Posted by Gaidax View Post
    Any single card is a bad idea for 4k right now... unless of course you enjoy playing at 30FPS.
    So console style you say? Good enough right?

  12. #12
    Wries
    Quote Originally Posted by Butler Log View Post
    The CUDA benchmark was written in response to the initial reports of frame-rate drops at high memory usage to test the bandwidth available to the memory modules, not the other way around.
    Nai has basically said his benchmark might actually be testing DDR3 by the end of its run.

    A new claim (though sort of hinted at in the statement from Nvidia) is that third-party monitors of VRAM load can't properly monitor the last 0.5GB of memory (it might be in use), hence it reports 3.5GB until you overload VRAM with settings so high that your card starts swapping to system RAM. An interesting theory, though it needs more explanation from Nvidia, because I find it odd that in those overloaded scenarios the VRAM monitoring eventually does show 4GB used. Perhaps they should release a VRAM monitoring tool that can see both memory partitions and monitor them together.

    Someone should run a scenario with a 970 and a 980. Same game, same settings, designed to keep the 980 above 3.5GB VRAM usage at all times. Then do the same run on the 970 and see if it performs similarly but reports less than 3.5GB VRAM used. Though keep in mind VRAM can be used to cache less critical data at times, so it's not really easy to conclude anything from that either.
    Last edited by Wries; 2015-01-26 at 01:53 PM.

  13. #13
    Quote Originally Posted by Radoleg View Post
    So console style you say? Good enough right?
    "My console is plugged into my 4k TV, it runs at 4k!"

  14. #14
    Updated the main post with the latest response + a video that explains the issue fairly well. As for what happens now, I guess that's up to how people decide to react.

  15. #15
    I'd probably just suggest that most people aiming to buy a single 970 aren't planning to game at 4K, and this is, for the vast majority, a non-issue. It's already been benchmarked against all the games on the market and it performs well in all of them. It does ridiculously well in SLI at 4K; that hasn't changed. Fact of the matter is that a lot of games aren't going to have issues with the 3.5GB VRAM even at 4K (at runnable settings).

    Disregarding 4K though, given how ridiculously niche it is, the card is still great for 1080p/1440p gamers. I don't really understand the concern overall myself.

  16. #16
    The concern is that people were told the card had certain specifications and would perform as advertised. While that is mostly correct, what people are mad about is that only ~3.5GB of the VRAM actually runs at the advertised speeds while the other ~0.5GB runs significantly slower, enough that it can cause pretty severe framerate issues. The card itself is still a damn good card, don't get me wrong, but it was touted to have a certain amount of memory running at (roughly) a certain speed, and that turned out to be false.

  17. #17
    Quote Originally Posted by tielknight View Post
    The concern is that people were told the card had certain specifications and would perform as advertised. While that is mostly correct, what people are mad about is that only ~3.5GB of the VRAM actually runs at the advertised speeds while the other ~0.5GB runs significantly slower, enough that it can cause pretty severe framerate issues. The card itself is still a damn good card, don't get me wrong, but it was touted to have a certain amount of memory running at (roughly) a certain speed, and that turned out to be false.
    True, but again, it's been overblown, and the benchmarks aren't going to retcon themselves because of this discovery. Also, the difference is in the 1-3% FPS-loss margin:
    [benchmark chart removed] (From PCGamer)

  18. #18
    1-3%? I think you might be looking at something wrong...

  19. #19
    Quote Originally Posted by tielknight View Post
    1-3%? I think you might be looking at something wrong...
    1~3% in comparison to the 980's drop with the same settings. The 970 will naturally be slower overall in FPS.

  20. #20
    Deleted
    Not a huge deal. It still gives the most graphical power for the bucks invested.
