https://www.youtube.com/watch?v=6laL-_hiAK0
Very interesting video from pcper concerning the problems with the Windows task scheduler.
TL/DR (or better TL/DW):
A lot of people suspected that the current version of the Windows task scheduler does not understand Ryzen's SMT and consequently assigns threads to suboptimal logical cores. E.g. Windows thinks that logical cores 0+1 are on physical core 0, logical cores 2+3 are on physical core 1 etc., but in reality logical cores 0+5 would be on one physical core. That is not true.
Ryzen's logical cores are detected correctly by Windows (0+1 on one core, 2+3 on one core, etc.), and if there are fewer threads than logical cores, Windows will correctly try to put one thread per physical core first.
However, the Ryzen cores are divided into two core complexes (CCX): cores 0-3 in one CCX and cores 4-7 in the other. What most likely harms performance in certain games is two threads on different CCXs constantly needing to talk to each other, because latency is higher in that scenario. Some numbers from the video:
Intel i7-7700k crosstalk latency:
On the same physical core: ~15ns
On different physical cores: ~76ns
Ryzen 1800X crosstalk latency:
On the same physical core: ~25ns
On different physical cores: ~42ns
On different core complexes: ~142ns
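To make the three tiers concrete, here is a minimal Python sketch that classifies a pair of logical cores on an 1800X using the layout described above (two logical cores per physical core, four physical cores per CCX) and looks up the rough figure from the video. The function names and the lookup table are made up for illustration, and it assumes the ~42ns figure refers to two cores inside the same CCX.

```python
# Layout as described in the post: 2 logical cores per physical core,
# 4 physical cores per CCX. All names here are illustrative only.
THREADS_PER_CORE = 2
CORES_PER_CCX = 4

# Approximate 1800X figures quoted from the video, in nanoseconds.
# Assumption: "different physical cores" (~42ns) means within one CCX.
LATENCY_NS = {"same core": 25, "same CCX": 42, "different CCX": 142}

def physical_core(logical):
    """Physical core index for a logical core (0+1 -> 0, 2+3 -> 1, ...)."""
    return logical // THREADS_PER_CORE

def ccx(logical):
    """CCX index for a logical core (physical cores 0-3 -> 0, 4-7 -> 1)."""
    return physical_core(logical) // CORES_PER_CCX

def relation(a, b):
    """Classify how two logical cores are related."""
    if physical_core(a) == physical_core(b):
        return "same core"
    if ccx(a) == ccx(b):
        return "same CCX"
    return "different CCX"

for a, b in [(0, 1), (0, 2), (0, 8)]:
    kind = relation(a, b)
    print(f"logical {a} <-> {b}: {kind}, ~{LATENCY_NS[kind]} ns")
```

Going by those numbers, a hop between CCXs costs a bit more than three times as much as a hop between two cores in the same CCX.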
This could be the reason why Ryzen performs worse in DX12 than in DX11, because DX12 is apparently heavily multi-threaded and there is a lot of crosstalk between the threads.
All of this actually makes me excited for the 4-core variants (R3 and R5 up to the 1400X), because most likely those consist of only one CCX, so there will be no increased crosstalk latency (for unimpeded performance on all available cores).
Why do something simple, when there is a complicated way?
Ryzen 7 2700X | BeQuiet Dark Rock Pro 4 | 16GB DDR4-3200 | MSI X470 Gaming Pro | MSI GTX 1070 Gaming X 8G | 500GB / 750GB Crucial SSD
Fractal Define C | LG 32UK550 | Das Model S Professional Silent | CM Storm Xornet
@Biernot That's not the only problem when it comes to games. If for any reason Windows ends up moving a thread from one CCX to the other, the data it was working on is no longer available in the cache and has to be retrieved from RAM again, and since RAM currently has a lot of problems on Ryzen, that is literally its worst possible scenario in gaming loads.
Also, some games do the scheduling themselves and have not yet been updated to know that odd "cpus" are the SMT siblings on Ryzen; they just assume the same layout as Intel (0-4, 1-5, 2-6, 3-7 for example), which obviously might also cause problems.
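A rough Python sketch of why a wrong sibling assumption hurts, under stated assumptions: the two helper functions are hypothetical and just encode the two numbering schemes mentioned in this thread (adjacent pairs 0+1, 2+3 versus the split pairing 0-4, 1-5). Which scheme a given game or OS actually assumes is not something the sketch knows.

```python
def sibling_adjacent(logical):
    """SMT sibling when pairs are numbered 0+1, 2+3, 4+5, ..."""
    return logical ^ 1

def sibling_split(logical, physical_cores):
    """SMT sibling when pairs are numbered 0-4, 1-5, 2-6, 3-7 (4 cores)."""
    return (logical + physical_cores) % (2 * physical_cores)

# A game that assumes the split scheme on a 4-core/8-thread chip believes
# logical 0..3 are four separate physical cores and puts one worker on each.
workers = [0, 1, 2, 3]

# If the real layout is the adjacent scheme, those workers only cover
# two physical cores, so they end up sharing cores via SMT.
cores_used = {w // 2 for w in workers}
print(sorted(cores_used))
```

So four "one per core" workers would land on two physical cores instead of four, which is the kind of problem described above.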
This is bullcrap. Why would AMD release a CPU if no motherboards are in stock? I'm sick and tired of waiting for an X370 board to be in stock.
I didn't see if they linked the source to their test program, but Intel's numbers are suspiciously similar to the latencies of L3 access and memory access rather than what I would expect from normal cross-core talk. L3 is shared across all cores on Intel CPUs, so the latency of transferring data that is in L3 between cores is the latency of L3. The 76ns latency looks like it's coming from some cache thrashing. That number should be lower for normal cross-core communication; as it is, it falls in line with an L3 miss, which causes a RAM read, then a write to that L3 spot, then the L3 latency of reading that spot from another core. If the memory was already in L3 that number should be ~20ns (at 4GHz), and if it's on the same core and in L2 then sub-10ns.
"This could be the reason why Ryzen performs worse in DX12 than in DX11, because DX12 is apparently heavily multi-threaded and there is a lot of crosstalk between the threads."
For details on the DX12 runtime you would have to ask Microsoft, but it's a relatively thin layer. I doubt it has any significant effect on performance unless you do ridiculous things or run with the debugging layer enabled.
If you meant the actual engines using the DX12 API, then they are multi-threaded, but there shouldn't be much cross-talk. Each thread records its own command lists, which should have been preassigned. It can go wrong if you record a small number of draw calls per thread task (if there's even such a construct in the engine; if not, then maybe per submit), because then you might actually spend more time on sync. But this is a somewhat abnormal engine design or just an abnormal scene in the game. For instance, if you have 4 draw calls in total you shouldn't split them across 4 different threads. But it's somewhat ridiculous to expect such a low draw count in a modern game.
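The point about not splitting a handful of draw calls across threads can be sketched like this. It is plain Python with made-up names (`split_draw_calls`, `min_per_task`), not any real engine or D3D12 API: it only shows the idea of giving each recording task a contiguous chunk and refusing to create tasks that would be too small to be worth the sync.

```python
def split_draw_calls(draw_calls, max_threads, min_per_task):
    """Split draw calls into contiguous chunks, one per recording task.

    Never uses more tasks than would leave each with at least
    min_per_task draw calls; tiny workloads stay on a single task.
    """
    total = len(draw_calls)
    if total == 0:
        return []
    tasks = max(1, min(max_threads, total // min_per_task))
    base, extra = divmod(total, tasks)
    chunks, start = [], 0
    for i in range(tasks):
        # Spread the remainder over the first `extra` chunks.
        size = base + (1 if i < extra else 0)
        chunks.append(draw_calls[start:start + size])
        start += size
    return chunks

# 4 draw calls stay on one task; 1000 are spread over 4 tasks.
print(len(split_draw_calls(list(range(4)), max_threads=4, min_per_task=64)))
print([len(c) for c in split_draw_calls(list(range(1000)), 4, 64)])
```

The threshold of 64 is an arbitrary number for the example; a real engine would tune it.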
Unless something was drastically changed recently, the topology on Intel CPUs should be just the same: 0-1 on the first core, etc. Mine is SNB, but I still checked with the topology mapper (https://software.intel.com/en-us/art...gy-enumeration) and it checks out.
Also, the problem would seem to be a somewhat limited L3 (rather than slow memory per se, although that may be a problem in its own right): going across CCXs incurs a memory access, which means there is no L3 sharing between them. The ring bus is a beautiful thing and I'm happy to see it going strong after all these years
They did not link the program they tested with. They explained that they used a small self-written script that did little to no computation and just measured the latency of one thread talking to another across the logical/physical cores.
This is where they explain what exactly their script does: https://youtu.be/6laL-_hiAK0?t=478
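For anyone wondering what such a "ping-pong" test looks like in principle, here is a small Python sketch. This is not pcper's script (they did not publish it), and it is only meant to show the structure: Python's interpreter and `threading.Event` overhead dominate, the threads are not pinned to specific cores, and the result will be in the microsecond range rather than the nanoseconds quoted from the video. A serious version would be written in something like C with threads pinned to chosen logical cores.

```python
import threading
import time

def ping_pong_ns(rounds=2000):
    """Average one-way hand-off time between two threads, in nanoseconds.

    Structure only: the main thread signals 'ping', the responder
    answers with 'pong', and no other work is done in between.
    """
    ping, pong = threading.Event(), threading.Event()

    def responder():
        for _ in range(rounds):
            ping.wait()
            ping.clear()
            pong.set()

    worker = threading.Thread(target=responder)
    worker.start()
    start = time.perf_counter_ns()
    for _ in range(rounds):
        ping.set()
        pong.wait()
        pong.clear()
    elapsed = time.perf_counter_ns() - start
    worker.join()
    # Each round is two hand-offs: main -> responder -> main.
    return elapsed / (2 * rounds)

print(f"~{ping_pong_ns():.0f} ns per hand-off (mostly Python overhead)")
```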
this could be true
https://community.amd.com/community/...e?sf62109582=1
from AMD themselves:
so AMD says Windows is fine
Thread Scheduling
We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture.
As an extension of this investigation, we have also reviewed topology logs generated by the Sysinternals Coreinfo utility. We have determined that an outdated version of the application was responsible for originating the incorrect topology data that has been widely reported in the media. Coreinfo v3.31 (or later) will produce the correct results.
Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.
so the gaming underperformance may be due to:
- latency penalties between the 2 different 4-core CCXs (people are saying it's the exact same thing on console CPUs (4+4) and the latencies are even worse there .. but they have shit performance anyway and I guess this matters less when your GPU can't even hold a steady 30 fps xD)
- memory something? maybe related to the first
- Intel compilers
SMT on/off is pretty negligible, all tests I've seen show a 0-3-5% difference with a few outliers (WH: TW does better w/o SMT, BF1 does better with SMT)
Last edited by Life-Binder; 2017-03-14 at 08:03 AM.
AMD's eight core CPUs seem to have the same performance as Intel's four core CPUs.
I have to say that I'm rather unimpressed by this Ryzen CPU.
Last edited by Amalaric; 2017-03-14 at 08:29 AM.
"AMD's eight core CPUs seem to have the same performance as Intel's four core CPUs."
at equal clocks in synthetic/professional applications the Zen 8-cores trade blows with the $1000 6900K (also an 8-core)
the weaker points are games and the all-core clock ceiling of 3.9+ (for 1700) or 4.0+ (for 1700X/1800X)
I'm going to wait for the benches for the R5's.
Intel price-gouged so heavily over the last few years that what seems normal to some now is completely insane to me.
Paying £400 for a non-server chip was almost unheard of 6 years ago when I got my 2500k, and now it seems to be a bargain as far as most review sites are concerned?!
Once AMD reveal a sub-£200 CPU, I'll look into finally upgrading, but as of now it's still way too pricey a proposition.
Looks like I wasn't off when I said we could get some meaningful improvements in clocks by manufacturing Zen on another process (like Samsung's 10LPE or TSMC's 16FF+):
I wonder what will be manufactured there though, it isn't the entire Ryzen CPU just the Zen cores. So I suspect we're talking about some semi-custom design here like Scorpio's SoC.
Being honest I didn't even check it myself, I read someone mentioning that on Anandtech while discussing why AMD did it differently and I just assumed it had changed =p
Also some other random interesting test results from Iooncraz:
AMD's announcement regarding scheduling.
Originally Posted by Iooncraz (Anandtech Forum)
And a post from Kromaatikse that I found interesting:
Originally Posted by Kromaatikse (AnandTech Forum)
Upgraded from an i7 4930K @ 4.5GHz to Ryzen 1700X, even at stock it feels smoother in games and is noticeably better at multi threaded applications.
If you think about it, it's pretty sad that single core IPC is only 11% ahead of my sandy lol. The cool part is that the AM4 socket is gonna be here for a damn long time; if they get a superclocker down the road with Samsung or TSMC I can always sell off the 1700 and upgrade.
I have my case half built (PSU/HDD installed); the rest of the stuff arrives today.
"single core IPC is only 11% ahead of my sandy lol."
in games (Zen gaming IPC atm is right around Haswell level, give or take-ish)
in prof apps it's right up there with Broadwell/Broadwell-E and the stock 6900K (the 6900K is still a better overclocker though)
for me personally a socket's longevity is near-useless. when I upgrade a CPU it's always for at least ~4 years, possibly 5-6+, and obviously that is always going to include a new mobo too and most of the time RAM as well (even if by some miracle the old socket still worked 4-5 years later I'd still upgrade the mobo for the new features and the upcoming new socket anyway) .. that's why I shelled out for an i7 over an i5 years ago and why I am now ignoring Zen1 and the 7700K and waiting on the Coffee 6c or Zen 2 to last me into the next decade
my upgrade after the next one, in 2022-2023, will hopefully include DDR5 RAM (or maybe even 3D XPoint though I doubt it)
Last edited by Life-Binder; 2017-03-14 at 01:45 PM.
It's also important to point out that benchmarkers always do their testing on extremely clean systems with nothing at all running in the background, which isn't really the case even for a normal user. People usually have a browser open, some YouTube video, maybe another program or two doing something while they play, and the fact that Ryzen gives you twice the cores to put the "bloat" on makes it very believable that the gaming experience feels smoother.
This is a very specific example and not my only one, but the other night I was doing normal Nighthold. I had WoW on my centre monitor, YouTube on the left monitor playing a video (farm is dull), and discord/whatsapp clients on the right monitor (yes I have three, don't judge me). A very warforged trinket dropped that I couldn't trade due to ilevel, so, unsure if it was an upgrade, I opened Simcraft on my right monitor and simmed my char, then re-simmed it twice with the new trinket in each slot. Neither the YouTube video nor WoW experienced any slowdown.
Comically the first time I walked into Stormwind in 2004 I had a frame rate so bad it was like cycling through the screenshot folder XD