  1. #1021

  2. #1022
    Old God Vash The Stampede
    10+ Year Old Account
    Join Date
    Sep 2010
    Location
    Better part of NJ
    Posts
    10,939
    Quote Originally Posted by Svinoi Banana
    We all know that people in the future are going to use AMD.




  3. #1023
    Brewmaster Biernot
    15+ Year Old Account
    Join Date
    Mar 2009
    Location
    Germany
    Posts
    1,431
    https://www.youtube.com/watch?v=6laL-_hiAK0

    Very interesting video from PCPer concerning the reported problems with the Windows task scheduler.

    TL;DR (or rather TL;DW):

    A lot of people suspected that the current version of the Windows task scheduler does not understand Ryzen's SMT and consequently assigns threads to suboptimal logical cores, e.g. that Windows thinks logical cores 0+1 are on physical core 0, logical cores 2+3 are on physical core 1, etc., while in reality logical cores 0+5 would share a physical core. That is not true.
    Ryzen's logical cores are detected correctly by Windows (0+1 on one core, 2+3 on another core, etc.), and if there are fewer threads than logical cores, Windows correctly tries to put one thread per physical core first.
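    To make the numbering concrete, here is a minimal Python sketch of the layout described above, assuming an 8-core/16-thread Ryzen 7 with adjacent SMT siblings (0+1, 2+3, ...) and cores 0-3 and 4-7 in separate CCXs. The function names are hypothetical; this only illustrates the mapping and the "one thread per physical core first" order, not how Windows actually implements it.

```python
from __future__ import annotations

SMT_WIDTH = 2       # logical CPUs per physical core (assumed)
CORES_PER_CCX = 4   # physical cores per core complex (assumed)


def physical_core(logical_cpu: int) -> int:
    """Adjacent SMT siblings: logical 0+1 -> core 0, 2+3 -> core 1, ..."""
    return logical_cpu // SMT_WIDTH


def ccx(logical_cpu: int) -> int:
    """Cores 0-3 -> CCX 0, cores 4-7 -> CCX 1."""
    return physical_core(logical_cpu) // CORES_PER_CCX


def spread_one_per_core(n_threads: int, n_logical: int = 16) -> list[int]:
    """Fill one logical CPU per physical core first, then the SMT siblings."""
    first_threads = list(range(0, n_logical, SMT_WIDTH))   # 0, 2, 4, ..., 14
    second_threads = list(range(1, n_logical, SMT_WIDTH))  # 1, 3, 5, ..., 15
    return (first_threads + second_threads)[:n_threads]


for cpu in (0, 1, 5, 8, 15):
    print(f"logical {cpu}: core {physical_core(cpu)}, CCX {ccx(cpu)}")
```

    For example, `spread_one_per_core(4)` returns `[0, 2, 4, 6]`: four different physical cores that all happen to sit in the first CCX.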

    However, the Ryzen cores are divided into two core complexes (CCX): cores 0-3 in one CCX and cores 4-7 in the other. What most likely harms performance in certain games is two threads on different CCXs that need to talk to each other constantly, because communication between CCXs has increased latency. Some numbers from the video:

    Intel i7-7700k crosstalk latency:
    On the same physical core: ~15ns
    On different physical cores: ~76ns

    Ryzen 1800X crosstalk latency:
    On the same physical core: ~25ns
    On different physical cores: ~42ns
    On different core complexes: ~142ns
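    A small Python sketch that classifies a pair of logical CPUs and looks up the video's numbers, assuming the same 8C/16T layout as above (two logical CPUs per core, four cores per CCX). The table holds only the approximate figures quoted from PCPer; `relation` and `expected_latency_ns` are hypothetical helpers.

```python
from __future__ import annotations

# Approximate crosstalk latencies (ns) as quoted from the PCPer video.
LATENCY_NS = {
    "i7-7700K": {"same_core": 15, "different_core": 76},
    "R7 1800X": {"same_core": 25, "different_core": 42, "different_ccx": 142},
}


def relation(cpu_a: int, cpu_b: int, smt_width: int = 2, cores_per_ccx: int = 4) -> str:
    """Classify two logical CPUs as same core, same CCX, or different CCX."""
    core_a, core_b = cpu_a // smt_width, cpu_b // smt_width
    if core_a == core_b:
        return "same_core"
    if core_a // cores_per_ccx == core_b // cores_per_ccx:
        return "different_core"
    return "different_ccx"


def expected_latency_ns(chip: str, cpu_a: int, cpu_b: int) -> int:
    table = LATENCY_NS[chip]
    # The 7700K has no second core complex, so fall back to its cross-core figure.
    return table.get(relation(cpu_a, cpu_b), table["different_core"])


print(expected_latency_ns("R7 1800X", 0, 1))   # 25
print(expected_latency_ns("R7 1800X", 0, 2))   # 42
print(expected_latency_ns("R7 1800X", 2, 9))   # 142
```

    With these figures, a cross-CCX exchange on the 1800X costs roughly 142 / 42 ≈ 3.4 times as much as one between two cores in the same CCX.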

    This could be the reason why Ryzen performs worse in DX12 than in DX11: DX12 is apparently heavily multi-threaded and there is a lot of crosstalk between the threads.


    All of this actually makes me excited for the 4-core variants (R3 and R5 up to the 1400X), because those most likely consist of only one CCX, so there will be no increased crosstalk latency (for unimpeded performance on all available cores).
    Why do something simple, when there is a complicated way?
    Ryzen 7 2700X | BeQuiet Dark Rock Pro 4 | 16GB DDR4-3200 | MSI X470 Gaming Pro | MSI GTX 1070 Gaming X 8G | 500GB / 750GB Crucial SSD
    Fractal Define C | LG 32UK550 | Das Model S Professional Silent | CM Storm Xornet

  4. #1024
    The Lightbringer Artorius
    10+ Year Old Account
    Join Date
    Dec 2012
    Location
    Natal, Brazil
    Posts
    3,781
    @Biernot That's not the only problem when it comes to games: if for any reason Windows ends up moving a thread from one CCX to the other, the data it needs is no longer in the local cache and has to be retrieved from RAM again, and since RAM currently has a lot of problems on Ryzen, that is literally its worst possible scenario in gaming loads.

    Also, some games do their own scheduling and have not yet been updated to know that the odd-numbered "CPUs" on Ryzen are SMT siblings; they just assume the same layout as Intel (0-4, 1-5, 2-6, 3-7, for example), which could obviously also cause problems.
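    The effect of such a hardcoded assumption can be sketched in Python. The example assumes a hypothetical game that believes logical CPUs i and i + N are SMT siblings (the "0-4, 1-5, ..." pattern) and therefore picks CPUs 0..N-1 to get one thread per physical core; on a layout with adjacent siblings that choice lands on only half of the cores.

```python
from __future__ import annotations


def cpus_for_one_thread_per_core(n_physical: int) -> list[int]:
    """Hypothetical game logic: assumes logical CPUs i and i + n_physical are
    SMT siblings, so CPUs 0..n_physical-1 should all be different cores."""
    return list(range(n_physical))


def physical_cores_used(cpus: list[int], smt_width: int = 2) -> set[int]:
    """Actual layout described above: siblings are adjacent (0+1, 2+3, ...)."""
    return {cpu // smt_width for cpu in cpus}


picked = cpus_for_one_thread_per_core(8)        # 8C/16T chip -> CPUs 0..7
print(sorted(physical_cores_used(picked)))      # [0, 1, 2, 3]: half the cores, all in one CCX
```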
    Last edited by Artorius; 2017-03-13 at 07:17 PM.

  5. #1025
    Brewmaster
    7+ Year Old Account
    Join Date
    Mar 2015
    Location
    Birmingham, Alabama
    Posts
    1,297
    This is bullcrap. Why would AMD release a CPU if no motherboards are in stock? I'm sick and tired of waiting for an X370 board to be in stock.

  6. #1026
    Quote Originally Posted by Biernot
    Intel i7-7700k crosstalk latency:
    On the same physical core: ~15ns
    On different physical cores: ~76ns

    Ryzen 1800X crosstalk latency:
    On the same physical core: ~25ns
    On different physical cores: ~42ns
    On different core complexes: ~142ns
    I didn't see whether they linked the source of their test program, but Intel's numbers are suspiciously similar to the latencies of L3 access and memory access rather than what I would expect from normal cross-core communication. L3 is shared by all cores on Intel CPUs, so the latency of transferring data that is already in L3 between cores is the latency of L3. The 76ns figure seems to come from some cache thrashing. That number should be lower for normal cross-core communication; as it is, it falls in line with an L3 miss, which causes a RAM read, then a write to that L3 location, then the L3 latency of reading that location from another core. If the data were already in L3, the number should be ~20ns (at 4GHz), and if it's on the same core and in L2, then sub-10ns.
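    These figures are easier to compare in core clock cycles. At 4 GHz one cycle lasts 0.25 ns, so a minimal Python conversion (assuming a fixed 4 GHz clock with no turbo or power-state changes) looks like this:

```python
def ns_to_cycles(ns: float, ghz: float) -> float:
    """At f GHz one cycle lasts 1/f ns, so a delay of t ns spans t * f cycles."""
    return ns * ghz


print(ns_to_cycles(20, 4.0))   # 80.0  (dadev's estimate for an L3-resident transfer)
print(ns_to_cycles(76, 4.0))   # 304.0 (the measured 7700K cross-core figure, at the assumed 4 GHz)
```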

    This could be the reason why Ryzen performs worse in DX12 than in DX11: DX12 is apparently heavily multi-threaded and there is a lot of crosstalk between the threads.
    For details on the DX12 runtime you would have to ask Microsoft, but it's a relatively thin layer; I doubt it has any significant effect on performance unless you do ridiculous things or run with the debug layer enabled.
    If you meant the actual engines using the DX12 API, then they are multi-threaded, but there shouldn't be much crosstalk. Each thread records its own command lists, which should have been preassigned. It can go wrong if you record a small number of draw calls per thread task (if there's even such a construct in the engine; if not, then maybe per submit), in which case you might actually spend more time on sync than on recording. But this is a somewhat abnormal engine design or just an abnormal scene in the game. For instance, if you have 4 draw calls in total, you shouldn't split them across 4 different threads. But it's somewhat ridiculous to expect such a low draw count in a modern game.
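    A minimal Python sketch of the granularity point, assuming draw calls are split into contiguous per-thread recording tasks. The `min_per_task` threshold of 100 is an arbitrary, hypothetical value, and the actual D3D12 command-list recording calls are not shown; the point is only that a tiny workload should not be spread across threads.

```python
from __future__ import annotations

import math


def partition_draw_calls(n_draws: int, n_threads: int, min_per_task: int = 100) -> list[range]:
    """Split draw calls into at most n_threads contiguous recording tasks.

    Hypothetical policy: no task smaller than min_per_task (unless there is only
    one), so per-task synchronisation stays small relative to recording work.
    """
    if n_draws <= 0:
        return []
    n_tasks = max(1, min(n_threads, n_draws // min_per_task))
    size = math.ceil(n_draws / n_tasks)
    return [range(start, min(start + size, n_draws)) for start in range(0, n_draws, size)]


print(len(partition_draw_calls(4, 4)))      # 1: too few draws to be worth splitting
print(len(partition_draw_calls(5000, 4)))   # 4
```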

    Quote Originally Posted by Artorius
    @Biernot That's not the only problem when it comes to games: if for any reason Windows ends up moving a thread from one CCX to the other, the data it needs is no longer in the local cache and has to be retrieved from RAM again, and since RAM currently has a lot of problems on Ryzen, that is literally its worst possible scenario in gaming loads.

    Also, some games do their own scheduling and have not yet been updated to know that the odd-numbered "CPUs" on Ryzen are SMT siblings; they just assume the same layout as Intel (0-4, 1-5, 2-6, 3-7, for example), which could obviously also cause problems.
    Unless something was drastically changed recently, the topology on Intel CPUs should be just the same: 0-1 is the first core, etc. Mine is Sandy Bridge (SNB), but I still checked with the topology mapper (https://software.intel.com/en-us/art...gy-enumeration) and it checks out.
    Also, the issue would seem to be a somewhat limited L3 (rather than slow memory per se, although that may be a problem in its own right), which incurs a memory access when going across CCXs, meaning there is no L3 sharing. The ring bus is a beautiful thing, and I'm happy to see it going strong after all these years.

  7. #1027
    Brewmaster Biernot
    15+ Year Old Account
    Join Date
    Mar 2009
    Location
    Germany
    Posts
    1,431
    Quote Originally Posted by dadev
    I didn't see whether they linked the source of their test program, but Intel's numbers are suspiciously similar to the latencies of L3 access and memory access rather than what I would expect from normal cross-core communication. L3 is shared by all cores on Intel CPUs, so the latency of transferring data that is already in L3 between cores is the latency of L3. The 76ns figure seems to come from some cache thrashing. That number should be lower for normal cross-core communication; as it is, it falls in line with an L3 miss, which causes a RAM read, then a write to that L3 location, then the L3 latency of reading that location from another core. If the data were already in L3, the number should be ~20ns (at 4GHz), and if it's on the same core and in L2, then sub-10ns.
    They did not link the program they tested with. They explained that they used a small self-written script that did little to no computation and just measured the latency of communication from one thread to another across the logical/physical cores.

    This is where they explain exactly what their script does: https://youtu.be/6laL-_hiAK0?t=478

  8. #1028
    Quote Originally Posted by Biernot
    https://www.youtube.com/watch?v=6laL-_hiAK0

    Very interesting video from PCPer concerning the reported problems with the Windows task scheduler.

    TL;DR (or rather TL;DW):

    A lot of people suspected that the current version of the Windows task scheduler does not understand Ryzen's SMT and consequently assigns threads to suboptimal logical cores, e.g. that Windows thinks logical cores 0+1 are on physical core 0, logical cores 2+3 are on physical core 1, etc., while in reality logical cores 0+5 would share a physical core. That is not true.
    Ryzen's logical cores are detected correctly by Windows (0+1 on one core, 2+3 on another core, etc.), and if there are fewer threads than logical cores, Windows correctly tries to put one thread per physical core first.

    However, the Ryzen cores are divided into two core complexes (CCX): cores 0-3 in one CCX and cores 4-7 in the other. What most likely harms performance in certain games is two threads on different CCXs that need to talk to each other constantly, because communication between CCXs has increased latency. Some numbers from the video:

    Intel i7-7700k crosstalk latency:
    On the same physical core: ~15ns
    On different physical cores: ~76ns

    Ryzen 1800X crosstalk latency:
    On the same physical core: ~25ns
    On different physical cores: ~42ns
    On different core complexes: ~142ns

    This could be the reason why Ryzen performs worse in DX12 than in DX11: DX12 is apparently heavily multi-threaded and there is a lot of crosstalk between the threads.


    All of this actually makes me excited for the 4-core variants (R3 and R5 up to the 1400X), because those most likely consist of only one CCX, so there will be no increased crosstalk latency (for unimpeded performance on all available cores).
    This could be true.

    https://community.amd.com/community/...e?sf62109582=1
    from AMD themselves:
    Thread Scheduling

    We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture.

    As an extension of this investigation, we have also reviewed topology logs generated by the Sysinternals Coreinfo utility. We have determined that an outdated version of the application was responsible for originating the incorrect topology data that has been widely reported in the media. Coreinfo v3.31 (or later) will produce the correct results.

    Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.
    So AMD says Windows is fine.


    So the gaming underperformance may be due to:
    - latency penalties between the two 4-core CCXs (people are saying it's exactly the same on console CPUs (4+4) and the latencies are even worse there, but those have shit performance anyway, and I guess this matters less when your GPU can't even hold a steady 30 fps xD)
    - something memory-related? Maybe related to the first point
    - Intel compilers

    The SMT on/off difference is negligible: all tests I've seen show a 0-5% difference, with a few outliers (WH: TW does better without SMT, BF1 does better with SMT).
    Last edited by Life-Binder; 2017-03-14 at 08:03 AM.

  9. #1029
    AMD's eight-core CPUs seem to have the same performance as Intel's four-core CPUs.

    I have to say that I'm rather unimpressed by this Ryzen CPU.
    Last edited by Amalaric; 2017-03-14 at 08:29 AM.

  10. #1030
    Quote Originally Posted by Amalaric
    AMD's eight-core CPUs seem to have the same performance as Intel's four-core CPUs.

    I have to say that I'm rather unimpressed by this Ryzen CPU.
    Why would you compare the two? Compare the 8-cores to Intel's 8-cores and you will see that they have comparable performance at 1/2 to 1/3 of the cost.

  11. #1031
    AMD's eight-core CPUs seem to have the same performance as Intel's four-core CPUs.
    At equal clocks, in synthetic/professional applications the Zen 8-cores trade blows with the $1000 6900K (also an 8-core).

    The weaker points are games and the all-core clock ceilings of 3.9+ GHz (1700) or 4.0+ GHz (1700X/1800X).

  12. #1032
    Quote Originally Posted by Life-Binder
    At equal clocks, in synthetic/professional applications the Zen 8-cores trade blows with the $1000 6900K (also an 8-core).

    The weaker points are games and the all-core clock ceilings of 3.9+ GHz (1700) or 4.0+ GHz (1700X/1800X).
    $1000 for the 6900K? And here I thought that my 3770K was expensive when I bought it some years ago.

    - - - Updated - - -

    Quote Originally Posted by Gorgodeus
    Why would you compare the two? Compare the 8-cores to Intel's 8-cores and you will see that they have comparable performance at 1/2 to 1/3 of the cost.
    I just thought that 8 cores would bring a much bigger performance gain than they did.

  13. #1033
    Deleted
    I'm going to wait for the benchmarks of the R5s.

    Intel price-gouged so heavily over the last few years that what seems normal to some now is completely insane to me.
    Paying £400 for a non-server chip was almost unheard of 6 years ago when I got my 2500K, and now it seems to be a bargain as far as most review sites are concerned?!

    Once AMD reveals a sub-£200 CPU, I'll look into finally upgrading, but as of now it's still way too pricey a proposition.

  14. #1034
    The Lightbringer Artorius
    10+ Year Old Account
    Join Date
    Dec 2012
    Location
    Natal, Brazil
    Posts
    3,781
    Looks like I wasn't off when I said we could see some meaningful clock improvements by manufacturing Zen on another process (like Samsung's 10LPE or TSMC's 16FF+):



    I wonder what will be manufactured there, though; it isn't the entire Ryzen CPU, just the Zen cores. So I suspect we're talking about some semi-custom design here, like Scorpio's SoC.
    Quote Originally Posted by dadev
    Unless something was drastically changed recently, the topology on Intel CPUs should be just the same: 0-1 is the first core, etc. Mine is Sandy Bridge (SNB), but I still checked with the topology mapper (https://software.intel.com/en-us/art...gy-enumeration) and it checks out.
    Also, the issue would seem to be a somewhat limited L3 (rather than slow memory per se, although that may be a problem in its own right), which incurs a memory access when going across CCXs, meaning there is no L3 sharing. The ring bus is a beautiful thing, and I'm happy to see it going strong after all these years.
    To be honest, I didn't even check it myself; I read someone mention it on AnandTech while discussing why AMD did it differently, and I just assumed it had changed =p

    Also some other random interesting test results from Iooncraz:
    Quote Originally Posted by Iooncraz (AnandTech Forum)
    BF4 Windows 7, Ryzen 1700X Stock, R9 Fury 1050MHz:

    BF4 Windows 10, Ryzen 1700X Stock, R9 Fury 1050MHz:


    After a long couple of days of testing, I can make the following statements with extreme confidence:

    Relative Performance:

    Ryzen has 11% higher IPC than Sandy Bridge
    Ryzen has a whopping 28% higher multi-threaded performance per clock than Sandy Bridge
    Ryzen has 52.5% higher IPC than Excavator
    Ryzen has a gargantuan 82.05% higher multi-threaded performance per clock than Excavator
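    As a first-order approximation, single-threaded performance scales with IPC times clock speed, so an IPC lead can shrink or vanish at different clocks. The Python sketch below uses the 11% figure above together with hypothetical clocks (Ryzen at 3.9 GHz, an overclocked Sandy Bridge at 4.5 GHz), purely for illustration.

```python
def relative_single_thread_perf(ipc_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Performance of chip A relative to chip B, approximated as IPC ratio x clock ratio."""
    return ipc_ratio * clock_a_ghz / clock_b_ghz


# Hypothetical clocks: Ryzen at 3.9 GHz vs. an overclocked Sandy Bridge at 4.5 GHz.
print(round(relative_single_thread_perf(1.11, 3.9, 4.5), 3))   # about 0.96
```

    Under those assumed clocks, Ryzen would end up roughly 4% slower per thread despite its IPC lead.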

    Memory Sensitivity:

    Ryzen has some memory latency issues... but they only rarely impact application performance.
    Multi-threading performance is most sensitive to memory frequency
    Cinebench is actually memory sensitive on Ryzen!!! Not much, but it's there!

    Stability:

    In my days of testing, I've not had one application crash.
    Ryzen seems to have built-in safeguards that may be hiding its true overclocking potential.

    Curiosities:

    Ryzen employs a self-learning and correction system... it makes the system seem like it has entered an endless boot loop. It follows an exacting pattern: Five full power cycles, two warm reboots, a partial boot, then a normal boot.
    Memory compatibility issues seem to be almost completely related to not being able to select a 2T command rate.
    Performance seems to scale better than linearly with frequency in some scenarios, i.e. a 5% clock speed increase brings a 7% performance increase. I am trying to track down whether this is an aberration or due to time-based latencies.

    I will be testing clock scaling tomorrow as well as verifying a couple of these numbers. I will also work on getting my results online here.
    AMD's announcement regarding scheduling.

    And a post from Kromaatikse that I found interesting:

    Quote Originally Posted by Kromaatikse (AnandTech Forum)
    I was just reminded of the big difference between Windows' process scheduler and a sane one, which fully explains why it migrates threads so often.

    For reference, this easily findable book chapter explains how several different types of multiprocessor scheduler work. Pay particular attention to the "work stealing" balancing algorithm; it runs on an idle or lightly-loaded CPU, and looks for CPUs with greater load than itself. An alternative approach is for a heavily-loaded CPU to look for CPUs with *less* load than itself, in order to *give* them some of its excess work - this works better in cases where idle CPUs are not periodically woken (which is more power efficient).

    Whichever approach Windows uses, it constantly attempts to move threads to less-loaded CPUs - even when it is the *only* runnable thread on its original CPU - and it counts the thread's own past load against its current CPU. This is inhibited only by the parking and affinity masks (which are clearly bolted-on afterthoughts), and makes no allowance whatsoever for SMT, NUMA, cache affinity, or the cost of context switches. The book chapter I linked doesn't mention SMT or NUMA (it may be a relatively old book, in which those concepts were not yet widespread), but it *does* talk about the other two factors as being key for efficiency.
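    The work-stealing idea can be sketched in Python: an idle CPU pulls a thread from the most heavily loaded run queue. This is a generic illustration, not Windows' algorithm; the run-queue model is hypothetical, and the rule that a CPU's only queued thread is never stolen mirrors Kromaatikse's first criterion rather than anything Windows does.

```python
from __future__ import annotations

from collections import deque


def steal_work(run_queues: dict[int, deque[str]], thief: int) -> str | None:
    """One work-stealing step: if `thief` is idle, take a thread from the
    most heavily loaded other CPU, but never that CPU's only thread."""
    if run_queues[thief]:
        return None                                   # thief is busy; nothing to do
    others = [cpu for cpu in run_queues if cpu != thief]
    if not others:
        return None
    victim = max(others, key=lambda cpu: len(run_queues[cpu]))
    if len(run_queues[victim]) < 2:
        return None                                   # leave lone threads where they are
    thread = run_queues[victim].pop()                 # take the most recently queued thread
    run_queues[thief].append(thread)
    return thread


queues = {0: deque(), 1: deque(["game", "audio", "browser"]), 2: deque(["discord"])}
print(steal_work(queues, 0))   # browser
```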

    This *should* be very easy for Microsoft to fix, if they can be bothered. Simply make any thread meeting all of the following criteria ineligible for migration:
    - It is the only thread currently in its CPU's run queue.
    - It currently satisfies its own affinity mask, if any.
    - Its CPU is not parked.
    - It shares the same LLC as all other threads in the same process.
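    The four criteria translate into a simple predicate. A minimal Python sketch, assuming a hypothetical, simplified view of scheduler state (run-queue lengths, the set of parked CPUs and a CPU-to-LLC map); none of these names are a Windows API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThreadInfo:
    cpu: int                          # logical CPU whose run queue holds the thread
    affinity: frozenset[int] | None   # allowed CPUs; None means no affinity mask
    process_llcs: frozenset[int]      # LLC ids of every thread in the same process


def ineligible_for_migration(t: ThreadInfo, run_queue_len: dict[int, int],
                             parked: set[int], llc_of: dict[int, int]) -> bool:
    """True when all four proposed criteria hold, so the thread stays put."""
    only_thread = run_queue_len[t.cpu] == 1
    satisfies_affinity = t.affinity is None or t.cpu in t.affinity
    cpu_not_parked = t.cpu not in parked
    shares_llc_with_process = t.process_llcs == {llc_of[t.cpu]}
    return only_thread and satisfies_affinity and cpu_not_parked and shares_llc_with_process


# 8C/16T Ryzen: logical CPUs 0-7 share CCX 0's L3, 8-15 share CCX 1's L3.
llc_of = {cpu: cpu // 8 for cpu in range(16)}
game_thread = ThreadInfo(cpu=2, affinity=None, process_llcs=frozenset({0}))
print(ineligible_for_migration(game_thread, {2: 1}, set(), llc_of))   # True
```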

    This would make the precise behaviour of the core-parking algorithm much less important for enforcing short-term performance and efficiency goals. A useful additional parameter to the latter would then be an optimisation target, taking the following values:

    - Execution resources: the current behaviour, preferentially unparking just one thread per physical core.
    - Cache affinity: as above, but only within each LLC block. When all cores are unparked in one LLC, begin on the next.
    - Power efficiency: always unpark all virtual cores in the same physical core before proceeding to another physical core. Also unpark all physical cores in one LLC before proceeding to the next.
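    The three proposed optimisation targets amount to different unpark orders. A Python sketch, assuming two LLCs (CCXs) of four cores each with adjacent SMT siblings, as on an 8C/16T Ryzen; the target names "execution", "cache" and "power" are hypothetical labels for the three options.

```python
from __future__ import annotations


def unpark_order(target: str, n_llc: int = 2, cores_per_llc: int = 4, smt: int = 2) -> list[int]:
    """Order in which logical CPUs would be unparked (logical id = core * smt + thread)."""
    cores = range(n_llc * cores_per_llc)
    if target == "execution":
        # One logical CPU per physical core across the whole chip, then the siblings.
        return [c * smt + t for t in range(smt) for c in cores]
    if target == "cache":
        # Same as "execution", but finish one LLC before starting the next.
        order: list[int] = []
        for llc in range(n_llc):
            llc_cores = range(llc * cores_per_llc, (llc + 1) * cores_per_llc)
            order += [c * smt + t for t in range(smt) for c in llc_cores]
        return order
    if target == "power":
        # Fill every SMT sibling of a core before moving on; cores are already grouped by LLC.
        return [c * smt + t for c in cores for t in range(smt)]
    raise ValueError(f"unknown target: {target}")


for name in ("execution", "cache", "power"):
    print(name, unpark_order(name))
```

    Under these assumptions, "cache" unparks 0, 2, 4, 6, 1, 3, 5, 7 before touching the second CCX, while "power" simply counts from 0 to 15.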

    Well, we can dream.
    Last edited by Artorius; 2017-03-14 at 01:02 PM.

  15. #1035
    Elemental Lord
    15+ Year Old Account
    Join Date
    Mar 2009
    Location
    Wales, UK
    Posts
    8,527
    I upgraded from an i7 4930K @ 4.5GHz to a Ryzen 1700X; even at stock it feels smoother in games and is noticeably better at multi-threaded applications.

  16. #1036
    If you think about it, it's pretty sad that single-core IPC is only 11% ahead of my Sandy lol. The cool part is that the AM4 socket is going to be around for a damn long time; if they get a superclocker down the road with Samsung or TSMC, I can always sell off the 1700 and upgrade.

    I have my case half built (PSU/HDD installed); the rest of the stuff arrives today.

  17. #1037
    single-core IPC is only 11% ahead of my Sandy lol.
    In games (Zen's gaming IPC at the moment is right around Haswell level, give or take).

    In professional apps it's right up there with Broadwell/Broadwell-E and a stock 6900K (the 6900K is still a better overclocker, though).

    For me personally, a socket's longevity is nearly useless: when I upgrade a CPU it's always for at least ~4 years, possibly 5-6+, and that obviously always includes a new mobo later and most of the time RAM as well (even if by some miracle the old socket still worked 4-5 years later, I'd still upgrade the mobo for the new features and the upcoming new socket anyway). That's why I shelled out for an i7 over an i5 years ago, and why I am now ignoring Zen 1 and the 7700K and waiting for the 6-core Coffee Lake or Zen 2 to last me into the next decade.

    My upgrade after the next one, in 2022-2023, will hopefully include DDR5 RAM (or maybe even 3D XPoint, though I doubt it).
    Last edited by Life-Binder; 2017-03-14 at 01:45 PM.

  18. #1038
    The Lightbringer Artorius
    10+ Year Old Account
    Join Date
    Dec 2012
    Location
    Natal, Brazil
    Posts
    3,781
    Quote Originally Posted by caervek
    I upgraded from an i7 4930K @ 4.5GHz to a Ryzen 1700X; even at stock it feels smoother in games and is noticeably better at multi-threaded applications.
    It's also important to point out that benchmarkers always test on extremely clean systems with nothing at all running in the background, which isn't really the case for even a normal user. People usually have a browser open, a YouTube video, maybe another program or two doing something while they play, and the fact that Ryzen gives you twice the cores to put that "bloat" on makes it very believable that the gaming experience feels smoother.

  19. #1039
    Quote Originally Posted by Life-Binder
    In games (Zen's gaming IPC at the moment is right around Haswell level, give or take).

    In professional apps it's right up there with Broadwell/Broadwell-E and a stock 6900K (the 6900K is still a better overclocker, though).

    For me personally, a socket's longevity is nearly useless: when I upgrade a CPU it's always for at least ~4 years, possibly 5-6+, and that obviously always includes a new mobo later and most of the time RAM as well (even if by some miracle the old socket still worked 4-5 years later, I'd still upgrade the mobo for the new features and the upcoming new socket anyway). That's why I shelled out for an i7 over an i5 years ago, and why I am now ignoring Zen 1 and the 7700K and waiting for the 6-core Coffee Lake or Zen 2 to last me into the next decade.

    My upgrade after the next one, in 2022-2023, will hopefully include DDR5 RAM (or maybe even 3D XPoint, though I doubt it).
    Ideally I would have waited for Zen 2, but I had the money to build now. And usually, yeah, sockets are useless, but if you look at how long AM3 was around, I may be upgrading just a CPU down the line for the first time.

  20. #1040
    Elemental Lord
    15+ Year Old Account
    Join Date
    Mar 2009
    Location
    Wales, UK
    Posts
    8,527
    Quote Originally Posted by Artorius
    It's also important to point out that benchmarkers always test on extremely clean systems with nothing at all running in the background, which isn't really the case for even a normal user. People usually have a browser open, a YouTube video, maybe another program or two doing something while they play, and the fact that Ryzen gives you twice the cores to put that "bloat" on makes it very believable that the gaming experience feels smoother.
    This is a very specific example and not my only one, but the other night I was doing normal Nighthold with WoW on my centre monitor, YouTube playing a video on the left monitor (farm is dull), and Discord/WhatsApp clients on the right monitor (yes, I have three, don't judge me). A very Warforged trinket dropped that I couldn't trade due to item level, so, unsure whether it was an upgrade, I opened Simcraft on my right monitor, simmed my character, then re-simmed it twice with the new trinket in each slot. Neither the YouTube video nor WoW experienced any slowdown.

    Comically, the first time I walked into Stormwind in 2004, I had a frame rate so bad it was like cycling through the screenshot folder XD
