Originally Posted by
dadev
I think there's a sizable gap between the expectation and what is achievable right now.
First, to get this out of the way, currently there's no mGPU support in Vulkan. Now this is a broad statement, and what it actually means is that there's no mechanism that allows direct communication between gpus.
It doesn't mean that you can't create work on 2 gpus in parallel, because you can. You can also transfer data through cpu memory, which is reasonable or crap depending on hardware and implementation. You still can't have shared semaphores, and this really sucks. Synchronizing through cpu introduces nontrivial latency, so unless one gpu does completely unrelated work (like one does normal graphics, the other does global illumination or physics) you can't really make it work well enough. In short, it'll hardly double your frame rate.
It is planned for the future though. As far as I know, Vulkan was basically rushed out without mGPU support in order to not be too late to the market (like OpenGL usually was), and this might turn out to be a good decision.
In DX12 the situation is completely different. There are not one, but two mGPU modes.
One is SLI/CF, but without the middle man. In fact, you need to have SLI/CF enabled in the respective control panel for this to work. The control panel alone is not enough of course, and you need full mGPU support in the engine for it. You create a single interface and address gpus with masks. This is the performant mGPU mode, as you get to fully control all linked nodes and decide upon synchronizations between them. Getting x2 frame rate (or close to it) from this mode, with traditional AFR, is not difficult as long as you can keep the frame time on cpu low enough.
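To make the mask idea a bit more concrete, here's a rough sketch (plain C++, no D3D12 headers, so it's only a model). It assumes the usual layout where bit i of a node mask selects linked gpu i, and it picks the gpu for each frame round-robin, AFR style. The function names afrNodeMask and afrFrameIntervalMs are made up for illustration, and the frame interval formula is a simplified assumption that ignores synchronization and transfer costs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a mask with exactly one bit set, selecting the
// linked gpu that renders this frame under round-robin AFR.
// Assumes bit i of the mask addresses linked node i.
std::uint32_t afrNodeMask(std::uint64_t frameIndex, std::uint32_t nodeCount) {
    assert(nodeCount > 0 && nodeCount <= 32);
    return 1u << static_cast<std::uint32_t>(frameIndex % nodeCount);
}

// Hypothetical, simplified steady-state model (an assumption, not a measurement):
// the cpu has to prepare every frame, while each gpu only renders every
// nodeCount-th frame, so the slower of the two paths sets the frame interval.
double afrFrameIntervalMs(double cpuMs, double gpuMs, std::uint32_t nodeCount) {
    assert(nodeCount > 0);
    return std::max(cpuMs, gpuMs / nodeCount);
}
```

With 2 linked gpus, frames 0, 1, 2, 3 get masks 0x1, 0x2, 0x1, 0x2. In this model, a 16 ms gpu frame with a 6 ms cpu frame gives an 8 ms interval (about x2), but if the cpu needs 12 ms per frame the interval is stuck at 12 ms no matter how many gpus you add.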
However, this mode is also subject to exactly the same limitations as SLI/CF, simply because it requires SLI/CF to be enabled. Just to note, this is almost nothing like SLI/CF in DX11, for which the driver had to do the actual implementation. In this case the implementation is done by the engine.
The other DX12 mGPU mode is the so-called multi adapter. In this case, it's like Vulkan (you create multiple adapters and work with each gpu independently), however DX12 also allows sharing resources and fences (these are for synchronization, in case you don't know the terminology, like Vulkan's semaphores) between gpus. In essence this allows transferring data and events between gpus without full intervention of Windows. Partial intervention is still there though, and this is what sets apart the "linked node" mode and the "multi adapter" mode on the performance side. Implementation by the engine is radically different, because in multi adapter you don't use masks; instead each adapter gets a dedicated interface.
tl;dr: this is not 100% up to game engines. Vulkan is not there yet. DX12 is, but there are "buts" if you deviate from the SLI/CF model.