Bringing this back up since you aren't the first person (@Marlamin) to express this sentiment, and I wanted to illustrate the point:
Spoiler:
These were made with zero in-engine work or sandboxing or whatever, using Stable Diffusion with a LoRA trained for about 5 minutes on 24 WoW screenshots I pulled from Google. I generated maybe 30 images before I had a decent and interesting set to use.
Bottom two are completely untouched from their generated form (besides cropping). Top left has a lamp dropped in but is otherwise unmodified. Top right has five existing models photoshopped in, as well as some minor fixes to clear up stuff that was more obviously strange-looking in the generated picture. All those existing assets were just screenshotted from model viewer, so they're 2D pictures layered on top.
The entire thing, start (from finishing the SD install and looking up how to train the LoRA) to finish, took about an hour and a half.
This is why I suggested that those original four pictures could be AI-generated and then bashed with assets. It's very easy to just generate what will look like a screenshot with AI and then spruce it up a bit if you know what you are doing.