Before I begin, let me be clear that this post is NOT a critique of either of these sites. Their data is useful and has a purpose, but many people are misusing it to discuss class balance, which it fundamentally cannot be used for. Worldoflogs.com never claimed it could; it's intended as much for in-house examination of logs to optimize your own gameplay as for comparing with others. Stateofdps.com takes the time to admit as much in their FAQ.
Given that stateofdps.com is a statistical site, they understand why their sampling methods mean their data is not a true average. Unfortunately, a lot of readers apparently DON'T read the FAQ or understand the math behind statistical theory. For that reason, I'm going to review a few principles of statistical theory, and examine the biases inherent to their sampling.
Fair warning, math ahead. But if you're not willing to learn the math, you really shouldn't be talking about average DPS numbers, which are math.
Let's begin with the basics. If you take a random sample for most kinds of things, you'll find that there tend to be more examples closer to the average than there are further outside. This works for grades and test scores, IQ scores, DPS numbers, etc. Once you control for external factors (and we'll come back to THAT point), the results tend to group up into a nice graph shape called a "bell curve". As underbogba pointed out in the thread, this is a generality and there are exceptions, where you end up with multiple peaks, but we're just handling basics for now rather than more advanced statistical theory. It looks like this;
The Mean Score is the average. The curve is highest there because there are more scores at/near the average than anywhere else.
The Standard Deviation is a statistical term for how wide the spread is. I'm not going to get into the details of the math, but you can see the percentage in the chart, for how much of the sample is within that standard deviation's distance; +/- one standard deviation from the mean is slightly more than 2/3 of the data. The important point for our purposes is that if a data set has less variation, if the numbers vary less from the mean, the standard deviation gets narrower which means that bell curve gets thinner and more pointy. If the numbers vary more, it will widen out. The area under the curve, the number of samples, remains the same, but the distribution changes. Another image to show what I mean is below;
The red line would be a typical distribution.
The blue line is a tighter, narrower distribution.
The orange line is a wider distribution.
And the green line is a distribution with a lower average. The first three have the same average. That's important.
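If you'd rather check the bell curve claims than take my word for them, here's a minimal sketch using Python's standard library. The mean of 100 and the three spreads are made-up numbers standing in for the red, blue and orange lines; they are not taken from any real DPS data.

```python
from statistics import NormalDist

# Hypothetical values: three curves with the SAME mean and different spreads,
# standing in for the red (typical), blue (narrow) and orange (wide) lines.
MEAN = 100.0
typical = NormalDist(mu=MEAN, sigma=15.0)
narrow = NormalDist(mu=MEAN, sigma=7.5)
wide = NormalDist(mu=MEAN, sigma=30.0)


def share_within(dist, k):
    """Fraction of the distribution within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


# Roughly 68% of samples fall within +/- one standard deviation.
within_one = share_within(typical, 1)
print(f"within 1 SD: {within_one:.4f}")

# Same mean, same total area, but the narrower curve peaks higher.
for name, dist in (("narrow", narrow), ("typical", typical), ("wide", wide)):
    print(f"{name:8s} peak height at the mean: {dist.pdf(MEAN):.4f}")
```

The share within one standard deviation is the same for all three curves; only the width in actual units changes, which is why the narrow curve ends up taller and pointier.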
Now, to the important stuff. Stateofdps.com uses the top 200 parses on WoL, which are in theory some of the top players in the game. Out of the millions of players, this means their data is way, way to the right on the bell curve. Look at that second chart again. Look at where the lines are touching the 0.0 line, on their right hand sides. That's the Stateofdps.com numbers. See how the blue is significantly lower than the orange? The problem is that both distributions have the same average. By only selecting the top numbers, stateofdps.com isn't presenting you the average, they're presenting you numbers that are based at least as much on the variation within a distribution as anything else.
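To make the top-200 problem concrete, here's a rough sketch of the same idea with random numbers. Everything in it is an assumption for illustration: two imaginary specs with an identical 20,000 DPS average, one with a tight spread and one with a wide spread, and 100,000 parses each. It is not how stateofdps.com actually collects or processes its data, just the bare statistical effect.

```python
import random
import statistics

# Hypothetical setup: two specs with the SAME true average and different spreads.
rng = random.Random(42)   # fixed seed so the run is repeatable
POPULATION = 100_000      # parses per spec (made-up number)
TOP_N = 200               # only the best 200 parses are kept
MEAN_DPS = 20_000         # made-up shared average


def top_n_mean(samples, n):
    """Average of the n highest values in samples."""
    return statistics.mean(sorted(samples, reverse=True)[:n])


narrow = [rng.gauss(MEAN_DPS, 1_000) for _ in range(POPULATION)]
wide = [rng.gauss(MEAN_DPS, 2_000) for _ in range(POPULATION)]

narrow_avg = statistics.mean(narrow)
wide_avg = statistics.mean(wide)
narrow_top = top_n_mean(narrow, TOP_N)
wide_top = top_n_mean(wide, TOP_N)

print(f"true averages: narrow {narrow_avg:,.0f}  wide {wide_avg:,.0f}")
print(f"top-{TOP_N} means: narrow {narrow_top:,.0f}  wide {wide_top:,.0f}")
```

The full-population averages come out essentially identical, while the top-200 averages are far apart, purely because one spec has more spread to its numbers.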
"But Endus, I don't care about the scrubs who do bad, I only care about top performers!"
Ahh, but that's not what you're getting. Skill is obviously a factor. I'm not saying those top parses aren't skilled players. But skill isn't the ONLY factor. And that's the problem.
The problem is those other factors. To review a few of them quickly (and I make no claims that these are the only ones, but they're fairly immediately obvious);
- Sheer random variation. See the second chart again. Different distributions push their top end further out, even if the averages are the same. In WoW, this comes down to how lucky you got with crits, proc timing, boss targeting causing you to move, etc. Lucky streaks give some classes significantly more benefit than others. I'll get into some hard sim numbers to wrap this up if you want specific examples.
- History. While it only works off the last 2 weeks, if there have been hotfixes or patches in that time, it skews the numbers on stateofdps.com, whether it was a nerf or a buff. They try and correct for that, but there's also a more subtle effect; classes that were underperforming and got buffed would have been more likely to be benched or at least passed over for gear in favor of those who were performing better at the time. This is more true of those guilds for whom performance is all, which are also those guilds most likely to be topping the WoL charts. As a result, lower performers, even if they've been buffed to have a competitive average now, will often still be behind the gear curve.
- Player bias. This one's tricky. There are buffs like Focus Magic and Dark Intent which will tend to go to the top DPS who can benefit from them. This has the effect of pushing their DPS even higher. Classes that benefit less from those buffs or whose DPS is slightly lower (or even just perceived to be lower for the reasons I'm detailing now) won't get those buffs, and this means the gap appears larger than it actually is on the classes' own merits. Elemental Shaman, for instance, don't work as well with either crit boosts or periodic damage procs as some other classes, meaning we're usually not the first priority for either.
There are probably other factors that I'm not listing, but those are at least the ones that jump out at me.
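For the "sheer random variation" factor, here's a toy model of why lucky streaks matter more to some specs than others. The two specs and all their numbers are invented for illustration: both average exactly 120 damage per cast on paper, but one gets there through frequent small crits and the other through a rare, huge proc. This is nowhere near a real combat sim, just a sketch of the principle.

```python
import random
import statistics

rng = random.Random(7)    # fixed seed so the run is repeatable
FIGHTS = 1_000            # simulated fights per spec (made-up number)
CASTS_PER_FIGHT = 300     # casts in each fight (made-up number)


def fight_average(base_hit, bonus_chance, bonus_hit):
    """Average damage per cast over one simulated fight."""
    total = 0
    for _ in range(CASTS_PER_FIGHT):
        total += bonus_hit if rng.random() < bonus_chance else base_hit
    return total / CASTS_PER_FIGHT


# Hypothetical "steady" spec: 100 per cast, 20% chance of a 200 crit
#   expected value = 0.80 * 100 + 0.20 * 200 = 120
steady = [fight_average(100, 0.20, 200) for _ in range(FIGHTS)]

# Hypothetical "streaky" spec: 60 per cast, 5% chance of a 1260 proc
#   expected value = 0.95 * 60 + 0.05 * 1260 = 120
streaky = [fight_average(60, 0.05, 1260) for _ in range(FIGHTS)]

steady_mean, streaky_mean = statistics.mean(steady), statistics.mean(streaky)
steady_sd, streaky_sd = statistics.stdev(steady), statistics.stdev(streaky)

print(f"steady : mean {steady_mean:.1f}, spread {steady_sd:.1f}, best fight {max(steady):.1f}")
print(f"streaky: mean {streaky_mean:.1f}, spread {streaky_sd:.1f}, best fight {max(streaky):.1f}")
```

Both specs land on the same average, but the streaky one has a much wider spread and a much higher best fight, which is exactly the kind of parse that ends up on a top-200 list.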
And now, some numbers, to give you an idea of what the above means. For these numbers, I will be running Simulationcraft, version 4.0.6-18, the most recent as of this writing. I will be simming the results with 10,000 iterations, Patchwerk style fights, without Focus Magic or Dark Intent but all other raid buffs.
Elemental Shaman: 26,739 DPS, +/- 2100 (7.87%)
Enhancement Shaman: 26,858 DPS, +/- 2098 (7.81%)
Those are our current simmed BiS numbers. They're pretty close to each other. The important parts to consider are the average and the variation; a higher percentage variation means the class is more reliant on lucky streaks to top their DPS. It also means they can push higher when they GET those lucky streaks.
For example, one of the most random specs in the game, Fire mages;
Fire Mage: 25,650 DPS, +/- 3819 (14.89%)
Some important points to note; their average DPS is simming lower. The variation, on the other hand, is MUCH higher. Since we've removed gear and player bias and skill as factors, the ONLY thing affecting this is random variation. The variation with a Fire Mage is almost double that of a Shaman of either spec. If we were to only look at the top numbers for each out of those 10,000 iterations, similar to how stateofdps.com works, we'd get this, instead;
Fire Mage: 29,469 DPS
Enhancement Shaman: 28,956 DPS
Elemental Shaman: 28,839 DPS
As you can see, by weighting our sample to only look at the top parses, Fire Mages move from roughly 1,100-1,200 DPS below Shaman to roughly 500-600 DPS above Shaman. That shift is based on nothing but random variation, and on ignoring all the parses where Fire Mages were just not getting their procs and their DPS was in the bucket as a result.
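If you want to check the arithmetic yourself, here's a short sketch. The averages and +/- values are the simmed numbers quoted above. The top-end figures happen to equal each spec's average plus its +/- value, so the sketch assumes that's how they were derived; treat that as my reading of the numbers rather than a description of Simulationcraft's output.

```python
# Simmed figures quoted in this post: spec -> (average DPS, +/- variation).
sims = {
    "Elemental Shaman": (26739, 2100),
    "Enhancement Shaman": (26858, 2098),
    "Fire Mage": (25650, 3819),
}

# Ranking by the average alone.
by_mean = sorted(sims, key=lambda spec: sims[spec][0], reverse=True)

# Assumed top-end figure: average plus the reported variation.
top_end = {spec: mean + spread for spec, (mean, spread) in sims.items()}
by_top = sorted(top_end, key=top_end.get, reverse=True)

print("ranked by average:", by_mean)
print("ranked by top end:", by_top)
for spec in by_top:
    print(f"{spec}: {top_end[spec]:,} DPS")
```

Fire Mage is last when ranked by average and first when ranked by top end, with no change to anything except which part of the distribution you look at.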
This is why you can't use stateofdps.com to show average DPS numbers across classes. Something like simulationcraft is a MUCH stronger tool for comparison. Shaman of both specs are simming out slightly under the average right now, but close enough that we're competitive. That's not to say we're perfect; we could use a small boost, and Enhancement is scaling poorly and likely to fall behind in 4.2 if that's not addressed. Elemental also suffers more than most classes from movement, though not nearly as badly as people seem to think, nor as badly as we did in WotLK; you can check this by running simulationcraft with a "helter skelter" style fight instead of "patchwerk". The point of this post isn't to argue about Shaman DPS numbers or how I'm not factoring in movement, though; it is only to demonstrate the mathematical issues with relying on stateofdps.com data to discuss class balance. The numbers I'm quoting here are to prove a point in that regard, not to debate class balance in general.
On Sims vs. Logs
Since this point will inevitably come up, a quick comment. The above factors, as well as human error, will ALWAYS be a factor in any kind of log analysis. A simulation, on the other hand, isn't affected by latency or lag, or human error, and can quickly punch up hundreds of hours of simulated combat. Logs have purposes; some classes do shine on certain fights, and you won't see that from a sim. But, in general, you only want to consider normal DPS and moving DPS for class balance comparisons, not "my class does amazingly well/badly on one gimmick fight". Using simulationcraft, you've got Patchwerk and Helter Skelter sims, for non-movement and heavy movement, and the combination of the two should give you a good idea of how much movement is generally going to affect a given class.
Logs have purposes. They're the best way to improve your personal performance by checking uptimes and such. But they're not the best way to look at class balance. Sims are better for that purpose, since they eliminate the complicating factors the logs cannot, leaving class performance as the only remaining variable.
This post will be stickied, since I have had to post much the same in several threads and want a place to reference and link to. However, do NOT use this thread to discuss Shaman class balance. This thread is ONLY to be used to discuss the use of statistical methods of data comparison. I will simply delete anything that isn't relevant to that discussion, not because I am trying to censor the thread, but because I want to keep this sticky rigidly on a narrow topic. You're free to start a new topic and reference this thread, just don't do it here.
Again; this thread is NOT the place to discuss class balance or other such topics. Doing so will be treated, and infracted, as spam.