Hey look, we're back online!
Oh well, that sucked. About 48 hours ago, MMO-Champion (and the entire Curse Network) went down without warning, and the downtime lasted a... little longer than expected.
The site is currently in read-only mode
We're currently operating in read-only mode on an 8-day-old database. This was the best way to bring the site back online tonight without interrupting the efforts to restore and fix everything. You will not be able to post on the forums, and anything posted in the 5 days before the crash will not be visible for the moment.
If everything goes as planned, tomorrow we will be able to restore everything to its pre-crash state and reopen the site. (It might happen earlier, but let's say tomorrow.)
Basically, everything went down when the storage area network decided to fail. There was no real way to prevent that, and it instantly killed all the sites on the network. It's OK, we just switched to the backup controller we had. I mean, we totally had that covered!
Then the backup controller failed. We worked with HP to get replacement parts as soon as possible, and we had everything replaced within a couple of hours. The problem is, we still had a badly crashed NAS on our hands, and we had to check it for corrupted data... for a very long time. The process literally took over 30 hours, and every attempt we made during that time to bring the site back temporarily failed. We also had consultants on site and worked closely with HP and Microsoft to figure out whether there was a way to speed things up without risking data loss, but without much success.
The NAS eventually came back online very early in the morning, and a couple of sites were brought back up (including the blue tracker, a little later).
Then we hit another hilarious problem: the faulty hardware crashed so badly that it broke the firmware (yes, seriously), and at that point our only hope was a bugfix from Hewlett-Packard. We're currently working with them and everything should be sorted out soon, but it will take a couple of extra hours. That's why we decided to bring the site back in read-only mode for now.
After the fix is applied, we will be able to restore the database to its pre-crash state and open the site to the public once again, with the missing news and forum posts. (You know, assuming everything goes as planned.)
So, not a hack?
Nope. I know people freaked out a little, but it really was a pure hardware failure. Nothing was compromised.
But you could have avoided it!
With another setup, probably, but realistically it's pretty hard to predict every technical failure you could run into, and damn, what are the chances that both controllers would fail within 2 hours of each other? For the record, the NAS was part of the brand-new hardware bought a year ago to upgrade the entire infrastructure. Of course, lessons will be learned, and I'm sure the lovely techy guys will work on ways to prevent this in the future. It would also be pretty unfair to bash them right now, because they pretty much worked 25 hours a day to fix the problem.
I like you guys.
I would like to apologize for the whole situation. Realistically, a whole site exploding in users' faces for 2 days isn't something that should happen in a perfect world, but hey, shit does happen. The community has been incredibly supportive throughout the whole thing, and I was amazed by all the nice messages we got on Facebook. Really, I expected you guys to eat me alive and call me nasty names, but you didn't. Thank you.
I hope we're still friends.
For the latest news, see the posts below. If you have any questions, poke me on Twitter (@Boubouille_MMO) and I will try to reply to everyone.