The Spin Cycle

Flickspin EC2 Instance Crash Forensics

EC2 | November 18, 2008

As some of you may have noticed, Flickspin had a hiccup on the weekend, but all systems now appear to be better than ever. The problem was caused by running a script I hadn't checked over thoroughly, which blasted the file permissions up to root level on what was a particularly bad folder to be messing with.

Needless to say, when it happened I panicked and made the situation a hell of a lot worse. I had no idea what had happened and assumed the db had crashed due to corruption, as the script I was running was db related. It turned out I was so far off the mark that my panicking almost made more of a mess than the original mistake did.

It's been interesting to go back through that moment of panic. Here's my search history - I got a good laugh out of it:

  • ubuntu restart mysql command
  • ubuntu mysql fail start
  • ubuntu mysql fail start db died
  • ubuntu mysql dead
  • ubuntu mysql dead after script . . << . . clutching at straws... lol
  • ubuntu mysql dead fail start

My searches were mostly in vain. So many things can cause major problems that search results aren't always useful. I'm beginning to learn that if you are not coming across a common problem that can easily be found on the net, you are probably dealing with a symptom and not the cause.

Permissions are awesome and painful at the same time. They remind me a lot of webpage caching. Caching can improve the performance of your site like you wouldn't believe, but it does add an extra element of voodoo. So many times I've pulled out hair because I can't see the effects of something I'm working on. You learn over time to think of caching first when things go wrong. I now realise I need to do something similar with Linux permissions.
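For what it's worth, here is the kind of "check permissions first" helper I have in mind. It is only a minimal sketch: the function name `find_unexpected_owners` is made up for illustration, and the folder and the user id it should belong to are whatever applies on your own box. It walks a directory tree and lists anything not owned by the user you expect, which is roughly the symptom a script that resets ownership to root would leave behind.

```python
import os
import stat


def find_unexpected_owners(root_dir, expected_uid):
    """Return (path, uid, mode) for each entry under root_dir not owned by expected_uid.

    Hypothetical helper: root_dir and expected_uid are supplied by the caller.
    """
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Check the directory itself, then every file directly inside it.
        for path in [dirpath] + [os.path.join(dirpath, name) for name in filenames]:
            info = os.lstat(path)  # lstat so symlinks are not followed
            if info.st_uid != expected_uid:
                suspects.append((path, info.st_uid, stat.filemode(info.st_mode)))
    return suspects
```

Pointing it at a service's data folder with that service's user id (for example the uid returned by `pwd.getpwnam` for the service account, assuming one exists) should give an empty list on a healthy system. Anything it does return is worth a look before assuming the database itself is corrupt.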

The other interesting thing about the crash is the famous failing backup process. Fortunately the data backup for the site was well managed; it was the cold EC2 backup instance that failed, which made my life just that much harder.

The instance was sitting there ready to be used. I had been thinking lately that I should give it a spin to test out my ec2 command handling and blow the cobwebs out of the instance. When I decided to move to the backed-up instance (after having torn the heart out of my production instance) I totally forgot that I had changed the ssh keys and no longer had access. Maybe if I was a guru I could have hacked into it somehow, but I knew that wasn't going to happen. So I made it to the verge of rescue only to realise the door was locked and I had thrown out the keys 6 months ago. Such a fool. So I rebuilt FS on top of an ec2ubuntu instance and all went smoothly from there. I'm just hoping I can manage to be a bit more careful when running scripts in the future.



Beta Launch Begins

Published: April 22, 2008

Flickspin's beta launch commences with a team of writers and site breakers ready to break loose.