Fallacies of the Cloud

Magical New Technology Still Subject to Human Error
Monday, July 25th, 2011

The cloud is the new thing. Put your apps in the cloud! Put your business processes and web services there too! Hey, why not upload your customer data and company history while you’re at it? The cloud is always there, always accessible, and always will be. Keeping your data on your own servers is so 1990s. Having your own IT department is so 1980s. A company doesn’t need anything but an Internet connection and some computers to run everything now! It’s a US$1.6 billion industry this year, so what could go wrong?

At least this is the impression you get when hearing people talk about cloud services, or reading about the latest advancements in cloud computing. However, while cloud services can be an effective part of a good computing strategy for your business, they are not a magic bullet for every problem. Why? Because cloud services are still run by people, and they exist in a world where accidents still happen. Remember that time when your power went out and there wasn’t enough gas in the backup generators to run your company’s servers for more than thirty minutes? The same thing can still happen to the cloud provider’s servers. And remember that time when the tape backups caught fire and you lost 20 years’ worth of data in an afternoon? The cloud is not immune to such mishaps.

Such was the case on Thursday, April 21 of this year, when the most famous cloud services company, Amazon Web Services, suffered an outage of its cloud computing services. Full restoration of its available offerings was not completed until Saturday. The company that shrugged off an Anonymous DDoS attack as though it was nothing experienced a network error that resulted in a prolonged outage of services. Most of the affected customers were on the East Coast of the US. Customers experienced hours of downtime, system latency issues, and high error rates. The Elastic Compute Cloud (EC2) and the Amazon Web Services Elastic Beanstalk were just not stretchy enough. Some prominent Amazon customers that were affected were the web sites Quora, Reddit, and Foursquare.

According to a report released by Amazon, the technical cause was this: “A networking event early this morning triggered a large amount of re-mirroring of EBS [Elastic Block Store] volumes ... This remirroring created a shortage of capacity ... which impacted new EBS volume creation as well as the pace with which we could remirror and recover affected EBS volumes.” So something or somebody screwed up and caused a lot of copying on the network, which ate up system resources and slowed everything down. It is just like when you accidentally copy a huge movie file from one place to another on your hard disk, and it becomes a little bit more difficult to play the game you’re playing.

This was not the only cloud-related incident that month. A few days later, VMware’s cooler-named Cloud Foundry service suffered its first outage after being online for just two weeks. Like Amazon Web Services, Cloud Foundry offers a platform for web developers to build and host web applications. A power failure in one of the data centers escalated and caused service interruptions over two days. For the most part, developers were unable to make any changes to their services; things simply stayed as they were. In the end, the problem was traced to a malfunctioning power supply in a storage cabinet. That is a very simple thing to go wrong and cause so much grief to paying customers.

The service recovered from that problem, but the next day human error created more. While developing a set of procedures to keep this type of thing from happening again, one of the operations engineers in the data center accidentally touched a keyboard that should not have been touched. The engineer pressed the wrong button and caused a full outage of the network infrastructure sitting in front of Cloud Foundry. That included all load balancers, routers, and firewalls, and it took some of the DNS servers offline as well. The end result was that the two-week-old Cloud Foundry service from VMware was completely disconnected from the Internet. So despite the latest technology and a cool name, Cloud Foundry was still brought down by human error.

Multi-billion-dollar corporation Microsoft is also not immune to the cloud computing outage fairy. Its outage happened in August of last year. Microsoft has signed up 40 million paid users for its Microsoft Online Services cloud computing offering, including 9,000 business customers and 500 government entities. Microsoft didn’t release many details about the outage, but it lasted about two hours and affected most of North America. Services were unreachable during that time, most notably Microsoft’s Business Productivity Online Suite, the service that the company has been pushing the hardest.

These three incidents illustrate one of the biggest drawbacks of the cloud, which is that you are turning over your business to a third party. There is hardly a single business today that can operate effectively without computing resources. If a company trusts a cloud service provider to supply all of the network, storage, and computing required to run its business, and the cloud service provider has an outage, the company cannot do anything to fix the problem. It simply has to wait for the third party to get its act together. In the older model of computing, by contrast, the company buys all the resources that it needs and sticks them on its own property, so that if something goes wrong it can call its IT guys in to fix it as fast as possible. Thus a cost-benefit analysis is something that each company must do in order to determine which solution is right for it. Cloud computing must be examined closely, with a full understanding of the implications, before a company chooses whether or not to use it. Be forewarned.
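For readers who want to put rough numbers on that cost-benefit analysis, here is a minimal Python sketch of one way to frame it. Everything in it is an assumption made for illustration: the `HostingOption` class, the helper functions, and all of the dollar and downtime figures are hypothetical and do not come from Amazon, VMware, Microsoft, or any real business. The idea is simply to add the yearly price of each option to the revenue expected to be lost while it is down, then see at what hourly loss the cheaper choice flips.

```python
from dataclasses import dataclass


@dataclass
class HostingOption:
    """A hypothetical hosting choice; every figure is an illustrative guess."""
    name: str
    annual_cost: float              # yearly fees, or hardware plus IT staff
    expected_downtime_hours: float  # outage hours you expect per year

    def expected_annual_cost(self, loss_per_hour: float) -> float:
        # Price of the option plus revenue expected to be lost while it is down.
        return self.annual_cost + self.expected_downtime_hours * loss_per_hour


def cheaper_option(a: HostingOption, b: HostingOption,
                   loss_per_hour: float) -> HostingOption:
    """Return whichever option has the lower expected yearly cost."""
    return min((a, b), key=lambda o: o.expected_annual_cost(loss_per_hour))


def break_even_loss_per_hour(a: HostingOption, b: HostingOption) -> float:
    """Hourly outage loss at which both options cost the same per year."""
    downtime_gap = a.expected_downtime_hours - b.expected_downtime_hours
    if downtime_gap == 0:
        raise ValueError("equal downtime: outage losses never change the answer")
    return (b.annual_cost - a.annual_cost) / downtime_gap


# Made-up numbers for a made-up company.
cloud = HostingOption("cloud", annual_cost=50_000, expected_downtime_hours=48)
on_prem = HostingOption("on-premises", annual_cost=120_000, expected_downtime_hours=8)

if __name__ == "__main__":
    for loss in (1_000, 5_000):
        winner = cheaper_option(cloud, on_prem, loss)
        print(f"Losing ${loss:,} per outage hour: {winner.name} is cheaper")
    print(f"Break-even: ${break_even_loss_per_hour(cloud, on_prem):,.0f} per hour")
```

With these invented figures, the cloud comes out ahead as long as an hour of downtime costs the business less than the break-even amount, and owning the servers comes out ahead above it. A real analysis would also have to account for things this sketch ignores, such as how quickly each side can actually fix an outage and how much warning you get.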
