In February 2017, Amazon Web Services experienced a five-hour outage in its US-EAST-1 region when an engineer accidentally removed more servers than intended while trying to fix a bug in the S3 storage service (Novet, 2017). In September 2017, Microsoft Azure experienced a seven-hour outage in its EU-NORTH region when a fire extinguishing system was accidentally set off during routine maintenance (Thomson, 2017).
Mitigating the Threat of Outages
With cloud outages seemingly always in the news, should companies rethink their strategy of moving their IT infrastructure to a public cloud? After all, a multi-hour outage can cost a company USD 1 million or more!
First, some good news: The majority of these headline-grabbing outages affect only a single region of a cloud service provider. A little more background on how public clouds are typically architected will illustrate why this is a vital point.
A cloud service provider’s network comprises multiple regions (e.g., AWS US-EAST-1 or Microsoft Azure EU-NORTH), which are geographically distinct and isolated from each other. Because of this isolation, the odds of multiple regions experiencing an outage at the same time are very small. In addition, some cloud providers offer multiple zones (often called availability zones) within each region, which can act as backups for one another.
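To see why isolation matters, here is a rough back-of-the-envelope sketch. It assumes that outages in different regions are fully independent and that each region is available 99.9% of the time. Both are simplifying assumptions chosen for illustration, not figures from any provider, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single_availability: float, copies: int) -> float:
    """Availability of a deployment that stays up as long as at least one
    of `copies` locations is up, assuming the locations fail independently."""
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("single_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The deployment is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single_availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    ASSUMED_REGION_AVAILABILITY = 0.999  # illustrative assumption only
    for copies in (1, 2):
        availability = combined_availability(ASSUMED_REGION_AVAILABILITY, copies)
        print(copies, availability, expected_downtime_hours(availability))
```

Under these assumptions, one region is expected to be down for roughly 8.76 hours a year, while two independent regions are expected to be down together for well under a minute. Real outages are not perfectly independent, so treat the result as an upper bound on the benefit rather than a prediction.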
What’s Your Pain Point?
Depending on its tolerance for downtime, a company looking to deploy in a public cloud can then choose a level of resilience that’s right for it: a single availability zone, multiple availability zones, or even multiple regions.
Here’s where the challenge lies: Architecting for greater resilience can take more effort on the part of the company and can cost more money. So it’s up to companies to decide what level of downtime they can tolerate, and plan their cloud usage accordingly. A company that cannot abide any downtime can choose the highest level of resilience its cloud provider offers, while a company that does not require that level of resilience can choose to deploy in only a single availability zone. For example, AIR Cloud clients that want to ensure maximum uptime can choose to implement a redundant Disaster Recovery instance of their environment to protect against any potential unforeseen periods of downtime.
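The trade-off described above can be sketched as a simple selection: pick the cheapest deployment option whose expected downtime fits within what the business can tolerate. Every name and number below is a hypothetical placeholder, not pricing or uptime data from any cloud provider.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    annual_cost_usd: float          # hypothetical figure
    expected_downtime_hours: float  # hypothetical figure, per year


# Illustrative tiers only; real costs and downtime vary by provider and workload.
TIERS = [
    Tier("single availability zone", 100_000, 8.0),
    Tier("multiple availability zones", 150_000, 1.0),
    Tier("multiple regions", 250_000, 0.1),
]


def cheapest_tier_within_tolerance(
    tiers: Sequence[Tier], max_downtime_hours: float
) -> Optional[Tier]:
    """Return the lowest-cost tier whose expected yearly downtime does not
    exceed the tolerance, or None if no tier qualifies."""
    candidates = [t for t in tiers if t.expected_downtime_hours <= max_downtime_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.annual_cost_usd)


if __name__ == "__main__":
    for tolerance in (10.0, 2.0, 0.5):
        choice = cheapest_tier_within_tolerance(TIERS, tolerance)
        print(tolerance, choice.name if choice else "no tier qualifies")
```

A company with a loose tolerance lands on the single-zone option, while a tighter tolerance pushes the choice toward multiple zones or multiple regions, at a higher cost.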
Obviously, no cloud service provider wants to experience a service interruption, but some downtime is eventually inevitable. With a little planning and forethought to take advantage of the resilience that a cloud service offers, companies can minimize the effects on their business.