What is Availability?

Remember the little pizza shop story from the scalability blog? After that Instagram post went viral, 200 people showed up instead of the usual 20. This is where availability comes into play.

Your pizza shop has two choices:

  1. Stay open, but with a ~2-hour wait (available but slow)
  2. Close the shop temporarily because you ran out of ingredients (unavailable)

Most owners choose option 1: stay open, serve slowly, but keep serving. Smart choice, because in the software world too, being slow is far better than being unavailable.

Availability is whether your system can respond to requests, even if it is slower than usual.

What is availability?

Availability is simple: can users access your system when they need it?

In our pizza shop example, we stayed open. Customers complained about the long wait times, but they got served. Now imagine we had shut the shop instead. Those same customers would have lost trust in us, and they would probably have gone to competitors and never come back.

This is why availability is very important.

How is availability measured?

We measure availability as a percentage of system uptime over a given period of time.

$$\text{Availability} = \frac{\text{Total Time} - \text{Downtime}}{\text{Total Time}} \times 100$$

Let's understand this with the pizza shop example. We track the shop over a 30-day month. We are supposed to be open 12 hours a day (360 hours in total).

One day, we had to close for 4 hours due to a cylinder issue. So, availability would be:

$$\text{Availability} = \frac{360 - 4}{360} \times 100 \approx 98.89\%$$
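The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name `availability_percent` and its parameters are made up for this example.

```python
def availability_percent(total_hours: float, downtime_hours: float) -> float:
    """Return uptime as a percentage of the total scheduled time."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    return (total_hours - downtime_hours) / total_hours * 100


# Pizza shop: 30 days x 12 hours a day, with one 4-hour closure.
scheduled_hours = 30 * 12
pizza_shop = availability_percent(scheduled_hours, 4)
print(f"{pizza_shop:.2f}%")  # 98.89%
```

Note that the total here is the time we were supposed to be open (360 hours), not the full 720 hours in the month. What counts as "total time" depends on when the system is expected to serve users.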

Keep in mind that this percentage treats every hour equally, but downtime hurts the most during peak hours.

SLA numbers

You might have come across "fancy" terms like three nines, four nines, etc. The industry talks about availability in "nines": 99% (two nines), 99.9% (three nines), 99.99% (four nines), and so on. These are the numbers you typically see in an SLA (Service Level Agreement).
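To get a feel for what each nine means, it helps to turn the percentage into a downtime budget. The sketch below assumes a service that is expected to be up 24/7 over a 365-day year; the helper name `allowed_downtime_hours` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, service expected up 24/7


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget (in hours) that a target availability allows."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Under these assumptions, 99% allows about 87.6 hours of downtime a year, 99.9% about 8.76 hours, and 99.99% only about 52.6 minutes. Every extra nine shrinks the budget by a factor of ten.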

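Each additional nine cuts the allowed downtime by 10x and, as a rule of thumb, gets roughly 10x more expensive to achieve. Nines are used for other guarantees too: Amazon S3, for example, is designed for 99.999999999% durability (that’s eleven nines), making it one of the most reliable storage services in the industry. Note that durability is about not losing stored data, which is a different measure from availability.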

But in practice, you need to find a sweet spot that balances reliability and cost. From a business point of view, going for too many nines can be overkill: you might end up spending a lot of money on availability that users don't really need.

The goal isn't perfect availability (that's impossible and expensive). The goal is appropriate availability for your use case.