23 March 2026 by Michael
Cloud, Business, Strategy

Service Level Agreements are one of the least understood documents in technology procurement. A provider promises 99.9% uptime, the business signs the contract, and everyone assumes the system will essentially never go down. Then it does, and the SLA turns out to guarantee far less than anyone expected.

Understanding what an SLA actually says, and more importantly what it does not say, prevents surprises and saves money.

What the numbers mean

Uptime percentages sound reassuringly high. The actual downtime they permit is often larger than people expect.

Availability | Downtime per year | Downtime per month
99% (two nines) | 3.65 days | 7.3 hours
99.9% (three nines) | 8.8 hours | 43.8 minutes
99.95% | 4.4 hours | 21.9 minutes
99.99% (four nines) | 52.6 minutes | 4.4 minutes
99.999% (five nines) | 5.3 minutes | 26.3 seconds

A 99.9% SLA, which is common for many cloud services, allows nearly nine hours of downtime per year. That could be one long outage or several shorter ones. Either way, it is more than most businesses realise when they sign the contract.
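The figures above come from simple arithmetic that is easy to reproduce for any percentage. The following is a minimal Python sketch, assuming a 365-day year, a month of one twelfth of a year, and a week of one fifty-second of a year. The function name permitted_downtime_hours is illustrative and not taken from any provider's tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored (assumption)


def permitted_downtime_hours(uptime_percent: float) -> dict:
    """Return the downtime, in hours, that an uptime percentage permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    down_fraction = 1 - uptime_percent / 100
    per_year = down_fraction * HOURS_PER_YEAR
    return {
        "per_year": per_year,
        "per_month": per_year / 12,  # average month (assumption)
        "per_week": per_year / 52,   # average week (assumption)
    }


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.95, 99.99, 99.999):
        d = permitted_downtime_hours(pct)
        print(
            f"{pct}%: {d['per_year']:.2f} hours per year, "
            f"{d['per_month'] * 60:.1f} minutes per month"
        )
```

Running it against the percentage in a contract before signing makes the permitted downtime concrete.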

Does your service actually need five nines?

This is the question that saves the most money and prevents the most over-engineering.

Five nines (99.999%) means no more than 5.3 minutes of downtime per year. Achieving that requires redundant infrastructure across multiple regions, automated failover, no single points of failure anywhere in the stack, and a team capable of maintaining all of it. The cost is substantial, both in infrastructure and in the engineering effort to keep it running.

The question is whether the business actually needs it.

An internal tool used by staff during office hours, Monday to Friday, does not need five nines. If it goes down for thirty minutes on a Tuesday afternoon, the impact is an inconvenience. Three nines, or even two nines, is perfectly adequate.

A marketing website that generates leads but does not process transactions can tolerate occasional brief outages without meaningful business impact. The cost of engineering five nines for a brochure site is almost certainly higher than the revenue lost during the downtime that three nines would allow.

An e-commerce platform processing orders around the clock has a clearer case for higher availability, but even here, the calculation matters. If the site generates GBP 10,000 per hour in revenue, one hour of downtime costs GBP 10,000. If achieving four nines instead of three nines costs GBP 50,000 per year in additional infrastructure, the investment only breaks even if the extra availability prevents more than five hours of downtime annually. If the site was already experiencing only two hours of downtime per year, the upgrade can save at most GBP 20,000 and cannot pay for itself.

The right availability target comes from a simple calculation: what does downtime actually cost the business per hour, and how does that compare to the cost of preventing it? For most small and mid-size businesses, three nines (99.9%) is the right target. As a rough rule of thumb, the jump from three to four nines can double the infrastructure cost, and the jump from four to five nines can multiply it again.
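The break-even reasoning in the e-commerce example can be written down in a few lines. This is a rough Python sketch under a simplifying assumption: the only cost of downtime is lost revenue, at a constant hourly rate. The function names are hypothetical and the figures are the ones used in the example above.

```python
def break_even_hours(extra_cost_per_year: float, revenue_per_hour: float) -> float:
    """Hours of downtime that must be prevented each year to cover the extra spend."""
    if revenue_per_hour <= 0:
        raise ValueError("revenue_per_hour must be positive")
    return extra_cost_per_year / revenue_per_hour


def net_benefit(
    extra_cost_per_year: float,
    revenue_per_hour: float,
    downtime_hours_avoided: float,
) -> float:
    """Revenue protected minus the extra annual cost; negative means a net loss."""
    return downtime_hours_avoided * revenue_per_hour - extra_cost_per_year


if __name__ == "__main__":
    # GBP 50,000 extra per year, GBP 10,000 revenue per hour
    print(break_even_hours(50_000, 10_000))
    # Only two hours of downtime per year were there to be avoided
    print(net_benefit(50_000, 10_000, 2))
```

A negative net benefit is the signal that the higher tier costs more than the downtime it prevents. Costs such as reputational damage are not modelled here and would need to be estimated separately.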

Paying for availability the business does not need is one of the most common sources of unnecessary cloud spending.

What SLAs actually cover

Most SLAs are narrower than they appear.

Uptime is usually measured per month, not per year. A provider promising 99.9% monthly uptime can have a 43-minute outage every single month and remain compliant. In practice, downtime tends to cluster rather than spread evenly, so a single eight-hour outage breaches the SLA only for the month in which it occurs, and any remedy applies to that month alone.

Many SLAs define downtime as the service being completely unreachable, not merely degraded. If the database is responding but taking thirty seconds per query instead of one, that may not count as downtime under the SLA even though the application is effectively unusable.

Exclusions are also broader than most people expect. Scheduled maintenance windows, force majeure events, issues caused by the customer’s own configuration, and problems with third-party dependencies are commonly excluded. A provider can be down for hours during a “scheduled maintenance window” without breaching the SLA, even if that window was announced with minimal notice.

And the SLA covers the provider’s service, not the application. AWS guarantees availability for EC2 instances, not for the application running on them. If the application crashes due to a code bug or misconfiguration, that is not the provider’s problem regardless of what the SLA says.

What happens when an SLA is breached

This is where expectations diverge most sharply from reality.

The remedy for an SLA breach is almost always service credits, not financial compensation. If a cloud provider breaches its 99.9% SLA, the typical remedy is a 10-25% credit on the affected service for that month. For a business spending GBP 2,000 per month on cloud infrastructure, that means a GBP 200-500 credit against future bills.

If that same outage cost the business GBP 20,000 in lost revenue, the service credit covers a fraction of the actual impact. SLAs are not insurance policies. They do not compensate for business losses, reputational damage, or the cost of the team’s time spent on recovery.
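The gap between a service credit and the real cost of an outage is easy to quantify. The sketch below uses the figures from this section (GBP 2,000 monthly spend, a 10-25% credit, GBP 20,000 in lost revenue). The function names are hypothetical, and the credit percentages are the typical range quoted above rather than any specific provider's schedule.

```python
def service_credit(monthly_spend: float, credit_percent: float) -> float:
    """Credit against future bills for one breached month."""
    return monthly_spend * credit_percent / 100


def coverage_percent(credit: float, business_loss: float) -> float:
    """Share of the business loss that the credit offsets, as a percentage."""
    if business_loss <= 0:
        raise ValueError("business_loss must be positive")
    return credit * 100 / business_loss


if __name__ == "__main__":
    monthly_spend = 2_000   # GBP, from the example above
    business_loss = 20_000  # GBP, from the example above
    for pct in (10, 25):
        credit = service_credit(monthly_spend, pct)
        covered = coverage_percent(credit, business_loss)
        print(f"{pct}% credit: GBP {credit:.0f}, covering {covered:.1f}% of the loss")
```

Even at the top of the range, the credit offsets only a few percent of the loss in this example.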

Claiming the credit also requires action. Most providers require the customer to submit a claim within a specified period (often 30 days), with evidence that the SLA was breached. If the business does not have its own monitoring in place to verify downtime independently, it has to rely on the provider’s own reporting, which may not align with the customer’s experience.
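Independent monitoring only helps with a claim if its records can be turned into an availability figure. The following is a minimal Python sketch, assuming the monitoring system produces a list of non-overlapping outage start and end times. The function name and the sample outages are invented for illustration, and a provider's contractual definition of downtime may differ from what this measures.

```python
from datetime import datetime


def measured_availability(outages, period_start, period_end):
    """Availability over a period, as a percentage.

    outages: iterable of (start, end) datetime pairs, assumed non-overlapping.
    """
    period_seconds = (period_end - period_start).total_seconds()
    if period_seconds <= 0:
        raise ValueError("period_end must be after period_start")
    down_seconds = 0.0
    for start, end in outages:
        # Clip each outage to the measurement period
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 100 * (1 - down_seconds / period_seconds)


if __name__ == "__main__":
    # Hypothetical outages recorded during a 31-day month
    outages = [
        (datetime(2026, 3, 4, 14, 0), datetime(2026, 3, 4, 14, 30)),
        (datetime(2026, 3, 19, 2, 10), datetime(2026, 3, 19, 2, 30)),
    ]
    availability = measured_availability(
        outages, datetime(2026, 3, 1), datetime(2026, 4, 1)
    )
    print(f"Measured availability: {availability:.3f}%")
    print("Below 99.9% target" if availability < 99.9 else "Within 99.9% target")
```

The two sample outages total 50 minutes, which is more than the roughly 44.6 minutes a 99.9% target allows in a 31-day month.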

Multi-service dependencies

Modern applications rarely depend on a single service. A typical web application might use a cloud provider for compute, a managed database, a CDN, a payment processor, a transactional email service, and a third-party authentication provider. Each has its own SLA.

When every service sits in the critical path, the combined availability of the system is lower than that of even the least reliable component. If five services each offer 99.9% uptime and fail independently of one another, the probability of all five being available simultaneously is roughly 99.5%, which translates to over 43 hours of potential downtime per year. That is significantly worse than any individual SLA suggests.
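The combined figure is the product of the individual availabilities. Here is a short Python sketch, assuming every service is in the critical path and that failures are independent; real outages are often correlated, so this should be read as an estimate only. The function name is illustrative.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored (assumption)


def combined_availability(availability_percents):
    """Availability of a chain of services that must all be up, as a percentage.

    Assumes independent failures and that every service is in the critical path.
    """
    fractions = [p / 100 for p in availability_percents]
    return math.prod(fractions) * 100


if __name__ == "__main__":
    combined = combined_availability([99.9] * 5)
    downtime_hours = (1 - combined / 100) * HOURS_PER_YEAR
    print(f"Combined availability: {combined:.2f}%")
    print(f"Potential downtime: {downtime_hours:.1f} hours per year")
```

Passing in only the services that are genuinely in the critical path gives a more realistic number than multiplying every vendor's SLA together.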

This matters because businesses often evaluate each vendor’s SLA in isolation and assume the overall system will perform accordingly. The reality is that the more dependencies a system has, the lower its effective availability becomes, regardless of individual SLA guarantees.

Understanding which services are in the critical path and which can degrade gracefully without affecting the customer experience is essential for realistic availability planning.

Internal SLAs

SLAs are not just for external vendors. Defining internal service level targets, often called SLOs (Service Level Objectives), for internal systems provides the same clarity.

If the internal development team operates a customer-facing application, agreeing on a target availability level, a target response time, and a process for when those targets are missed creates accountability and informs infrastructure investment.

Without internal targets, availability decisions are reactive. The system goes down, everyone scrambles, and afterward there is a conversation about whether more should have been invested in reliability. With defined targets, those conversations happen in advance, when the decisions are cheaper and less stressful.

The monitoring and observability practices covered in a separate post provide the data needed to track whether internal targets are being met.

What to negotiate

When evaluating a provider or renewing a contract, a few areas are worth attention.

Push for a downtime definition that includes degraded performance, not just total unavailability. A service that responds but is unusably slow is functionally down.

Understand how maintenance windows work and how much notice is required. Providers that can schedule maintenance at any time with 24 hours’ notice have a lot of flexibility to cause disruption without breaching the SLA.

Service credits are standard as a remedy, but the percentage and the claim process vary. Find out what the maximum credit is and whether it requires proactive claiming.

The provider should offer dashboards or reports that let the customer independently verify uptime. If the only source of truth is the provider’s own status page, disputes become difficult.

Finally, check whether the SLA covers all components. A managed Kubernetes service might have an SLA on the control plane but not on the worker nodes. A managed database might guarantee the engine but not the storage layer.

The practical takeaway

SLAs are useful as a baseline expectation and a signal of provider commitment, but they are not a guarantee that the business will not experience downtime, and they are not compensation when it does.

The real protection comes from the business’s own preparation: monitoring to detect problems independently, disaster recovery plans to restore service quickly, and realistic availability targets based on what the business actually needs rather than what sounds impressive on paper.

Paying for five nines when three nines is sufficient wastes money. Assuming an SLA will cover the cost of an outage leaves the business exposed. Understanding both of these prevents expensive mistakes.

If SLA evaluation or availability planning is an area where outside perspective would help, get in touch.
