What is SLA Uptime?
SLA (Service Level Agreement) uptime is a contractual promise about how available a service will be. It's expressed as a percentage of total time. For example, 99.9% uptime means the service is expected to be available 99.9% of the time, allowing for 0.1% downtime.
The remaining percentage (100% - uptime%) is your error budget: the maximum amount of downtime you can accumulate over the measurement period before breaching your SLA.
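To make the percentage concrete, the budget can be converted into a duration. The following is a minimal sketch, not a standard API: the function name `error_budget` and the 30-day window are assumptions chosen for illustration.

```python
from datetime import timedelta


def error_budget(uptime_percent: float, window: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `window` for an uptime target.

    `error_budget` is a hypothetical helper written for this example.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The budget is the fraction of time NOT covered by the uptime promise.
    allowed_fraction = (100 - uptime_percent) / 100
    return window * allowed_fraction


# A 99.9% target over an assumed 30-day window allows roughly 43 minutes.
budget = error_budget(99.9, timedelta(days=30))
print(f"{budget.total_seconds() / 60:.1f} minutes")
```

The same function works for any window length, so a yearly or weekly budget only requires a different `timedelta`.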
The "Nines" Explained
In the industry, we refer to uptime levels by counting the number of 9s:
- Two nines (99%): ~3.65 days downtime per year. Basic availability, acceptable for internal tools.
- Three nines (99.9%): ~8.76 hours downtime per year. Standard for most SaaS products and APIs.
- Four nines (99.99%): ~52 minutes downtime per year. Enterprise-grade, requires significant investment.
- Five nines (99.999%): ~5 minutes downtime per year. Mission-critical systems, extremely expensive to achieve.
Each additional nine cuts the allowed downtime by a factor of 10, which makes it substantially harder to achieve. As a rule of thumb, each step is often said to cost around 10x more in infrastructure and engineering effort.
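The downtime figures for each level follow from simple arithmetic, sketched below. The helper name `downtime_hours_per_year` is an assumption for illustration, and the calculation assumes a 365-day year, which matches the approximate figures in the list above.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year for an uptime of `nines` nines (2 -> 99%).

    Hypothetical helper: n nines leaves a downtime fraction of 10^-n.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return HOURS_PER_YEAR * 10 ** -nines


# Each extra nine divides the allowance by 10.
for n in range(2, 6):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours:.4f} hours ({hours * 60:.2f} minutes) per year")
```

Two nines comes out at 87.6 hours (3.65 days), and each further nine divides that allowance by 10.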
Error Budgets in Practice
Error budgets help teams balance reliability with feature velocity:
- Budget remaining: Deploy new features, run experiments, take risks.
- Budget exhausted: Focus on reliability, fix issues, reduce change velocity.
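- Measurement window: Many teams track error budgets over a rolling 30-day window, so past incidents age out gradually. Others reset the budget at the start of each calendar month.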
This creates a data-driven way to decide when to prioritize stability over new features.
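The decision rule above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the names `budget_remaining` and `release_policy`, the 99.9% target, and the 30-day window are all hypothetical, and real error budget policies usually include more gradations than a single threshold.

```python
from datetime import timedelta


def budget_remaining(uptime_percent: float, window: timedelta,
                     downtime: timedelta) -> timedelta:
    """Error budget left in `window` after subtracting observed downtime."""
    total_budget = window * ((100 - uptime_percent) / 100)
    return total_budget - downtime


def release_policy(remaining: timedelta) -> str:
    """Map the remaining budget to one of the two modes described above."""
    if remaining > timedelta(0):
        return "ship features"
    return "focus on reliability"


window = timedelta(days=30)  # assumed measurement window

# 30 minutes of downtime leaves part of the ~43-minute budget unspent.
print(release_policy(budget_remaining(99.9, window, timedelta(minutes=30))))

# 50 minutes of downtime overspends the budget.
print(release_policy(budget_remaining(99.9, window, timedelta(minutes=50))))
```

With a rolling window, `downtime` would be the sum of outages in the last 30 days, recomputed whenever the decision is made.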