What is SLA Uptime?
SLA (Service Level Agreement) uptime is a contractual promise about how available a service will be. It's expressed as a percentage of total time. For example, 99.9% uptime means the service is expected to be available 99.9% of the time, allowing for 0.1% downtime.
The remaining percentage (100% - uptime%) is your error budget: the maximum amount of downtime you can have before breaching your SLA.
How to Calculate Downtime Percentage
Calculating downtime percentage is straightforward once you understand the relationship between uptime and downtime:
- Start with your uptime percentage: For example, 99.5% uptime
- Calculate downtime percentage: Downtime % = 100% - Uptime % = 100% - 99.5% = 0.5%
- Convert to actual time: Multiply the downtime percentage by the total time period
  - For a year (365 days): 0.5% × 365 days × 24 hours = 43.8 hours of downtime allowed per year
  - For a month (30 days): 0.5% × 30 days × 24 hours = 3.6 hours of downtime allowed per month
Formula: Allowed Downtime = (100% - Uptime%) × Total Time Period
For example, with 99.5% uptime over a year: (100% - 99.5%) × 8,760 hours = 0.5% × 8,760 = 43.8 hours per year.
Use the calculator above to instantly convert any uptime percentage to allowed downtime for different time periods (day, week, month, year).
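The formula above can also be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name allowed_downtime_hours and the 30-day month and 365-day year are assumptions chosen to match the worked examples.

```python
HOURS_PER_DAY = 24


def allowed_downtime_hours(uptime_percent: float, period_days: float) -> float:
    """Allowed downtime in hours: (100% - uptime%) x total time period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100 - uptime_percent) / 100
    return downtime_fraction * period_days * HOURS_PER_DAY


# Period lengths are assumptions: a 30-day month and a 365-day year.
for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]:
    print(f"{label}: {allowed_downtime_hours(99.5, days):.2f} hours")
```

For 99.5% uptime this reproduces the figures worked out by hand: 3.6 hours for a 30-day month and 43.8 hours for a year.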
The "Nines" Explained
In the industry, we refer to uptime levels by counting the number of 9s:
- Two nines (99%): ~3.65 days downtime per year. Basic availability, acceptable for internal tools.
- Three nines (99.9%): ~8.76 hours downtime per year. Standard for most SaaS products and APIs.
- Four nines (99.99%): ~52 minutes downtime per year. Enterprise-grade, requires significant investment.
- Five nines (99.999%): ~5 minutes downtime per year. Mission-critical systems, extremely expensive to achieve.
Each additional nine cuts the allowed downtime by a factor of 10 and gets progressively harder to achieve. A common rule of thumb is that it also costs roughly 10x more in infrastructure and engineering effort.
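The yearly figures in the list can be derived from the number of nines alone, since N nines leaves a downtime fraction of 10^-N. The sketch below assumes a 365-day year, and the function name downtime_minutes_per_year is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an uptime written with `nines` nines.

    2 nines means 99%, 3 nines means 99.9%, and so on, so the
    downtime fraction is 10 ** -nines.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    downtime_fraction = 10 ** -nines
    return downtime_fraction * MINUTES_PER_YEAR


for n in range(2, 6):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines: {minutes:,.1f} minutes/year ({minutes / 60:,.2f} hours)")
```

Two nines works out to 5,256 minutes (3.65 days) and five nines to about 5.3 minutes, which is where the tenfold drop per added nine comes from.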
Error Budgets in Practice
Error budgets help teams balance reliability with feature velocity:
- Budget remaining: Deploy new features, run experiments, take risks.
- Budget exhausted: Focus on reliability, fix issues, reduce change velocity.
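- Budget window: Many teams track error budgets over a rolling 30-day window, so the budget replenishes as older incidents age out of the window rather than resetting on a fixed date.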
This creates a data-driven way to decide when to prioritize stability over new features.
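As a rough illustration of how such a decision could be automated, the sketch below computes the budget for a window and maps the remaining amount to one of the two modes described above. The ErrorBudget class, its field names, and the simple "any budget left" threshold are assumptions made for this example; real policies often add warning thresholds before the budget runs out.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: error budget for an uptime target over a window."""

    uptime_percent: float   # e.g. 99.9
    window_days: int = 30   # assumed 30-day tracking window

    @property
    def total_minutes(self) -> float:
        """Total allowed downtime in the window, in minutes."""
        window_minutes = self.window_days * 24 * 60
        return (100 - self.uptime_percent) / 100 * window_minutes

    def remaining_minutes(self, downtime_minutes: float) -> float:
        """Budget left after the downtime observed so far in the window."""
        return self.total_minutes - downtime_minutes

    def mode(self, downtime_minutes: float) -> str:
        """Map remaining budget to the two modes described in the text."""
        if self.remaining_minutes(downtime_minutes) > 0:
            return "budget remaining: ship features"
        return "budget exhausted: focus on reliability"


budget = ErrorBudget(uptime_percent=99.9)
print(f"Total budget: {budget.total_minutes:.1f} minutes")
print(budget.mode(downtime_minutes=12))   # some budget left
print(budget.mode(downtime_minutes=50))   # over budget
```

With a 99.9% target over 30 days the budget is about 43.2 minutes, so 12 minutes of downtime leaves room for risk while 50 minutes does not.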