What are SLI, SLO, and SLA?
These are three related ways to think about service reliability. SLI (Service Level Indicator) is what you measure, SLO (Service Level Objective) is the target you set for that measurement, and SLA (Service Level Agreement) is the promise you make to customers, with consequences if you break it.
Think of it like a pizza delivery service: SLI = the actual delivery time you measure, SLO = "we aim for under 30 minutes", SLA = "free pizza if we're late."
The Three Levels
SLI - Service Level Indicator
What it is: The actual measurement of your service performance.
Examples:
- Request success rate: 99.5% of requests succeeded
- Response time: 95% of requests completed under 200ms
- Uptime: Service was available 99.9% of the time
- Error rate: 0.2% of requests returned errors
Think: This is your speedometer reading - the raw data.
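As a rough illustration of how readings like the ones above could be computed, here is a minimal Python sketch. The `Request` record, its field names, and the sample data are all made up for this example; a real service would pull these values from its logs or metrics system.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record: latency in ms and whether it succeeded."""
    latency_ms: float
    ok: bool


def success_rate(requests):
    """SLI: percentage of requests that succeeded."""
    if not requests:
        raise ValueError("no requests to measure")
    return 100.0 * sum(r.ok for r in requests) / len(requests)


def percent_under(requests, threshold_ms):
    """SLI: percentage of requests that completed under threshold_ms."""
    if not requests:
        raise ValueError("no requests to measure")
    fast = sum(r.latency_ms < threshold_ms for r in requests)
    return 100.0 * fast / len(requests)


# Made-up sample: three fast successes and one slow failure.
sample = [
    Request(120, True),
    Request(180, True),
    Request(450, False),
    Request(90, True),
]

print(success_rate(sample))        # 75.0
print(percent_under(sample, 200))  # 75.0
```

The error rate is simply `100 - success_rate(sample)`, so one set of request records yields several SLIs.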
SLO - Service Level Objective
What it is: Your internal target for service performance.
Examples:
- We aim for 99.9% uptime
- 99% of requests should finish under 300ms
- Error rate should stay below 0.1%
- API should respond within 500ms for 95% of calls
Think: This is your speed limit - the line you're trying to stay on the right side of.
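Checking an SLO amounts to comparing a measured SLI against its target. The sketch below shows one possible way to express that in Python; the `SLO` class, the metric names, and the measured values are assumptions made for illustration, not a standard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO definition: a named target for one SLI."""
    name: str
    target: float
    higher_is_better: bool = True

    def is_met(self, measured: float) -> bool:
        # Uptime-style SLOs must reach the target; error-style SLOs
        # must stay strictly below it ("error rate below 0.1%").
        if self.higher_is_better:
            return measured >= self.target
        return measured < self.target


slos = [
    SLO("uptime_pct", 99.9),
    SLO("error_rate_pct", 0.1, higher_is_better=False),
]

# Made-up measurements (the SLIs) for the same period.
measured = {"uptime_pct": 99.95, "error_rate_pct": 0.2}

report = {slo.name: slo.is_met(measured[slo.name]) for slo in slos}
print(report)  # {'uptime_pct': True, 'error_rate_pct': False}
```

Here the uptime objective is met but the error-rate objective is not, which tells the team where to look first.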
SLA - Service Level Agreement
What it is: A formal promise to customers, with consequences if you break it.
Examples:
- We guarantee 99.5% uptime or you get a refund
- 10% credit if monthly uptime drops below 99.9%
- 25% credit if we're down more than 1 hour per month
- Full refund if availability is below 99%
Think: This is a legal contract - break it and you pay up.
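To make the "pay up" part concrete, here is a small Python sketch of a tiered credit schedule. It combines two of the example clauses above (10% credit below 99.9%, full refund below 99%) into a single schedule purely for illustration; real SLAs define their own tiers, and the names `CREDIT_TIERS` and `sla_credit_percent` are hypothetical.

```python
# Hypothetical schedule: (uptime floor in %, credit in % of the monthly bill).
# Ordered from most severe breach to least severe.
CREDIT_TIERS = [
    (99.0, 100),  # below 99% uptime: full refund
    (99.9, 10),   # below 99.9% uptime: 10% credit
]


def sla_credit_percent(monthly_uptime_pct):
    """Return the credit owed for a month, as a percentage of the bill."""
    for floor, credit in CREDIT_TIERS:
        if monthly_uptime_pct < floor:
            return credit
    return 0  # SLA met, nothing owed


print(sla_credit_percent(99.95))  # 0
print(sla_credit_percent(99.5))   # 10
print(sla_credit_percent(98.7))   # 100
```

Checking the most severe tier first means a month at 98.7% gets the full refund rather than stopping at the 10% tier.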
How They Work Together
graph TD
A[Measure Service] --> B[SLI: Actual Numbers]
B --> C{Compare to Target}
C --> D[SLO: Internal Goal]
D --> E{Meeting Goal?}
E -->|Yes| F[Keep Going]
E -->|No| G[Fix Issues]
D --> H[SLA: Customer Promise]
H --> I{Breaking Promise?}
I -->|No| J[All Good]
I -->|Yes| K[Pay Penalty]
style A fill:#e0f2fe,stroke:#0369a1,stroke-width:2px
style B fill:#e0f2fe,stroke:#0369a1,stroke-width:2px
style D fill:#f3e8ff,stroke:#8b5cf6,stroke-width:2px
style H fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style F fill:#dcfce7,stroke:#16a34a,stroke-width:2px
style G fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
style K fill:#fecaca,stroke:#dc2626,stroke-width:2px
You measure with SLIs, aim for SLOs, and promise SLAs. Your SLA should be looser than your SLO to give yourself breathing room.
With vs Without Clear SLOs
No Clear Targets
Team: "Is the service fast enough?"
Manager: "I don't know... feels slow?"
Team: "Should we fix this bug?"
Manager: "Maybe? How bad is it?"
No clear way to decide what's important!
Problem: Everyone has different opinions about "good enough." Hard to prioritize work.
With SLOs
Team: "Response time is 250ms, SLO is 200ms"
Manager: "We're missing our target. Priority fix."
Team: "This bug affects 0.01% of users"
Manager: "That's within our SLO. Can wait."
Clear data-driven decisions!
Result: Everyone knows the target. Easy to decide what needs attention now.
Setting Good SLOs
Start with what you currently achieve
Look at your data. If you're at 99.8% uptime now, don't set a 99.99% target. Base your SLOs on what the service actually delivers today.
Focus on what users actually care about
Users care if the page loads and works. They don't care about your CPU usage or memory. Pick metrics that affect user experience.
Make SLAs looser than SLOs
If your SLO is 99.9%, make your SLA 99.5%. This gives you room to have bad days without paying penalties.
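The size of that breathing room is easy to work out. The sketch below converts an uptime percentage into allowed downtime, assuming a 30-day month for simplicity; the function and constant names are made up for this example.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes; assumes a 30-day month


def allowed_downtime_minutes(uptime_pct):
    """Minutes of downtime per 30-day month that still meet uptime_pct."""
    minutes = MINUTES_PER_30_DAY_MONTH * (100.0 - uptime_pct) / 100.0
    return round(minutes, 1)


slo_budget = allowed_downtime_minutes(99.9)  # internal target
sla_budget = allowed_downtime_minutes(99.5)  # customer promise
buffer = round(sla_budget - slo_budget, 1)

print(slo_budget)  # 43.2
print(sla_budget)  # 216.0
print(buffer)      # 172.8
```

Under these assumptions, a 99.9% SLO allows about 43 minutes of downtime a month, while a 99.5% SLA allows 216 minutes. The gap of roughly 173 minutes is the room you have to miss your internal target before penalties apply.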
Review and adjust
SLOs aren't set in stone. If you're always exceeding them, make them stricter. If you always miss, make them more realistic.
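The review rule above can be sketched as a simple check over recent history. This is only an illustration: the function name, the "tighten"/"loosen"/"keep" labels, and the sample figures are assumptions, and it only covers SLOs where a higher value is better (such as uptime).

```python
def review_slo(target, monthly_values):
    """Suggest an adjustment for a higher-is-better SLO (e.g. uptime %).

    monthly_values holds the measured SLI for each recent review period.
    """
    if not monthly_values:
        raise ValueError("need at least one period of data to review")
    if all(value > target for value in monthly_values):
        return "tighten"  # always exceeding: the target may be too easy
    if all(value < target for value in monthly_values):
        return "loosen"   # always missing: the target may be unrealistic
    return "keep"         # mixed results: the target is doing its job


print(review_slo(99.9, [99.95, 99.97, 99.99]))  # tighten
print(review_slo(99.9, [99.5, 99.7, 99.8]))     # loosen
print(review_slo(99.9, [99.95, 99.8, 99.92]))   # keep
```

A suggestion like this is a prompt for discussion, not an automatic change; the team still decides what the new target should be.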