@Ajit5ingh

SLI, SLO & SLA

Measuring and promising service reliability

What are SLI, SLO, and SLA?

These are three related ways to think about service reliability. SLI (Service Level Indicator) is what you measure, SLO (Service Level Objective) is your target goal, and SLA (Service Level Agreement) is the promise you make to customers with consequences if you break it.

Think of it like a pizza delivery service: SLI = the actual delivery time you measure, SLO = "we aim for under 30 minutes", SLA = "free pizza if we're late."

The Three Levels

SLI - Service Level Indicator

What it is: The actual measurement of your service performance.

Examples:

  • Request success rate: 99.5% of requests succeeded
  • Response time: 95% of requests completed under 200ms
  • Uptime: Service was available 99.9% of the time
  • Error rate: 0.2% of requests returned errors

Think: This is your speedometer reading - the raw data.
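As a rough sketch of how numbers like these come out of raw data, the snippet below computes a success rate and a latency SLI from a list of request records. The `Request` class, the "5xx means failure" rule, and the sample data are assumptions made for illustration; a real service would read these from logs or a metrics system.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code (assumed: 5xx counts as a failure)
    latency_ms: float  # time taken to serve the request


def success_rate(requests):
    """Percentage of requests that did not fail with a server error."""
    ok = sum(1 for r in requests if r.status < 500)
    return 100.0 * ok / len(requests)


def latency_sli(requests, threshold_ms):
    """Percentage of requests that completed in under threshold_ms."""
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return 100.0 * fast / len(requests)


# Hypothetical sample: 1,000 requests, 5 failed and 50 were slow.
sample = (
    [Request(200, 120.0)] * 945
    + [Request(200, 350.0)] * 50
    + [Request(500, 90.0)] * 5
)

print(f"Success rate: {success_rate(sample):.1f}%")   # Success rate: 99.5%
print(f"Under 200ms:  {latency_sli(sample, 200):.1f}%")  # Under 200ms:  95.0%
```

Both functions only report what happened. Whether 99.5% is good enough is a question for the SLO.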

SLO - Service Level Objective

What it is: Your internal target for service performance.

Examples:

  • We aim for 99.9% uptime
  • 99% of requests should finish under 300ms
  • Error rate should stay below 0.1%
  • API should respond within 500ms for 95% of calls

Think: This is your speed limit - what you're aiming for.
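One way to make a target like this checkable is to store it next to the direction in which "better" points. The `SLO` class below is a hypothetical helper, not part of any library; it simply compares a measured SLI against the target.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    name: str
    target: float           # target value, in percent
    higher_is_better: bool  # True for uptime, False for error rate

    def is_met(self, sli: float) -> bool:
        """Return True if the measured SLI satisfies this objective."""
        if self.higher_is_better:
            return sli >= self.target
        # "Should stay below" targets are met only when strictly under.
        return sli < self.target


uptime_slo = SLO("uptime", 99.9, higher_is_better=True)
error_slo = SLO("error rate", 0.1, higher_is_better=False)

print(uptime_slo.is_met(99.95))  # True: 99.95% uptime meets the 99.9% target
print(error_slo.is_met(0.2))     # False: 0.2% errors is above the 0.1% target
```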

SLA - Service Level Agreement

What it is: A formal promise to customers with consequences.

Examples:

  • We guarantee 99.5% uptime or you get a refund
  • 10% credit if monthly uptime drops below 99.9%
  • 25% credit if we're down more than 1 hour per month
  • Full refund if availability is below 99%

Think: This is a legal contract - break it and you pay up.
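To show how such consequences might be calculated, here is a minimal sketch that combines two of the examples above into one tier table. Treating them as a single contract is an assumption made for illustration; real SLAs define their own tiers and measurement rules.

```python
# Hypothetical tiers: (uptime floor in percent, credit in percent of the bill).
# Ordered from most severe to least severe so the first match wins.
CREDIT_TIERS = [
    (99.0, 100),  # below 99% availability: full refund
    (99.9, 10),   # below 99.9% uptime: 10% credit
]


def sla_credit(monthly_uptime: float) -> int:
    """Return the credit owed, as a percentage of the monthly bill."""
    for floor, credit in CREDIT_TIERS:
        if monthly_uptime < floor:
            return credit
    return 0  # promise kept, nothing owed


print(sla_credit(99.95))  # 0
print(sla_credit(99.5))   # 10
print(sla_credit(98.7))   # 100
```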

How They Work Together


graph TD
    A[Measure Service] --> B[SLI: Actual Numbers]
    B --> C{Compare to Target}
    C --> D[SLO: Internal Goal]
    D --> E{Meeting Goal?}
    E -->|Yes| F[Keep Going]
    E -->|No| G[Fix Issues]
    D --> H[SLA: Customer Promise]
    H --> I{Breaking Promise?}
    I -->|No| J[All Good]
    I -->|Yes| K[Pay Penalty]
    
    style A fill:#e0f2fe,stroke:#0369a1,stroke-width:2px
    style B fill:#e0f2fe,stroke:#0369a1,stroke-width:2px
    style D fill:#f3e8ff,stroke:#8b5cf6,stroke-width:2px
    style H fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style F fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    style G fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
    style K fill:#fecaca,stroke:#dc2626,stroke-width:2px

You measure with SLIs, aim for SLOs, and promise SLAs. Your SLA should be looser than your SLO to give yourself breathing room.
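The relationship can be sketched as a single function that places a measured SLI into one of three zones. The function name and the status strings are hypothetical, and it assumes a "higher is better" metric such as uptime.

```python
def reliability_status(sli: float, slo: float, sla: float) -> str:
    """Classify a measured SLI against an internal SLO and a customer SLA."""
    if sla > slo:
        # An SLA stricter than the SLO leaves no breathing room.
        raise ValueError("SLA should not be stricter than the SLO")
    if sli >= slo:
        return "meeting SLO"
    if sli >= sla:
        return "missing SLO, within SLA"  # fix issues, no penalty yet
    return "breaching SLA"                # penalty applies


print(reliability_status(99.95, slo=99.9, sla=99.5))  # meeting SLO
print(reliability_status(99.7, slo=99.9, sla=99.5))   # missing SLO, within SLA
print(reliability_status(99.2, slo=99.9, sla=99.5))   # breaching SLA
```

The middle zone is the breathing room: the team knows it has work to do before any customer is owed a penalty.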

With vs Without Clear SLOs

No Clear Targets

Team: "Is the service fast enough?"

Manager: "I don't know... feels slow?"

Team: "Should we fix this bug?"

Manager: "Maybe? How bad is it?"

No clear way to decide what's important!

Problem: Everyone has a different opinion about what counts as "good enough," so it's hard to prioritize work.

With SLOs

Team: "Response time is 250ms, SLO is 200ms"

Manager: "We're missing our target. Priority fix."

Team: "This bug affects 0.01% of users"

Manager: "That's well within our 0.1% error-rate SLO. Can wait."

Clear data-driven decisions!

Result: Everyone knows the target, so it's easy to decide what needs attention now.

Setting Good SLOs

Start with what you currently achieve

Look at your data. If you're at 99.8% uptime now, don't set a 99.99% target. Base your SLOs on what you actually achieve.

Focus on what users actually care about

Users care whether the page loads and works. They don't care about your CPU or memory usage. Pick metrics that reflect the user experience.

Make SLAs looser than SLOs

If your SLO is 99.9%, make your SLA 99.5%. This gives you room to have bad days without paying penalties.
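To see how much room that gap buys, the sketch below converts each uptime percentage into allowed downtime. It assumes a 30-day month, which is a simplification.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes (assumed 30-day month)


def allowed_downtime_minutes(target_percent: float) -> float:
    """Minutes of downtime per 30 days that still meet the uptime target."""
    return MINUTES_PER_30_DAYS * (100.0 - target_percent) / 100.0


slo_budget = allowed_downtime_minutes(99.9)
sla_budget = allowed_downtime_minutes(99.5)

print(f"SLO 99.9%: {slo_budget:.1f} minutes")  # SLO 99.9%: 43.2 minutes
print(f"SLA 99.5%: {sla_budget:.1f} minutes")  # SLA 99.5%: 216.0 minutes
```

Under these assumptions, the team starts treating downtime as a problem after about 43 minutes in a month, while penalties only apply after about 3.6 hours.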

Review and adjust

SLOs aren't set in stone. If you're always exceeding them, make them stricter. If you always miss, make them more realistic.
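A review like this can be as simple as comparing recent results to the target. The function below is a hypothetical sketch for a "higher is better" metric; how many months to look at and how far to move the target are left as judgment calls.

```python
def review_slo(monthly_slis, slo):
    """Suggest whether an SLO looks too loose, too strict, or about right."""
    if not monthly_slis:
        raise ValueError("need at least one month of data")
    if all(s > slo for s in monthly_slis):
        return "tighten"  # always exceeding the target
    if all(s < slo for s in monthly_slis):
        return "relax"    # always missing the target
    return "keep"         # a mix of hits and misses


print(review_slo([99.97, 99.98, 99.99], slo=99.9))  # tighten
print(review_slo([99.6, 99.7, 99.8], slo=99.9))     # relax
print(review_slo([99.95, 99.85, 99.92], slo=99.9))  # keep
```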
