~/codewithstu

SLIs vs SLOs vs SLAs Explained

devops

When it comes to measuring the quality of your service, three terms are frequently used: Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs). Although they sound similar, they each have different meanings and purposes. Let's dive into each of them.

Service Level Indicators (SLIs)

A Service Level Indicator (SLI) is a metric that measures the performance of a service. SLIs are used to understand a service's performance from the end user's perspective. Common SLIs measure availability, latency, and throughput.

For example, with a website, you might use the following SLIs:

SLIs are generated on a per-event basis, such as per web request. Each event may feed into multiple SLIs, and for each SLI it produces a result that must be one of the following:

- Passed
- Failed
- Not interested

Let's look at how SLIs map to a single web request event. Imagine that you want the following SLIs:

- Error rate
- Response time

For the web request, we could treat any 2XX response as passed, any 5XX response as failed, and everything else (e.g., redirects) as not interested. We may also choose to ignore specific endpoints, such as health checks. The response time SLI follows the same logic: we are generally only interested in 2XX responses, so everything else is mapped to "not interested", and each 2XX response passes or fails depending on whether it completes within our chosen latency threshold. This SLI is generated from the same request/response data as the error rate SLI.
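
To make the mapping concrete, here is a minimal Python sketch that classifies one web request for both SLIs. The /health path, the 300 ms latency threshold, and the names WebRequest, Outcome, error_rate_outcome and response_time_outcome are hypothetical choices for illustration. Treating 4XX responses as "not interested" follows the "everything else" rule described above.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_INTERESTED = "not interested"


@dataclass
class WebRequest:
    path: str
    status: int
    duration_ms: float


# Hypothetical values, chosen only for illustration.
IGNORED_PATHS = {"/health"}
LATENCY_THRESHOLD_MS = 300.0


def error_rate_outcome(req: WebRequest) -> Outcome:
    """2XX passes, 5XX fails; anything else (redirects, 4XX, health checks) is ignored."""
    if req.path in IGNORED_PATHS:
        return Outcome.NOT_INTERESTED
    if 200 <= req.status < 300:
        return Outcome.PASSED
    if 500 <= req.status < 600:
        return Outcome.FAILED
    return Outcome.NOT_INTERESTED


def response_time_outcome(req: WebRequest) -> Outcome:
    """Only 2XX responses count; they pass if they complete within the threshold."""
    if req.path in IGNORED_PATHS or not (200 <= req.status < 300):
        return Outcome.NOT_INTERESTED
    if req.duration_ms <= LATENCY_THRESHOLD_MS:
        return Outcome.PASSED
    return Outcome.FAILED


# One event feeds both SLIs and can produce a different result for each.
req = WebRequest(path="/orders", status=200, duration_ms=450.0)
print(error_rate_outcome(req).value)     # passed
print(response_time_outcome(req).value)  # failed
```

The slow but successful request counts as a pass for the error rate SLI and a fail for the response time SLI, which is why each SLI gets its own classification function.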

Service Level Objectives (SLOs)

A Service Level Objective (SLO) is a target that defines an SLI's acceptable performance level. SLOs are used to set expectations for how well a service should perform. SLOs are typically expressed as a percentage over a given period.

For example, if your website has an SLI of availability, you might set an SLO of 99.9% over a month. This means your website should be available 99.9% of the time in any given month.
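
To make the 99.9% figure concrete, this sketch converts an availability SLO into the downtime it permits over a period. It assumes availability is measured in minutes and uses a 30-day month; the function name allowed_downtime_minutes is hypothetical.

```python
def allowed_downtime_minutes(slo_percent: float, period_days: float) -> float:
    """Minutes of unavailability an availability SLO tolerates over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


# Assuming a 30-day month: 43,200 minutes * 0.1% = 43.2 minutes.
print(round(allowed_downtime_minutes(99.9, 30), 1))  # 43.2
```

A 31-day month would allow slightly more, about 44.6 minutes, so it is worth being explicit about how the period is defined.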

What makes a good SLO?

SLOs must be:

Some examples of good SLOs:

SLO Adherence

SLO adherence is always expressed as a percentage and only ever accounts for the SLI events that interest us, i.e., those that passed or failed; events marked "not interested" are excluded. We can calculate SLO adherence using the following formula:

SLO Adherence = 100 * (passed / (passed + failed))

If we have 132 events that we are interested in, 5 of which failed, then the calculation would be as follows:

Passed = 127 events
Failed = 5 events
SLO Adherence = 100 * (127 / (127 + 5))
SLO Adherence = 100 * (127 / 132)
SLO Adherence = 100 * 0.9621212121212121
SLO Adherence = 96.21% (rounded to 2dp)
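
The same calculation as a small Python sketch. It assumes each event has already been classified as "passed", "failed" or "not interested"; the string labels and the slo_adherence name are illustrative, and the 40 ignored events are made up to show that they do not change the result.

```python
from collections import Counter
from typing import Iterable


def slo_adherence(outcomes: Iterable[str]) -> float:
    """Percentage of events of interest that passed.

    Each outcome is "passed", "failed" or "not interested"; the last kind
    is excluded from both the numerator and the denominator.
    """
    counts = Counter(outcomes)
    passed, failed = counts["passed"], counts["failed"]
    if passed + failed == 0:
        raise ValueError("no events of interest in this window")
    return 100 * passed / (passed + failed)


# 127 passed, 5 failed, plus 40 ignored events (e.g. health checks).
events = ["passed"] * 127 + ["failed"] * 5 + ["not interested"] * 40
print(f"{slo_adherence(events):.2f}%")  # 96.21%
```

Raising an error when there are no events of interest is one design choice; a dashboard might instead show "no data" for that window rather than reporting 0% or 100%.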

Each SLO we publish, along with its current adherence, should be available and continually updated.

Service Level Agreements (SLAs)

A Service Level Agreement (SLA) is a contract between a service provider and a customer that defines the level of service the provider will deliver. SLAs are used to establish a mutual understanding between the provider and the customer regarding the level of service that will be provided.

For example, a cloud provider might offer an SLA guaranteeing 99.9% availability for your cloud services. If the provider fails to meet this SLA, it may have to provide a service credit or refund to you, the customer.
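
To tie the three terms together, here is a hypothetical sketch that treats availability as the SLI and checks it against both an internal SLO and a contractual SLA. The request counts, the 99.95% SLO, the 99.9% SLA, and the evaluate function are assumptions for illustration; the SLO is deliberately set a little stricter than the SLA so that missing the objective acts as a warning before the contract is breached.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    availability: float  # the SLI, as a percentage
    meets_slo: bool      # internal objective
    breaches_sla: bool   # contractual promise; a breach may trigger a credit or refund


def evaluate(passed: int, failed: int, slo_target: float, sla_target: float) -> Evaluation:
    """Compare one period's availability SLI against an SLO and an SLA."""
    availability = 100 * passed / (passed + failed)
    return Evaluation(
        availability=availability,
        meets_slo=availability >= slo_target,
        breaches_sla=availability < sla_target,
    )


# Hypothetical month: 999,200 successful and 800 failed requests.
result = evaluate(999_200, 800, slo_target=99.95, sla_target=99.9)
print(result)  # Evaluation(availability=99.92, meets_slo=False, breaches_sla=False)
```

In this example the SLO is missed while the SLA still holds, so no credit would be owed, but the team has a clear signal to act before customers are affected contractually.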

In conclusion, SLIs, SLOs, and SLAs are all critical components of measuring the quality of your service. SLIs measure performance, SLOs set the acceptable level of that performance, and SLAs commit you to a level of service in a contract with your customers. By tracking your SLIs and meeting your SLOs and SLAs, you can help ensure that your service meets your customers' needs.
