What is an SLA? API Service-Level Agreements and How to Find Them
When you rely on a third-party API for your application's features, it is important that you can reliably expect it to work. Knowing that the provider's uptime will be consistent with, or greater than, your own, and knowing that their support will be available if you identify a problem, can go a long way in making your choice of APIs easier. In this article we'll look at the Service Level Agreement, or SLA, and how it protects both you and the provider in the event of an outage or problem. We'll also look at how to find them, and what to do if a provider doesn't have one publicly available.
What is an SLA?
An SLA is a contract between the service provider and the customer that explicitly states terms for the service. Aside from general legal information, the area we want to focus on with an SLA is the guarantee, or assurance, that certain expectations will be met. These expectations can range from uptime and reimbursement for extended outages, to response times by the support team.
When you're an enterprise customer, you may be presented with the SLA when agreeing to terms and signing any service contract. The SLA may be custom to your plan, or standard for all customers. If you don't see a publicly available SLA, make sure that the sales team provides you with one.
What to look for in an API's SLA
Some SLAs will mix elements of the customer agreement with the offerings of the company. When we're looking at assurances, we're primarily looking for target availability, service credits, and support response time.
Uptime (Target Availability)
Uptime, or target availability, is a common metric to see when assessing a service-level agreement. This is the expected uptime of the service provider, not including scheduled maintenance or downtimes. These numbers are normally presented anywhere between 95% and 99.9999%. The extra decimal places may just seem like a marketing tactic, but they matter.
Let's take an example of a service that offers 99.99% target availability. Often these targets are on a monthly window, so for our calculation we'll assume 30 days in a month. This leaves us with:
60min x 24h x 30days = 43200 minutes in a month
99.99% of 43200 minutes comes out to 43195.68 minutes, which leaves 4.32 minutes outside the target. In other words, an API with this target availability assures us that we shouldn't see more than about 4 minutes and 19 seconds of unexpected downtime per month.
The extra digits in the 99.99% are important, and it isn't uncommon to see tiers of uptime that increase the target availability. As an example, with the same numbers used above, but with only a 99% target, we could potentially see up to 7 hours and 12 minutes of downtime in a 30 day span.
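The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation above: the function name `allowed_downtime_minutes` is made up for this example, and the 30-day window is the same assumption used in the text, so check the measurement window in the SLA you are reading.

```python
def allowed_downtime_minutes(target_percent: float, days: int = 30) -> float:
    """Minutes of downtime an availability target permits over a window."""
    total_minutes = 60 * 24 * days  # 43200 for a 30-day month
    return total_minutes * (100 - target_percent) / 100


# Compare a few common targets over the assumed 30-day window.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes per 30 days")
```

Each extra nine cuts the allowance by a factor of ten: 432 minutes at 99%, roughly 43 at 99.9%, and 4.32 at 99.99%.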
Service Credits
Service credits are a form of future reimbursement for a problem. Rather than refund past usage, the service credits can be applied to future usage.
Providers will often use tiers of target availability when determining service credits. For example, AWS Lambda steps from a 10% credit when availability falls between 99% and 99.95% all the way up to a 100% service credit for availability below 95%.
You should also be aware that service credit percentages are based on the cost of the service itself. As a result, you generally cannot receive credits for free plans or for periods when you were not paying for the service.
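To make the tiering concrete, here is a minimal sketch of how a tiered credit might be computed. The 10% and 100% tiers come from the example above. The 25% tier for availability between 95% and 99% is an assumption added so the table has no gap, and all names (`CREDIT_TIERS`, `credit_percent`, `credit_amount`) are hypothetical. Always take the real numbers from the provider's SLA.

```python
# (minimum availability %, credit %) pairs, checked from the best tier down.
# The 25% middle tier is an assumption for illustration only.
CREDIT_TIERS = [
    (99.95, 0),   # target met: no credit
    (99.0, 10),   # missed the target, but still at or above 99%
    (95.0, 25),   # assumed middle tier
]
FLOOR_CREDIT = 100  # availability below 95%


def credit_percent(availability: float) -> int:
    """Return the service credit percentage for a measured availability."""
    for minimum, credit in CREDIT_TIERS:
        if availability >= minimum:
            return credit
    return FLOOR_CREDIT


def credit_amount(monthly_bill: float, availability: float) -> float:
    """Credits are a percentage of what you paid, so a free plan earns nothing."""
    return monthly_bill * credit_percent(availability) / 100
```

With these assumed tiers, a month at 99.5% availability on a 200 dollar bill would yield a 20 dollar credit, while the same outage on a free plan yields nothing.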
Support Response Time
The final category we often see when assessing service level agreements is support availability. This could be live-chat response time, email response time, or support request response time. This doesn't normally guarantee a resolution to the problem. For example, Heroku offers support tiers that adjust the support response time depending on the plan chosen. This ranges from a 1+ day turnaround, during business hours on their included plan, up to a one hour response time, 24x7, on their premium support plan.
It is not common to see a missed support availability target result in service credits in an SLA, as these numbers are more reliably achieved and have fewer variables than service uptime.
Extra things to be aware of
While many SLAs, when they are clearly presented, can seem straightforward, there are some caveats to watch for.
One clause that consistently appears in service-level agreements relates to the uptime of third-party providers. For example, if AWS has an outage that takes the provider's service down, the target availability might not apply. In an ecosystem where most APIs rely on the infrastructure of a cloud service provider to offer their services, they are also at the mercy of that external provider's problems. As a result, many SaaS providers include a clause in their SLA that removes or reduces liability in the event that their own external provider goes down. This isn't limited to cloud services; it can also apply to any network infrastructure between your services and the provider's. Keep this in mind when calculating the risk of using an API based solely on its SLA guarantees.
Look out for differentiation between an SLA that applies to all product tiers, and an SLA that only applies to certain plans. Often, enterprise-level plans will have a more customer-friendly SLA as these customers are paying more. As we saw with the Heroku example earlier, you can purchase higher tiers of support. In many cases, higher tiered plans will also offer higher levels of target availability.
It's your responsibility to request a credit
While generous service credit reimbursements can look appealing, they sometimes only cover the time that your application could not access the service, and in many cases require your own due diligence to receive compensation. For example, Google's Cloud Services SLA, for Maps specifically, requires that the customer report the suspected problem within 30 days in order to qualify for the service credit. Don't assume that the provider, even if they send out notices about an outage, will automatically reimburse you for any qualifying downtime.
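Because the clock is on you, it can help to record incident dates and compute the last day a claim can be filed. The sketch below assumes the 30-day window is counted in calendar days from the date of the incident, which may not match how a given SLA defines it. The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the window is counted in calendar days from the incident date.
CLAIM_WINDOW_DAYS = 30


def claim_deadline(incident: date, window_days: int = CLAIM_WINDOW_DAYS) -> date:
    """Last day on which a credit request could still be filed."""
    return incident + timedelta(days=window_days)


def can_still_claim(incident: date, today: date,
                    window_days: int = CLAIM_WINDOW_DAYS) -> bool:
    """True while the claim window for the incident is still open."""
    return today <= claim_deadline(incident, window_days)
```

Feeding this from your own incident log means a missed deadline is something you can alert on, rather than something you discover after the fact.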
How can you know if an SLA is being met?
There are some third-party services that can help with tracking SLAs and monitoring agreements. The best way to begin monitoring SLA commitments is to begin monitoring the availability of the API. You can do this by tracking from within your own app, using health-check endpoints if offered, and keeping track of the provider's status page. It's worth mentioning that status pages are often manually updated, so they won't notify you of immediate downtimes, or problems that the provider doesn't feel warrant mention.
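As a rough sketch of tracking availability from within your own app, the Python below probes a health-check endpoint and turns the recorded results into an availability percentage. It uses only the standard library. The function names are hypothetical, and counting only 2xx responses as "up" is an assumption, since each SLA defines downtime in its own way.

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib's HTTP and URL errors, refused connections, DNS failures,
        # and timeouts are all OSError subclasses.
        return False


def measured_availability(results: list[bool]) -> float:
    """Percentage of probes that succeeded."""
    if not results:
        raise ValueError("no probe results recorded")
    return 100 * sum(results) / len(results)


def meets_target(results: list[bool], target_percent: float) -> bool:
    """Compare the measured availability against the SLA target."""
    return measured_availability(results) >= target_percent
```

In practice you might call `probe("https://api.example.com/health")` (a placeholder URL) on a schedule and store each result. Keep in mind that this measures availability as seen from your servers, so failures on the network path are counted too.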
You can use a tool that monitors throughput, error rates, and performance like Bearer to get a feel for the reliability of the web service. This data can then be used when discussing the SLA targets with the provider. Remember that the problems often need to be the result of their service directly, not problems between your servers and theirs—such as DNS problems.
Can you negotiate on a custom SLA?
Many providers do not display a public-facing SLA, but still offer SLAs for enterprise-level and business customers. When an SLA is not explicitly listed, you can negotiate the terms of the SLA when discussing your plan with the provider's sales team. They will often have fixed terms for target availability, but you can negotiate service credits. In general, the more money you're sending their way, the higher the service availability you can receive.
Use the metrics we've noted above, and determine what levels work best for your needs when negotiating. When in doubt, contacting the provider's sales team is a good place to start when you can't find clear answers about their offering.
How to find a provider's SLA
Ideally, SLAs would be easy to find. Unfortunately, that isn't always the case. You can often find the SLA in the terms and conditions portion of the developer agreement, or on the legal page of the API provider's website. Depending on the type of service, there may be differing SLAs for the service itself and the API. For example, some providers that offer a core service, with an API as a secondary feature, may only guarantee uptime and support for the service itself, but not the API. This generally isn't an issue, but this can be a problem if their outward-facing resources are not as reliable as the internal APIs powering the service.
While searching for "API NAME SLA" can sometimes get you there, the problem comes down to a lack of consistency amongst providers.
If you aren't lucky enough to be using a service that has a dedicated page for their SLA, like Twilio for example, here are some areas to check:
- Terms and Conditions
- Developer Agreement
- Support Agreement
- The pricing page
As we mentioned earlier, it is also common for SLAs to only be offered for enterprise or business customers. If you cannot find a publicly available SLA, reach out to their sales team.
SLAs for popular APIs
At Bearer, we've collected the SLAs for a variety of providers. Below you can find SLAs for some of the most-used APIs and web services.
Good SLA Citizens
The following include clear targets for availability and what happens when they aren't met.
- Google Maps
- GitHub Enterprise
- Microsoft (You'll need to download a document and find the services you're using)
Other SLA Citizens
These providers offer limited-to-no information about target availability and service guarantees. We've included links to their service or developer agreements, but it's best to reach out to their sales teams directly for specific details.
- Heroku (Only customer service targets are available)
- Box (Early versions of their terms had SLA targets, but the current terms do not)
Keep tabs on your SLAs
Ensuring that the APIs you consume are meeting their SLAs is important, and can save your business money. Metrics like target availability can also act as a deciding factor between two otherwise-equal API providers.
Are you looking for better ways to manage your third-party API usage and better understand the APIs you consume? We're building solutions that help ensure your application is better protected against API failures and outages. Give Bearer a try and check out more advice on making the most of your API dependencies on the Bearer Blog.