Standardized headers for a reliable API ecosystem

Roberto_Polli · 17 January 2019, 4:20pm

An API ecosystem leads to a tight integration between server and client systems.

To manage availability and avoid cascade failures you need to exchange service status information and the supported peak load.

Example - Overload case: citizen -> bank -> payment service -> municipality

- A citizen wants to pay a local tax via home banking.
- The bank interfaces with a payment service that contacts the municipality.
- The municipality exposes an API for the payment of the municipal tax.
- On the day the tax is due, the municipality's endpoint is overloaded and cannot handle all requests:
  - some requests fail with an error
  - others exceed the timeout set by the payment service and/or the bank's website
  - a small part is served on time

Contingency plans must therefore be in place to handle error and/or overload situations.

The HTTP protocol offers two simple and powerful tools:

- the error status codes 429 (Too Many Requests) and 503 (Service Unavailable)
- the headers Retry-After and X-RateLimit-*

To communicate throttling policies, just send the following headers with each response:

- X-RateLimit-Limit: maximum number of requests allowed for an endpoint
- X-RateLimit-Remaining: number of requests remaining until the next reset
- X-RateLimit-Reset: number of seconds remaining until the next reset

In this way the clients can adjust the number of requests to send.
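
As a rough illustration of how a provider might fill in these three headers, here is a minimal Python sketch of a fixed-window counter. The class name `FixedWindowLimiter`, its parameters, and the fixed-window strategy itself are assumptions made for the example, not part of the guideline; in practice an API gateway would normally do this for you.

```python
import math
from dataclasses import dataclass


@dataclass
class FixedWindowLimiter:
    """Hypothetical fixed-window counter for one client and one endpoint."""
    limit: int              # maximum requests allowed per window
    window_seconds: int     # window length in seconds
    window_start: float = 0.0
    used: int = 0

    def hit(self, now: float) -> tuple[bool, dict[str, str]]:
        """Register one request at time `now`; return (allowed, headers)."""
        # Start a new window once the current one has expired.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        allowed = self.used < self.limit
        if allowed:
            self.used += 1
        # Whole seconds left until the counter resets, rounded up.
        reset = max(0, math.ceil(self.window_start + self.window_seconds - now))
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - self.used),
            "X-RateLimit-Reset": str(reset),
        }
        return allowed, headers
```

When `hit` returns `False`, the provider would answer with HTTP 429 and attach the same headers, so the client knows how long to wait.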

ATTENTION: we adopt the most common version of these three headers (see the discussion on GitHub); elsewhere they are also used with slight variations. Clients can cope with these variations, but API providers must adopt THIS exact syntax!

Overload or saturation problems can be reported by returning HTTP 503 as soon as the system realizes that it cannot deliver the service within the expected deadlines: this pattern is called Circuit Breaker.
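
One possible reading of this pattern, sketched in Python: the service records how long recent requests took and, when the average exceeds the deadline, starts rejecting new requests with 503 for a cooldown period. The class name `LatencyCircuitBreaker`, the moving-average rule, and the cooldown are assumptions for illustration only; real circuit breakers usually have more states (for example a half-open state that lets a few probe requests through).

```python
from collections import deque


class LatencyCircuitBreaker:
    """Hypothetical breaker that opens when recent latency exceeds a deadline."""

    def __init__(self, deadline_s: float, cooldown_s: float, sample_size: int = 5):
        self.deadline_s = deadline_s      # expected maximum response time
        self.cooldown_s = cooldown_s      # how long to reject once opened
        self.samples = deque(maxlen=sample_size)
        self.open_until = 0.0

    def record(self, latency_s: float, now: float) -> None:
        """Store the latency of a completed request; open the circuit if needed."""
        self.samples.append(latency_s)
        window_full = len(self.samples) == self.samples.maxlen
        if window_full and sum(self.samples) / len(self.samples) > self.deadline_s:
            self.open_until = now + self.cooldown_s
            self.samples.clear()

    def is_open(self, now: float) -> bool:
        """True while new requests should be rejected with HTTP 503."""
        return now < self.open_until
```

While `is_open` returns `True`, the request handler would skip the real work and return 503 immediately.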

The statuses that indicate an overload must be returned as soon as possible:

- HTTP 429 (Too Many Requests) if the rate limit is exceeded
- HTTP 503 (Service Unavailable) if the service is unavailable (e.g. under maintenance) or overloaded
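
A minimal Python sketch of this early check, run before any expensive work is done. The function name `early_rejection` and its parameters are hypothetical, and giving 503 precedence over 429 when both conditions hold is an assumption of this example, not something the guideline prescribes.

```python
from typing import Optional


def early_rejection(rate_limit_exceeded: bool, overloaded_or_down: bool,
                    retry_after_s: int) -> Optional[tuple[int, dict[str, str]]]:
    """Return (status, headers) to reject the request, or None to process it."""
    if overloaded_or_down:
        # Unavailable (e.g. maintenance) or overloaded: no client can be served.
        return 503, {"Retry-After": str(retry_after_s)}
    if rate_limit_exceeded:
        # This particular client has used up its quota.
        return 429, {"Retry-After": str(retry_after_s)}
    return None
```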

To defer requests, you should always use the header

Retry-After: number of seconds after which the client should retry

together with exponential back-off mechanisms.
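
On the client side, the two ideas can be combined roughly as follows. This is only a sketch: the function name `next_delay`, the base delay, and the cap are assumptions, and the HTTP-date form of Retry-After is deliberately ignored because this guideline uses seconds.

```python
from typing import Optional


def next_delay(attempt: int, retry_after: Optional[str],
               base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential back-off: base, 2*base, 4*base, ... up to the cap.
    backoff = min(cap_s, base_s * 2 ** attempt)
    if retry_after is not None and retry_after.strip().isdigit():
        # Never come back earlier than the server asked.
        return max(backoff, float(retry_after.strip()))
    # Header missing or not expressed in seconds: fall back to back-off alone.
    return backoff
```

A client would call `next_delay(attempt, response_headers.get("Retry-After"))` after each 429 or 503 and sleep for the returned number of seconds before trying again.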

Retry-After and X-RateLimit-Reset should be expressed in seconds, adding a bit of jitter where appropriate so that groups of clients do not all come back at the same moment.
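
A tiny Python sketch of server-side jitter, assuming whole seconds and a uniform random offset; the function name `jittered_seconds` and the choice to only ever add time (so no client is told to return too early) are assumptions of this example.

```python
import random


def jittered_seconds(base_s: int, max_jitter_s: int, rng: random.Random) -> int:
    """Return base_s plus a random whole number of seconds in [0, max_jitter_s]."""
    return base_s + rng.randint(0, max_jitter_s)
```

For example, `jittered_seconds(30, 5, random.Random())` yields a value between 30 and 35, so clients rejected at the same instant are spread over a few seconds.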

You can also think of "bouncing" the excess requests even if the system is not overloaded: a useful read is "The Global Chubby Planned Outage" from the Google SRE book.

The use of uniform headers is very important:

- it simplifies client development, reducing errors;
- it avoids having to check for headers that differ from one API to another.

Updates

Recently we got in touch with open-source communities to help them implement this model.
As of now we have:

- a Retry-After handler for WSO2
- GovWay is implementing almost everything
- 3scale/apicast implemented Retry-After and is discussing X-RateLimit in ticket 953

Roberto_Polli · 15 October 2019, 2:37pm

Further updates:

- WSO2 implemented Retry-After in case of 429 Too Many Requests
- Azure API Gateway has a feature request for this. Please vote for it!
- Red Hat 3scale API Gateway has a feature request
- We are working on an Internet-Draft standardizing RateLimit-* headers, https://tinyurl.com/draft-ratelimit-html, which has been presented at IETF 106

Roberto_Polli · 23 December 2019, 10:05am

Update:

Kong API Gateway just merged this guideline into its next branch, which will be published as 2.0.0.