Standardized headers for a reliable API ecosystem


An API ecosystem leads to a tight integration between server and client systems.

To manage availability and avoid cascading failures, servers and clients need to exchange information about service status and the peak load each service supports.

Example - Overload case: citizen -> bank -> payment service -> municipality

  • A citizen wants to pay a local tax via home-banking
  • the bank interfaces with a payment service that contacts the municipality
  • the municipality exposes an API for the payment of the municipal tax
  • on the day the tax is due, the municipality's endpoint is overloaded and cannot handle all requests:
    • some of the requests will fail with an error
    • others will exceed the timeout set by the payment service and/or the bank's website
    • a small part will be served on time

Contingency plans must therefore be made to handle error and overload situations.

The HTTP protocol offers two simple and powerful tools for this: throttling headers and overload status codes.

To report throttling policies, send the following headers with every response:

  • X-RateLimit-Limit: maximum number of requests allowed for an endpoint
  • X-RateLimit-Remaining: number of requests remaining until the next reset
  • X-RateLimit-Reset: number of seconds remaining until the next reset

In this way the clients can adjust the number of requests to send.
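As a rough sketch of how a client might use these headers, the function below spreads the remaining quota evenly over the time left before the reset. The function name `pacing_delay` and the even-spacing strategy are assumptions made for illustration; the headers only carry the numbers, and each client is free to choose its own pacing policy.

```python
def pacing_delay(headers: dict) -> float:
    """Return how many seconds to wait before sending the next request.

    `headers` is assumed to be a plain dict using the exact header
    spelling shown above. Real HTTP header names are case-insensitive,
    so a real client should normalize them first.
    """
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset = int(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        # Headers missing or malformed: no pacing information available.
        return 0.0
    if reset <= 0:
        # The window is about to reset: no need to wait.
        return 0.0
    if remaining <= 0:
        # Quota exhausted: wait until the window resets.
        return float(reset)
    # Spread the remaining requests evenly over the remaining seconds.
    return reset / remaining


# 10 requests left and 30 seconds to the reset: one request every 3 seconds.
print(pacing_delay({"X-RateLimit-Remaining": "10", "X-RateLimit-Reset": "30"}))
```
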

ATTENTION: we adopt the most common version of these three headers (see the discussion on GitHub); elsewhere they are also used with slight variations. Clients can handle these variations, but API providers must adopt THIS exact syntax!

Overload or saturation problems can be reported by returning HTTP 503 when the system realizes that it cannot deliver the service within the expected deadlines: this pattern is called Circuit Breaker.
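One possible way for a service to "realize" that it is missing its deadlines is to track recent response times. The sketch below is a simplified load-shedding check, not a full circuit breaker with open and half-open states; the class name `OverloadDetector`, the moving-average criterion and the window size are all assumptions chosen for illustration.

```python
from collections import deque


class OverloadDetector:
    """Flag overload when the average recent latency exceeds the deadline."""

    def __init__(self, deadline_seconds: float, window: int = 50):
        self.deadline = deadline_seconds
        # Keep only the latest `window` latency samples.
        self.samples = deque(maxlen=window)

    def record(self, latency_seconds: float) -> None:
        """Store the latency of a request that has just been served."""
        self.samples.append(latency_seconds)

    def overloaded(self) -> bool:
        """True if the service should start answering HTTP 503."""
        if not self.samples:
            return False
        average = sum(self.samples) / len(self.samples)
        return average > self.deadline
```

A request handler could call `record()` after each response and check `overloaded()` before doing any expensive work, returning HTTP 503 immediately when it is true.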

The statuses that indicate an overload must be returned as soon as possible:

  • HTTP 429 (Too Many Requests) if the rate limit is exceeded
  • HTTP 503 (Service Unavailable) if the service is unavailable (e.g. under maintenance) or overloaded
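On the provider side, the two status codes and the throttling headers can be produced by a small piece of logic in front of the handler. The following is a minimal sketch assuming a fixed-window counter for a single endpoint and a single process; `FixedWindowLimiter`, its parameters and the `overloaded` flag (which the caller is expected to supply) are hypothetical names, and the code is not thread-safe.

```python
import math
import time


class FixedWindowLimiter:
    """Decide the status code and headers for each incoming request."""

    def __init__(self, limit, window_seconds, overload_retry_after=30,
                 clock=time.monotonic):
        self.limit = limit                      # max requests per window
        self.window = window_seconds            # window length in seconds
        self.overload_retry_after = overload_retry_after
        self.clock = clock                      # injectable for testing
        self.window_start = clock()
        self.count = 0

    def check(self, overloaded=False):
        """Return (status_code, headers) for one request."""
        if overloaded:
            # Fail fast: do not even count the request.
            return 503, {"Retry-After": str(self.overload_retry_after)}
        now = self.clock()
        if now - self.window_start >= self.window:
            # The previous window has expired: start a new one.
            self.window_start = now
            self.count = 0
        # Whole seconds until the current window ends (at least 1).
        reset = max(1, math.ceil(self.window_start + self.window - now))
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Reset": str(reset),
        }
        if self.count >= self.limit:
            headers["X-RateLimit-Remaining"] = "0"
            headers["Retry-After"] = str(reset)
            return 429, headers
        self.count += 1
        headers["X-RateLimit-Remaining"] = str(self.limit - self.count)
        return 200, headers
```

Both checks run before any business logic, so a rejected request costs the server almost nothing, which is the point of returning these statuses as soon as possible.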

To defer requests, you should always use the header

  • Retry-After: number of seconds after which the client should retry

Exponential back-off mechanisms should be implemented as well.

Retry-After and X-RateLimit-Reset should be given values in seconds, adding a bit of jitter when appropriate so that groups of clients do not all come back at the same time.
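The two sides of this mechanism can be sketched as follows. On the server, a few random seconds are added to the Retry-After value; on the client, the wait is the server's value plus a random share of an exponentially growing back-off. The function names, the default base, cap and jitter values are assumptions for illustration, and only the "seconds" form of Retry-After is handled here.

```python
import random
from typing import Optional


def jittered_retry_after(base_seconds: int, max_jitter: int = 5,
                         rng=random) -> str:
    """Server side: Retry-After in whole seconds plus random jitter."""
    return str(base_seconds + rng.randint(0, max_jitter))


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Client side: seconds to wait before retry number `attempt` (0-based).

    The client never retries earlier than the server asked, and adds a
    random delay of up to base * 2**attempt seconds (capped at `cap`).
    """
    server_wait = float(retry_after) if retry_after is not None else 0.0
    backoff_ceiling = min(cap, base * 2 ** attempt)
    return server_wait + rng.uniform(0, backoff_ceiling)
```

With these defaults, a client that received `Retry-After: 30` on its first attempt waits between 30 and 31 seconds, while a client on its fourth retry without the header waits anywhere between 0 and 8 seconds.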

You can also consider "bouncing" excess requests even when the system is not overloaded: a useful read is The Global Chubby Planned Outage, from the Google SRE book.

The use of uniform headers is very important:

  • it simplifies client development, reducing errors;
  • it spares clients from having to check for a different set of headers with each API.


Recently we got in touch with open source communities to help them implement this model.
As of now we have:

Further updates: