How does your application handle failure? Your first level of response might focus on logging and displaying errors, but that merely captures the problem rather than resolving it. What happens if a vital service is offline or under heavy load? What if it simply doesn't perform to the standard you expect? As your application relies more on services that you don't control, like third-party APIs, handling these situations when they arise becomes more important.

Enter the Circuit Breaker

A circuit breaker is a mechanism for preventing damage to an electrical circuit, or electrical device, by interrupting the flow of power when a fault is detected.

In software, the same concept is used to detect whether a service is available and to stop your application from repeatedly making requests that will fail until the issue is resolved. Martin Fowler describes it in his article on the subject:

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all.

A circuit is best described in three states:

Closed: The closed state is the default "everything is working as expected" state. Requests pass freely through.

Open: The open state rejects all requests without attempting to send them.

Half-Open: A set number of requests are let through in order to test the status of the resource. This state determines if the circuit returns to closed or open.

These states are dependent on pre-set criteria, known as thresholds, that might include qualities like: error rates in a given time-frame, latency, traffic, or even resource utilization. Many circuit breaker libraries, like opossum for Node.js, allow you to define a mix of threshold metrics as part of the configuration.
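The three states and the transitions between them can be sketched as a small state machine. This is a minimal illustration, not how opossum or any other library is implemented: the SimpleBreaker class, its option names, and the use of a consecutive-failure count as the only threshold are all assumptions made for the example.

```javascript
// Minimal three-state circuit breaker (illustrative sketch only).
class SimpleBreaker {
  constructor({ failureThreshold = 3, resetTimeout = 5000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold // consecutive failures before opening
    this.resetTimeout = resetTimeout         // ms to stay open before a trial request
    this.now = now                           // injectable clock, handy for testing
    this.state = "closed"
    this.failures = 0
    this.openedAt = 0
  }

  // Should the next request be attempted?
  allowRequest() {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeout) {
      this.state = "half-open" // the wait is over: let a trial request through
    }
    return this.state !== "open"
  }

  // A request succeeded: return to (or stay in) the closed state.
  recordSuccess() {
    this.failures = 0
    this.state = "closed"
  }

  // A request failed: open on a failed trial or once the threshold is reached.
  recordFailure() {
    this.failures += 1
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open"
      this.openedAt = this.now()
    }
  }
}

// Usage with a fake clock so the transitions are easy to follow
let clock = 0
const demo = new SimpleBreaker({ failureThreshold: 2, resetTimeout: 1000, now: () => clock })
demo.recordFailure()
demo.recordFailure()   // threshold reached, state becomes "open"
demo.allowRequest()    // false: requests are rejected without being sent
clock = 1000
demo.allowRequest()    // true: state becomes "half-open"
demo.recordSuccess()   // trial succeeded, state returns to "closed"
```

For simplicity, this sketch lets every request through while half-open until one of them reports a result. A production library typically limits how many trial requests are allowed.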

Determining the right threshold criteria

Threshold criteria can come in many forms. For APIs where speed is important, latency may be your core threshold. For an API that handles user accounts, uptime will be more important.

You can make these determinations by analyzing your API calls. If you don't already have a monitoring solution, this may be difficult to get right on the first few tries. Alternatively, you can use something like the Bearer Agent to automatically monitor and log calls to APIs. You can then analyze the data and set incident notifications, which provides a solid foundation for your circuit breaker decisions.
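As a rough illustration of that analysis, the sketch below summarizes a batch of logged calls into an error rate and a 95th-percentile latency, two numbers that could inform a threshold. The record shape ({ ok, ms }) and the summarizeCalls name are hypothetical. Real monitoring data will look different.

```javascript
// Summarize logged API calls. Each record is a hypothetical log entry:
// { ok: boolean, ms: number } where ms is the response time in milliseconds.
function summarizeCalls(calls) {
  if (calls.length === 0) return { errorRate: 0, p95: 0 }

  const failures = calls.filter(call => !call.ok).length
  const sorted = calls.map(call => call.ms).sort((a, b) => a - b)

  // Nearest-rank percentile: the smallest latency that at least 95% of
  // the samples are at or below
  const rank = Math.ceil((95 * sorted.length) / 100)

  return {
    errorRate: (failures / calls.length) * 100, // percentage of failed calls
    p95: sorted[rank - 1]
  }
}

// Example with made-up numbers
const summary = summarizeCalls([
  { ok: true, ms: 100 },
  { ok: true, ms: 200 },
  { ok: true, ms: 300 },
  { ok: false, ms: 4000 }
])
console.log(summary)
```

If the typical latency sits well below the p95 value, a request timeout somewhat above p95 is one possible starting point, to be adjusted as more data comes in.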

Let's look at a basic example. Assume we have a function in our application that returns posts for a specific user, getPosts. It is a wrapper around a third-party's API client, apiClient. To implement the circuit breaker, let's use a popular library.

In opossum, this looks roughly like the following:

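// Require the library and client
// (current opossum releases export a constructor)
const CircuitBreaker = require("opossum")
const apiClient = require("our-example-api-client")

// Create the breaker once, outside the request handler, so that its
// failure statistics persist across requests
const breaker = new CircuitBreaker(path => apiClient.get(path), {
  timeout: 3000, // Treat calls that take longer than 3 seconds as failures
  errorThresholdPercentage: 50, // Open the circuit when 50% of calls fail
  resetTimeout: 5000 // Move to half-open 5 seconds after opening
})

// Our example request
const getPosts = (req, res, next) => {
  // Call the API from within the circuit breaker
  return breaker
    .fire("/api/posts")
    .then(posts => res.json(posts))
    .catch(next)
}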

Now, when the /api/posts endpoint of the third-party API has problems, the circuit breaker tracks the outcome of each call over a rolling window. Once the share of failed or timed-out calls crosses the configured threshold, the breaker trips and rejects further calls without sending them.

The configuration object lets us set a timeout for requests (3 seconds), an error threshold percentage (50%), and a reset timeout (5 seconds) after which an open breaker transitions to half-open. This style of configuration is common among circuit breaker libraries.
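To make the idea of an error percentage over a rolling window more concrete, here is a simplified sketch. It is not opossum's internal implementation: the RollingStats class and the 10-second window are assumptions for illustration, and it stores every sample, where a real library would more likely keep aggregated buckets.

```javascript
// Track call outcomes and report the failure percentage over a sliding window.
class RollingStats {
  constructor(windowMs = 10000) {
    this.windowMs = windowMs // how far back outcomes are considered
    this.samples = []        // entries of the form { at, ok }
  }

  // Record the outcome of one call at a given timestamp (ms)
  record(ok, at = Date.now()) {
    this.samples.push({ at, ok })
  }

  // Percentage of failed calls among those still inside the window
  errorPercentage(now = Date.now()) {
    this.samples = this.samples.filter(sample => now - sample.at < this.windowMs)
    if (this.samples.length === 0) return 0
    const failed = this.samples.filter(sample => !sample.ok).length
    return (failed / this.samples.length) * 100
  }
}

// Example with explicit timestamps
const stats = new RollingStats(10000)
stats.record(false, 0)
stats.record(true, 1000)
stats.record(false, 2000)
stats.record(false, 3000)

// With a 50% threshold, the breaker would trip at this point
const shouldOpen = stats.errorPercentage(3000) >= 50
```

Because old samples fall out of the window, a burst of failures stops counting against the service once enough time has passed.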

Reacting to failures

Circuit breakers are useful for delaying retries and preventing unnecessary requests, but the true power comes in how your application can react to the states of the breaker.

Existing libraries can help with this. For example, opossum allows you to run a fallback function when the breaker's failure state is triggered. Alternatively, you can respond to the events that the breaker emits.

This gives your application the power to do things like:

  • Call an alternate API if the primary one is down or under load.
  • Return cached data from a previous response, and notify the user.
  • Provide feedback to the user and retry the action in the background.
  • Log problems to your preferred logging service.
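The cached-data and logging ideas from the list above can be sketched without any particular library. The withCachedFallback helper and its onFallback option are hypothetical names, and the cache here holds a single value regardless of arguments, which is a deliberate simplification.

```javascript
// Wrap an async call so that failures fall back to the last good response.
function withCachedFallback(call, { onFallback = () => {} } = {}) {
  let lastGood // most recent successful result, undefined until the first success

  return async (...args) => {
    try {
      lastGood = await call(...args)
      return { data: lastGood, stale: false }
    } catch (err) {
      onFallback(err) // for example, forward the error to a logging service
      if (lastGood === undefined) throw err // nothing cached yet, so surface the error
      return { data: lastGood, stale: true } // the caller can tell the user the data is old
    }
  }
}
```

The stale flag lets the caller decide how to notify the user. With opossum, logic along these lines could be registered through breaker.fallback(), so that it runs when a call fails or the circuit is open.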

Should you implement circuit breakers into your application?

Of course! The uncertainty that comes with leveraging an external API can quickly add to your application's technical debt. By relying on proven patterns, like the circuit breaker, your team can build resiliency and graceful degradation into your application. While this article focused on external APIs, this pattern also provides a great way to make sure that your own internal microservices don't cause your application to fail.

In this article we briefly looked at opossum for Node and the browser, but most languages have a community library available.

Read more about the pattern and strategies for implementing your own with Microsoft's Guide to Circuit Breakers and Martin Fowler's original post.

Stay tuned to the Bearer Blog for more, and set up the Bearer Agent to start monitoring your APIs today.