The Circuit Breaker Pattern

How does your application handle failure? Your first level of response might focus on logging and displaying errors, but it merely captures the problem rather than resolving it. What happens if a vital service is offline or under heavy load? What about simply not performing at the standards you might expect?

As your application relies more on services that you don't control, like third-party APIs, handling these failure scenarios becomes more important. Fortunately, there is a software design pattern that can help make your application more resilient when they arise: the circuit breaker pattern.

Update: Looking to build your own circuit breaker? Check out our series on Building a Circuit Breaker in Node.js.

Enter the Circuit Breaker

A circuit breaker is a mechanism for preventing damage to an electrical circuit or electrical device. When it detects a fault, it interrupts the flow of power.

The purpose of the circuit breaker pattern in programming is to detect the availability of a service and prevent your application from continuously making failed requests. Once the service is working as expected, the breaker returns to making requests. Martin Fowler describes it in his article on the subject:

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all.

Circuit breakers are a popular cloud pattern because, in a microservice-heavy architecture, a single failure can cascade to your other services. They have grown in popularity with libraries like Hystrix from Netflix, which aimed to provide a more latency-tolerant and fault-tolerant approach to using external services.

This concept lends itself nicely to applications that also rely heavily on third-party APIs. They can make it so your services aren't affected if a partner service experiences a problem.

Let's look at how a circuit breaker works. Loosely modeled on the physical breaker, the software breaker moves between three states.

The circuit breaker states

Closed: The closed state is the default "everything is working as expected" state. Requests pass freely through. When failures reach the configured threshold, the breaker trips and moves from closed to open.

Open: Once the breaker trips, it enters the open state. For a fixed amount of time, the breaker rejects all requests without attempting to send them, so any request to the service fails automatically.

Half-Open: The breaker allows a set number of requests through in order to test the status of the resource. The half-open state determines if the circuit returns to closed or open.

These states are dependent on pre-set criteria, known as thresholds, that might include error rates in a given time frame, latency, traffic, or even resource utilization. Many circuit breaker libraries, like opossum for Node.js, allow you to define a mix of threshold metrics as part of the configuration.

Once configured, the breaker handles the changes of state and, as a result, allows or denies requests. The circuit breaker pattern also lends itself nicely to state machine implementations, as the movement between states is well defined.
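To make the state machine idea concrete, here is a minimal sketch of the transitions as a lookup table. The event names (thresholdExceeded, resetTimeoutElapsed, trialSucceeded, trialFailed) are hypothetical labels chosen for illustration; they are not taken from opossum or any other library.

```javascript
// Hypothetical state and event names, chosen for this sketch only.
const TRANSITIONS = {
  closed: { thresholdExceeded: "open" },
  open: { resetTimeoutElapsed: "half-open" },
  "half-open": { trialSucceeded: "closed", trialFailed: "open" }
}

// Return the next state for an event, or stay in the current state
// if the event does not apply to it.
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event]
  return next || state
}

// Walk through one trip-and-recover cycle.
let state = "closed"
for (const event of ["thresholdExceeded", "resetTimeoutElapsed", "trialSucceeded"]) {
  state = nextState(state, event)
  console.log(event, "->", state)
}
```

Keeping the transitions in a table makes it easy to see that each state only reacts to the events that matter to it, and that everything else is ignored.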

Determining the right threshold criteria

Threshold criteria can come in many forms. For APIs where speed is important, latency may be your core threshold. For an API that handles user accounts, uptime will be more important. Circuit breaker design is all down to choosing the right threshold criteria.

You can make these determinations by analyzing your API calls. If you don't already have a monitoring solution, this may be difficult to get right on the first few tries. Alternatively, you can use something like the Bearer Agent to automatically monitor and log calls to APIs. Then, you can analyze the data and set up notifications. This provides a great foundation for informing your circuit breaker decisions.
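As a rough illustration of that kind of analysis, the sketch below derives an error rate and latency percentiles from a handful of logged calls. The log format (durationMs, failed) and the sample values are assumptions made up for this example; a real monitoring tool will have its own data shape.

```javascript
// Hypothetical sample of logged calls: duration in ms and whether the call failed.
const calls = [
  { durationMs: 120, failed: false },
  { durationMs: 180, failed: false },
  { durationMs: 150, failed: false },
  { durationMs: 2400, failed: true },
  { durationMs: 210, failed: false }
]

// Percentage of calls that failed.
function errorRate(samples) {
  if (samples.length === 0) return 0
  const failures = samples.filter(s => s.failed).length
  return (failures * 100) / samples.length
}

// Nearest-rank percentile of call duration (p is between 0 and 100).
function latencyPercentile(samples, p) {
  if (samples.length === 0) return 0
  const sorted = samples.map(s => s.durationMs).sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

console.log("error rate (%):", errorRate(calls)) // 20
console.log("p50 latency (ms):", latencyPercentile(calls, 50)) // 180
console.log("p95 latency (ms):", latencyPercentile(calls, 95)) // 2400
```

Numbers like these can serve as a starting point: a timeout somewhat above the typical latency, and an error threshold comfortably above the normal error rate.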

Let's look at a basic example. Assume we have a function in our application that returns posts for a specific user, getPosts. It is a wrapper around a third-party's API client, apiClient. To implement the circuit breaker, let's use a popular library.

In opossum, this looks roughly like the following:

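// Require the library and client
const CircuitBreaker = require("opossum")
const apiClient = require("our-example-api-client")

// Wrap our client get method in the circuit breaker once, outside the
// request handler, so failure statistics are shared across requests
const breaker = new CircuitBreaker(path => apiClient.get(path), {
  timeout: 3000, // treat calls that take longer than 3 seconds as failures
  errorThresholdPercentage: 50, // trip the breaker when 50% of calls fail
  resetTimeout: 5000 // after 5 seconds in the open state, move to half-open
})

// Our example request
const getPosts = (req, res, next) => {
  // Call the API from within the circuit breaker
  return breaker
    .fire("/api/posts")
    .then(data => res.json(data))
    .catch(next)
}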

Now when there are problems with the /api/posts endpoint from the third-party API, the circuit breaker records the outcome of each call over a rolling window, and when failures cross the configured threshold it trips the breaker.

The configuration object allows us to set a timeout for requests (3 seconds), an error threshold percentage (50%), and a reset timeout (5 seconds) after which the breaker transitions from the open state to half-open. It may look something like the following diagram:

Circuit Breaker Diagram

Some common thresholds that can be used to trip the breaker in this pattern are server timeout, increase in errors or failures, failing status codes, and unexpected response types.
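The sketch below shows one way these checks could be combined: classify each call as a success or failure, then compare the failure share in a window against a percentage threshold. The result shape (timedOut, status, contentType) and the choice to treat only 5xx codes as failing are assumptions for this example, not the behavior of any particular library.

```javascript
// Decide whether one call counts as a failure. The shape of `result`
// ({ timedOut, status, contentType }) is an assumption for this sketch.
function isFailure(result) {
  if (result.timedOut) return true // server timeout
  if (result.status >= 500) return true // failing status code
  if (!/json/.test(result.contentType || "")) return true // unexpected response type
  return false
}

// Trip when the share of failures in the window reaches the threshold.
function shouldTrip(results, errorThresholdPercentage) {
  if (results.length === 0) return false
  const failures = results.filter(isFailure).length
  return (failures * 100) / results.length >= errorThresholdPercentage
}

// Hypothetical window of recent results: two of the four are failures.
const recentResults = [
  { status: 200, contentType: "application/json" },
  { status: 503, contentType: "application/json" },
  { timedOut: true },
  { status: 200, contentType: "application/json" }
]
console.log(shouldTrip(recentResults, 50)) // true
```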

Reacting to failures

Circuit breakers are useful for delaying retries and preventing unnecessary requests, but the true power comes in how your application can react to the states of the breaker.

Existing libraries can help with this. For example, opossum allows you to run a fallback function when the breaker triggers the failure state. Alternatively, event emitters can notify your app that the state changed.

For example, this gives your application the power to do things like:

  • When certain services fail, replace them by using alternate APIs.
  • Return cached data from a previous response, and notify the user.
  • Provide feedback to the user and retry the action in the background.
  • Log problems to your preferred logging solution, or use a logging service.
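As an illustration of the cached-data option, here is a small library-independent sketch. It wraps an async function so that a failed call returns the last successful result, flagged as stale so the application can notify the user. The helper name withCachedFallback and the example fetchPosts function are hypothetical; in practice you might attach similar logic to your breaker library's fallback hook.

```javascript
// Wrap an async function so that, when it rejects, the last successful
// result is returned instead, flagged as stale. For simplicity the cache
// holds one value and ignores the call arguments.
function withCachedFallback(fn) {
  let lastGood // undefined until the first success
  return async (...args) => {
    try {
      lastGood = await fn(...args)
      return { data: lastGood, stale: false }
    } catch (err) {
      if (lastGood === undefined) throw err // nothing cached yet
      console.error("call failed, serving cached data:", err.message)
      return { data: lastGood, stale: true }
    }
  }
}

// Hypothetical service that we can switch off to simulate an outage.
let online = true
const fetchPosts = async () => {
  if (!online) throw new Error("service unavailable")
  return ["first post", "second post"]
}
const getPostsSafely = withCachedFallback(fetchPosts)

async function demo() {
  console.log(await getPostsSafely()) // fresh data, stale: false
  online = false
  console.log(await getPostsSafely()) // cached data, stale: true
}
demo()
```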

The resulting state lifecycle

To bring it all back together, an example of the full lifecycle of a circuit breaker is as follows:

  • The state starts closed. The service works as expected.
  • Multiple failures occur when trying to reach the service. Some could be a timeout, while others may be server errors.
  • The circuit breaker trips and moves into the open state, where it waits for a set time before taking any further action.
  • Any incoming requests to the service at this time immediately fail. The breaker continues to wait in the open state.
  • After this timeout, the breaker moves to the half-open state.
  • A portion of requests to the service are now allowed through. If a failure occurs (or a set number of failures occur), the breaker moves back to the open state and the timeout process begins again.
  • If requests in the half-open state succeed, the breaker knows that the service is working as expected and moves back to the closed state.

Three design choices determine how the circuit breaker behaves in your application: the timeout on the open state, the success criteria for requests made while the breaker is half-open, and the failure threshold for moving from closed to open.
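The lifecycle above can be sketched as a small class. This is a simplified illustration, not a production implementation: it counts consecutive failures rather than a percentage over a rolling window, closes after a single successful trial request, and does not limit concurrent trial requests. The class name SimpleBreaker, its options, and the injectable now clock are all assumptions made for this example.

```javascript
// A minimal, illustrative breaker. All names and defaults here are
// assumptions for the sketch, not taken from any library.
class SimpleBreaker {
  constructor(action, { failureThreshold = 3, resetTimeout = 5000, now = Date.now } = {}) {
    this.action = action
    this.failureThreshold = failureThreshold
    this.resetTimeout = resetTimeout
    this.now = now // injectable clock, handy for demos and tests
    this.state = "closed"
    this.failures = 0
    this.openedAt = 0
  }

  async fire(...args) {
    if (this.state === "open") {
      // Still inside the timeout: fail fast without calling the service
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error("Breaker is open")
      }
      this.state = "half-open" // timeout elapsed, allow a trial request
    }
    try {
      const result = await this.action(...args)
      // Any success closes the breaker and clears the failure count
      this.state = "closed"
      this.failures = 0
      return result
    } catch (err) {
      this.failures += 1
      // A failed trial, or too many consecutive failures, opens the breaker
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"
        this.openedAt = this.now()
      }
      throw err
    }
  }
}

// Walk through the lifecycle with a fake clock and a switchable service.
async function demo() {
  let clock = 0
  let healthy = false
  const service = async () => {
    if (!healthy) throw new Error("service down")
    return "ok"
  }
  const breaker = new SimpleBreaker(service, {
    failureThreshold: 2,
    resetTimeout: 5000,
    now: () => clock
  })

  for (let i = 0; i < 2; i++) await breaker.fire().catch(() => {})
  console.log(breaker.state) // open

  clock = 5000 // the reset timeout has elapsed
  healthy = true // the service has recovered
  await breaker.fire() // the trial request succeeds
  console.log(breaker.state) // closed
}
demo()
```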

Should you implement circuit breakers?

The uncertainty that comes with adding an external API can quickly add to your application's technical debt. By relying on proven patterns, like the circuit breaker, your team can build resiliency and graceful degradation into your application. While this article focused on external APIs and web services, this pattern also provides a great way to make sure that your own internal microservices don't cause your application to fail.

The nice part is that the design pattern itself, moving through the states when a service fails, is general enough that it can exist across different implementations.

In this article we briefly looked at opossum for Node and the browser, but most languages have a community library available.

You can also follow our two-part guide to build your own starter circuit breaker in Node.js.

Read more about the pattern and strategies for implementing your own with Microsoft's Guide to Circuit Breakers and Martin Fowler's original post.

For an easier solution, use Bearer

At Bearer we have done the heavy lifting. Our in-app agent installs quickly and can handle a variety of anomalies. It remediates the problems when they happen, and even allows you to set up a circuit breaker with ease. Get started in only a few minutes and try using the agent by visiting Bearer. With the agent installed, you can do things like:

  • Receive real-time notifications when there is a service failure or another anomaly with an API's performance.
  • Set up automatic remediations so API problems don't directly affect your application.
  • Analyze all, or specific, API calls to learn more about how your APIs are performing.

Stay tuned to the Bearer Blog for more and connect with us @BearerSH.
