Circuit Breaker: The Emergency Switch That Stops Systems from Crashing

You have built a microservice. It calls another service. That service starts responding with 5xx errors. Your service keeps retrying, making the problem worse. More requests pile up. Latency climbs. Eventually, everything collapses.

This is called cascading failure, and it is one of the most common ways distributed systems die.

Enter the Circuit Breaker — a pattern that prevents your system from lighting itself on fire.

Key sources: "Release It!" by Michael Nygard (the book that popularized this pattern), Martin Fowler's blog post on Circuit Breaker, and Netflix/Hystrix engineering blogs.


What It Does

A circuit breaker wraps calls to an external service and monitors failures. When failures cross a threshold, the breaker trips, stopping all further calls before they happen.

Think of your home's electrical panel. When too much current flows, a switch flips and cuts power. Without it, wires melt and fires start. Same idea here, except the "fire" is your entire backend melting under load.
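"Failures cross a threshold" needs a precise meaning before a breaker can act on it. One common reading is "N failures within the last T seconds". The sketch below shows that bookkeeping in Python. The class name `FailureWindow`, its parameters, and the choice to trip at "greater than or equal to the threshold" are all assumptions made for illustration, not part of any particular library.

```python
import time
from collections import deque


class FailureWindow:
    """Counts failures seen within the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, threshold=5, window_s=10.0, clock=time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self.clock = clock        # injectable so the window can be tested without sleeping
        self._failures = deque()  # timestamps of recent failures, oldest first

    def _evict_old(self, now):
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] >= self.window_s:
            self._failures.popleft()

    def record_failure(self):
        now = self.clock()
        self._evict_old(now)
        self._failures.append(now)

    def should_trip(self):
        # Assumption: reaching the threshold is enough to trip.
        self._evict_old(self.clock())
        return len(self._failures) >= self.threshold
```

A time-based window like this lets old failures expire on their own, so a handful of errors spread over an hour does not trip the breaker the way a burst of errors in ten seconds does.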


The Three States

1. Closed (Normal Operation)

The circuit is closed. Every request passes through to the target service. The breaker counts failures. If failures stay below the threshold, nothing happens.

2. Open (Stop Everything)

Failures exceeded the threshold. The breaker opens. All subsequent calls fail immediately, and no network request is made. The system returns a fallback response instead. This gives the downstream service time to recover.

3. Half-Open (Testing the Waters)

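After a timeout period, the breaker moves to half-open. It allows one request through as a probe. If the probe succeeds, the breaker closes. If it fails, the breaker opens again and the timeout restarts. Limiting traffic to a single probe means a service that is still struggling is not hit with full load the moment the timeout expires.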
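The three states above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout_s` are invented here, the closed state counts consecutive failures (a success resets the count), and there is no locking, so concurrent callers could let more than one probe through.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the network."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock  # injectable so tests do not have to sleep
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit is open")  # fail fast
            self.state = State.HALF_OPEN  # timeout elapsed: let one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a half-open probe) closes the circuit.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens immediately; otherwise wait for the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each outbound request as `breaker.call(fetch_reviews, product_id)` (both names hypothetical) and catch `CircuitOpenError` to serve a fallback.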


Configuration That Matters

| Parameter | What It Controls | Example |
|-----------|-----------------|---------|
| Failure threshold | How many failures before tripping | 5 failures in 10s |
| Timeout duration | How long before trying again | 30 seconds |
| Half-open max requests | How many probes to send | 1 request |
| Fallback response | What to return when open | Cached data or error |
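One way to keep these parameters together is a small, validated config object. The sketch below mirrors the four rows of the table with the example values as defaults. The class and field names are hypothetical, chosen for this illustration rather than taken from Hystrix or any other library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class BreakerConfig:
    failure_threshold: int = 5       # failures needed to trip
    failure_window_s: float = 10.0   # window in which those failures are counted
    open_timeout_s: float = 30.0     # how long to stay open before probing
    half_open_max_requests: int = 1  # probes allowed while half-open
    fallback: Optional[Callable[[], Any]] = None  # what to return while open

    def __post_init__(self):
        # Reject values that would make the breaker trip instantly or never recover.
        if self.failure_threshold < 1:
            raise ValueError("failure_threshold must be >= 1")
        if self.failure_window_s <= 0 or self.open_timeout_s <= 0:
            raise ValueError("durations must be positive")
        if self.half_open_max_requests < 1:
            raise ValueError("half_open_max_requests must be >= 1")
```

Validating at construction time means a typo such as a zero threshold fails at startup instead of silently tripping the breaker on the first request.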


Real-World Examples

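Netflix Hystrix: For years the best-known circuit breaker library. Netflix wrapped calls to its backend dependencies in Hystrix commands, so if the recommendations service is down, the UI shows cached or skeleton data instead of crashing. Hystrix has been in maintenance mode since 2018, and its README points new projects to Resilience4j.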

Amazon: Product pages show reviews from a cache if the review service is unhealthy. The checkout still works even if recommendations are down.


Key Takeaways

  1. Circuit breakers prevent cascading failures.
  2. Three states: Closed trips to Open, Open moves to Half-Open after a timeout, and Half-Open either closes on success or reopens on failure.
  3. Always configure a fallback. Users prefer stale data over error pages.
  4. Monitor and alert. If your circuit breaker keeps tripping, the downstream service needs fixing.

Rule of thumb: Every external service call should have a circuit breaker, a timeout, and a fallback.
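As a rough illustration of that rule of thumb, the sketch below puts all three guards around a single call. Everything here is an assumption made for the example: `SimpleBreaker` and `guarded_call` are hypothetical names, the breaker is deliberately tiny (consecutive failures, no separate half-open bookkeeping), and the timeout only stops waiting for the result rather than cancelling the work already running in the worker thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # shared worker pool for guarded calls


class SimpleBreaker:
    """Trips after N consecutive failures and allows calls again after a cooldown."""

    def __init__(self, threshold=5, cooldown_s=30.0):
        self.threshold, self.cooldown_s = threshold, cooldown_s
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True  # closed: let the call through
        # Open: only allow a call once the cooldown has elapsed.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()


def guarded_call(breaker, fn, timeout_s, fallback):
    if not breaker.allow():
        return fallback()  # fail fast: fn is never invoked
    future = _pool.submit(fn)
    try:
        result = future.result(timeout=timeout_s)
    except Exception:  # covers both errors raised by fn and the timeout
        breaker.record(ok=False)
        return fallback()
    breaker.record(ok=True)
    return result
```

Treating a timeout as a failure matters: a dependency that answers slowly can exhaust callers just as effectively as one that returns 5xx errors, so both should count toward tripping the breaker.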