
Synchronous vs Asynchronous Replication — Speed or Safety?

In distributed systems, replication — keeping copies of data on multiple machines — is essential for durability and availability. The key decision is whether replication happens synchronously or asynchronously. This article explains the trade-off.

Key sources: "Designing Data-Intensive Applications" by Martin Kleppmann, PostgreSQL documentation, AWS Aurora architecture.


Why Replicate?

Replication serves three purposes:

  1. Durability: If one machine fails, data is safe on another.
  2. Availability: If one machine is down, another can serve requests.
  3. Scalability: Read replicas distribute query load across multiple machines.

The fundamental question: when the leader confirms a write, how many replicas must acknowledge it before the client gets a response?


Synchronous Replication

In synchronous replication, the leader waits for at least one replica to confirm it has written the data before acknowledging the write to the client.

    Client → Write("Hello") → Leader
    Leader → Replica → Ack
    Leader → Client ← OK

Guarantee: If the leader fails immediately after acknowledging, the replica already has the data. No acknowledged write is lost, provided the replica itself survives.

Cost: The write is not confirmed until the slowest required replica responds. If a replica is slow or unavailable, writes are delayed or fail.
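The flow above can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular database implements it: the `Replica` and `SyncLeader` classes and the `"BLOCKED"` return value are hypothetical names chosen for illustration, and a real leader would wait or time out rather than return a status string.

```python
class Replica:
    """Hypothetical in-memory follower that keeps a copy of the log."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.log = []

    def replicate(self, record):
        """Store the record; the return value stands in for the ack."""
        if not self.available:
            return False
        self.log.append(record)
        return True


class SyncLeader:
    """Leader that answers the client only after its sync replica acks."""

    def __init__(self, sync_replica):
        self.sync_replica = sync_replica
        self.log = []

    def write(self, record):
        self.log.append(record)                       # 1. apply locally
        if not self.sync_replica.replicate(record):   # 2. wait for the replica
            return "BLOCKED"                          # no ack, so no client OK
        return "OK"                                   # 3. acknowledge the client


replica = Replica("replica1")
leader = SyncLeader(replica)
status = leader.write("Hello")
```

After the call, `status` is `"OK"` and the record is present in both logs. Marking the replica unavailable makes the next `write` return `"BLOCKED"`, which is the cost described above.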

Single-Node Synchronous Replication

In practice, synchronous replication is usually enabled for a single replica. The leader writes to itself and to that one replica synchronously. The remaining replicas use asynchronous replication.

This provides a reasonable safety guarantee: if the leader dies, the synchronous replica has every acknowledged write. PostgreSQL behaves this way when synchronous_standby_names is set to a plain list of standby names, in which case only the first connected standby in the list is synchronous.

Quorum-Based Synchronous Replication

In a quorum-based system, a write is committed once a majority of nodes, counting the leader, have persisted it. For a 3-node cluster that means 2 nodes; for 5 nodes, 3.

This lets the cluster lose a minority of its nodes without losing committed writes, at the cost of waiting for the slowest node in the majority. It is the approach used by consensus-based stores such as etcd, ZooKeeper, and Consul.
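The majority arithmetic is easy to get wrong for even-sized clusters, so here is a small sketch. The function names `quorum_size` and `tolerated_failures` are made up for this example.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can be down while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)
```

A 4-node cluster needs 3 acknowledgements and tolerates only 1 failure, the same as a 3-node cluster. This is why such clusters are usually deployed with an odd number of nodes.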


Asynchronous Replication

In asynchronous replication, the leader acknowledges the write immediately and sends data to replicas in the background.

    Client → Write("Hello") → Leader
    Leader → Client ← OK (immediately)
    Leader → Replica (in background)

Guarantee: Only that the leader has accepted the write. Writes are fast because no replica is on the critical path.

Cost: If the leader fails before a replica receives the data, the acknowledged write is lost.

Application Impact

Asynchronous replication is standard in most production databases:

  • PostgreSQL: Default streaming replication is asynchronous
  • MySQL: Default replication is asynchronous
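  • MongoDB: Secondaries replicate from the primary asynchronously; how many nodes must acknowledge a write is set per operation by the write concern (w: "majority" is the default since MongoDB 5.0)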

The reason is simple: synchronous replication adds at least one network round trip between leader and replica to every write. For replicas in different data centers, this can be 50-100 ms per write.
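A rough back-of-the-envelope model makes the latency cost concrete. The sketch below assumes the leader writes locally, then contacts all replicas in parallel and returns once the required number have answered. It ignores the replica's own disk flush time, and the function name and all numbers are illustrative rather than measurements.

```python
def sync_write_latency_ms(local_ms, replica_rtts_ms, required_acks):
    """Estimate write latency when `required_acks` replicas must confirm.

    Replicas are contacted in parallel, so the wait is governed by the
    k-th fastest round trip, where k is the number of acks required.
    """
    if required_acks < 0 or required_acks > len(replica_rtts_ms):
        raise ValueError("required_acks must be between 0 and the replica count")
    if required_acks == 0:
        return local_ms  # asynchronous: no replica on the critical path
    return local_ms + sorted(replica_rtts_ms)[required_acks - 1]


# One replica in the same data center (2 ms) and two remote ones.
rtts = [2, 60, 90]
async_ms = sync_write_latency_ms(1, rtts, 0)
one_sync_ms = sync_write_latency_ms(1, rtts, 1)
two_sync_ms = sync_write_latency_ms(1, rtts, 2)
```

Under these assumed numbers, requiring a second acknowledgement forces the write to wait for a cross-data-center round trip, which dominates the total.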


The Trade-Off

| Aspect | Synchronous | Asynchronous |
|--------|-------------|--------------|
| Data safety | No data loss on leader failure | Potential data loss |
| Write latency | Higher (waiting for replica) | Lower (immediate ack) |
| Availability | Lower (replica failure blocks writes) | Higher (replica failure does not block) |
| Complexity | Higher (managing slow replicas) | Lower |
| Use case | Financial systems, critical data | Analytics, content, non-critical data |


Semi-Synchronous Replication

Many databases offer a middle ground: semi-synchronous replication. The leader waits for one replica to acknowledge, then immediately acknowledges the client. Other replicas receive data asynchronously.

    Client → Write → Leader
    Leader → Sync Replica 1 → Ack
    Leader → Client ← OK
    Leader → Async Replica 2 (background)
    Leader → Async Replica 3 (background)

This provides a reasonable safety guarantee without the latency penalty of waiting for all replicas.


Failover Scenarios

Asynchronous Replication: Data Loss

  1. Leader receives write W1 and acknowledges it to the client.
  2. Leader crashes before replicating W1 to any replica.
  3. A replica is promoted to leader.
  4. W1 is permanently lost.

The window for data loss is the time between acknowledging the write and replicating it, that is, the replication lag. On a healthy system this is typically under 100 ms, but it can grow much longer under heavy load or when a replica falls behind.
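The four steps can be reproduced with a toy model. `AsyncLeader` and its `pending` queue are hypothetical names for this sketch; real systems ship a write-ahead log or binlog stream rather than a Python deque.

```python
from collections import deque


class AsyncLeader:
    """Leader that acknowledges first and ships records later."""

    def __init__(self):
        self.log = []
        self.pending = deque()  # acknowledged but not yet replicated

    def write(self, record):
        self.log.append(record)
        self.pending.append(record)
        return "OK"  # the client is told OK before any replica has the record

    def ship_pending(self, replica_log):
        """Background replication: drain the queue into the replica."""
        while self.pending:
            replica_log.append(self.pending.popleft())


replica_log = []
leader = AsyncLeader()

leader.write("W0")
leader.ship_pending(replica_log)   # W0 reaches the replica
leader.write("W1")                 # acknowledged, still only on the leader

# The leader crashes here; the replica is promoted with what it has.
new_leader_log = list(replica_log)
lost = [record for record in leader.log if record not in new_leader_log]
```

Here `lost` contains only `"W1"`: the client was told the write succeeded, yet the promoted replica never saw it.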

Synchronous Replication: Unavailability

  1. Leader receives write W1.
  2. The synchronous replica goes down or becomes slow.
  3. The leader cannot confirm W1 until the replica responds.
  4. All writes are blocked until the replica recovers.

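This is why many systems use semi-synchronous replication: only one replica is synchronous, and if it becomes slow or unavailable, one of the asynchronous replicas can take over as the synchronous one. Writes then stay safe on two nodes and block only if no replica is healthy.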


PostgreSQL Example

PostgreSQL offers configurable synchronous replication:

    # In postgresql.conf
    synchronous_standby_names = 'FIRST 1 (replica1, replica2)'

With this configuration:

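  • The leader waits for one replica to acknowledge, chosen by priority: replica1 if it is connected, otherwise replica2
  • If replica1 disconnects, replica2 takes over as the synchronous standby. If neither is connected, commits wait until one returns; PostgreSQL does not fall back to asynchronous mode on its own
  • The synchronous_commit parameter controls how far a write must get before the commit returns:

| Setting | Behavior |
|---------|----------|
| off | Do not wait, not even for the local WAL flush |
| local | Wait for the local flush to disk only |
| remote_write | Wait until the replica has received the data and written it to its operating system (not yet flushed) |
| on | Wait until the replica has flushed the data to disk |
| remote_apply | Wait until the replica has applied the change, so it is visible to queries there |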
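To illustrate how a priority list such as FIRST 1 (replica1, replica2) resolves to an actual standby, here is a small Python sketch. It only models the selection rule. `choose_sync_standbys` and `commits_can_proceed` are hypothetical helpers, not part of PostgreSQL or any client library.

```python
def choose_sync_standbys(priority_order, connected, num_sync=1):
    """Return the connected standbys that would be synchronous.

    Standbys are taken in priority order, skipping any that are not
    currently connected, until `num_sync` have been chosen.
    """
    candidates = [name for name in priority_order if name in connected]
    return candidates[:num_sync]


def commits_can_proceed(priority_order, connected, num_sync=1):
    """Commits wait unless enough synchronous standbys are available."""
    chosen = choose_sync_standbys(priority_order, connected, num_sync)
    return len(chosen) >= num_sync


priority = ["replica1", "replica2"]
both_up = choose_sync_standbys(priority, {"replica1", "replica2"})
first_down = choose_sync_standbys(priority, {"replica2"})
all_down_ok = commits_can_proceed(priority, set())
```

With both standbys connected, replica1 is chosen. With replica1 down, replica2 takes its place. With neither connected, `all_down_ok` is `False`, which corresponds to commits waiting.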


Key Takeaways

  1. Synchronous replication prevents data loss but adds latency and reduces availability.
  2. Asynchronous replication is fast but risks losing un-replicated writes on leader failure.
  3. Semi-synchronous replication (one sync replica, others async) is a common middle ground.
  4. The choice depends on whether durability or performance matters more for your use case.
  5. PostgreSQL, MySQL, and MongoDB all support configurable replication modes.

Design principle: Use synchronous replication for critical writes (financial transactions) and asynchronous for non-critical data (logs, analytics, user preferences).
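In PostgreSQL this principle does not require separate clusters, because synchronous_commit can be changed per transaction. The sketch below shows one possible way to pick a level per kind of write. The category names and the policy table are invented for this example, and the function only builds the SQL string; running it inside a transaction is left to whatever database driver you use.

```python
# Hypothetical policy: kind of write -> synchronous_commit level.
COMMIT_LEVEL_BY_CATEGORY = {
    "payment": "remote_apply",   # must be durable and visible on the replica
    "order": "on",               # must be flushed on the replica
    "preference": "local",       # local durability is enough
    "analytics_event": "off",    # may be lost if the server crashes
}

VALID_LEVELS = {"off", "local", "remote_write", "on", "remote_apply"}


def commit_level_statement(category: str) -> str:
    """Build the SET LOCAL statement to run at the start of a transaction.

    Unknown categories fall back to "on", the safer choice.
    """
    level = COMMIT_LEVEL_BY_CATEGORY.get(category, "on")
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported synchronous_commit level: {level}")
    return f"SET LOCAL synchronous_commit TO {level};"


payment_sql = commit_level_statement("payment")
event_sql = commit_level_statement("analytics_event")
```

SET LOCAL applies only to the current transaction, so a payment and an analytics event can share the same connection and still get different guarantees.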