Detecting Duplicate Transactions Before They Become a Problem
Duplicate transactions are a pervasive problem in distributed systems. A network timeout causes a retry. The user double-clicks a button. A message is delivered twice. Each scenario produces the same result: an operation happens more than once, potentially causing incorrect state.
Key sources: "Designing Data-Intensive Applications" by Martin Kleppmann, Stripe API documentation, PayPal integration guides.
How Duplicates Happen
Network Retries
The most common source of duplicates. A client sends a payment request. The server receives it and charges the card. Before the server can respond, the connection times out. The client retries. The server charges the card again.
Client                          Server
  │                               │
  ├── Charge $100 ──────────────► │  Charge card → success
  │                               ├── Response lost
  │ ◄── Timeout ──────────────────┘
  │
  ├── Charge $100 ──────────────► │  Charge card again → DUPLICATE
  │                               │
  │ ◄── Response ─────────────────┘
Double Submissions
A user clicks "Submit" twice rapidly. Two HTTP requests arrive at the server. Both are processed identically.
Message Redelivery
Most message queues offer at-least-once delivery rather than exactly-once. If a consumer fails to acknowledge a message, the queue redelivers it, and the consumer processes it again.
Idempotency Keys
An idempotency key is a unique identifier for an operation. The server checks whether an operation with this key has already been processed. If so, it returns the previous result instead of executing the operation again.
How It Works
- The client generates a unique key (UUID) for each operation
- The client includes the key in the request header
- The server checks if it has already processed this key
- If processed: return the stored result
- If not processed: execute the operation, store the result, return it
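The client side of these steps can be sketched as follows. This is a minimal illustration, not a real payment client: `FlakyServer` and `charge_with_retries` are hypothetical names, and the server is simulated in memory so that its first response is lost. The point it shows is that the key is generated once per logical operation and reused on every retry.

```python
import uuid


class FlakyServer:
    """Hypothetical stand-in for a payment API that loses its first response."""

    def __init__(self):
        self.results = {}      # idempotency key -> stored result
        self.charges_made = 0  # how many times the card was actually charged
        self.calls = 0         # how many requests reached the server

    def charge(self, key, amount):
        self.calls += 1
        if key not in self.results:
            # First time this key is seen: execute and store the result.
            self.charges_made += 1
            self.results[key] = {"charge_id": self.charges_made, "amount": amount}
        if self.calls == 1:
            # The charge went through, but the response never reaches the client.
            raise TimeoutError("response lost")
        return self.results[key]


def charge_with_retries(server, amount, max_attempts=3):
    # Generate the key once, outside the retry loop.
    key = str(uuid.uuid4())
    for _ in range(max_attempts):
        try:
            return server.charge(key, amount)
        except TimeoutError:
            continue  # retry with the SAME key
    raise RuntimeError("charge failed after retries")


server = FlakyServer()
result = charge_with_retries(server, 10000)
print(server.calls, server.charges_made)  # two requests, one charge
```

Generating a fresh key inside the loop would defeat the mechanism, because the server would treat each retry as a new operation.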
API Example (Stripe-style)
POST /api/charges
Idempotency-Key: 6a5d8f2e-9b3c-4a7d-8e1f-2c3d4e5f6a7b
Body: { "amount": 10000, "currency": "usd", "source": "tok_visa" }
The server stores the result of this request. If the same key is used again, the server returns the stored result without charging the card again.
Server-Side Implementation
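from flask import Flask, request

app = Flask(__name__)

# In-memory store for illustration only; use Redis or a database in production.
idempotency_store = {}


def charge_card(payload):
    """Placeholder for the real payment-processor call."""
    return {"status": "succeeded", "amount": payload["amount"]}


@app.route('/api/charges', methods=['POST'])
def create_charge():
    idempotency_key = request.headers.get('Idempotency-Key')
    if not idempotency_key:
        return {"error": "Idempotency-Key header required"}, 400

    # Check if already processed and, if so, replay the stored result.
    # Note: this check-then-execute sequence is not atomic (see "Locking" below).
    if idempotency_key in idempotency_store:
        return idempotency_store[idempotency_key]

    # Execute the operation
    result = charge_card(request.json)

    # Store the result
    idempotency_store[idempotency_key] = result
    return result, 201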
Key Considerations
- TTL (Time to Live): Idempotency keys should expire after a reasonable period (typically 24 hours). This prevents unbounded storage growth.
- Locking: Two concurrent requests with the same idempotency key must not execute the operation twice. Use a distributed lock or database-level unique constraint.
- Key generation: Clients should generate UUIDs. Never use a key that could repeat (timestamps without sufficient precision, sequential IDs without uniqueness guarantees).
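The locking and TTL points can be sketched with a database-level unique constraint. The example below is an assumption-laden illustration: it uses an in-memory SQLite database, and the table name `idempotency_keys` and the functions `try_claim` and `purge_expired` are hypothetical. The primary key makes the claim atomic, so of two requests racing on the same key only one insert succeeds.

```python
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # the 24-hour expiry mentioned above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys ("
    " idem_key TEXT PRIMARY KEY,"
    " created_at REAL NOT NULL)"
)


def try_claim(key, now=None):
    """Return True if this caller claimed the key, False if it was already claimed."""
    now = time.time() if now is None else now
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_keys (idem_key, created_at) VALUES (?, ?)",
                (key, now),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected the insert: someone else holds this key.
        return False


def purge_expired(now=None):
    """Delete keys older than the TTL and return how many were removed."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency_keys WHERE created_at < ?",
            (now - TTL_SECONDS,),
        )
    return cur.rowcount


first = try_claim("abc", now=0)                  # claimed
second = try_claim("abc", now=1)                 # rejected as a duplicate
removed = purge_expired(now=TTL_SECONDS + 10)    # key has expired by now
third = try_claim("abc", now=TTL_SECONDS + 10)   # key can be claimed again
```

Only the caller that receives `True` should execute the operation; a caller that receives `False` should wait for, or look up, the stored result.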
Database-Level Deduplication
For database operations, a unique constraint or unique index prevents duplicate inserts:
CREATE TABLE payments (
    id           SERIAL PRIMARY KEY,
    payment_uuid UUID NOT NULL UNIQUE,  -- Idempotency key
    amount       DECIMAL(10,2) NOT NULL,
    status       VARCHAR(20) NOT NULL
);

INSERT INTO payments (payment_uuid, amount, status)
VALUES ('6a5d8f2e-9b3c-4a7d-8e1f-2c3d4e5f6a7b', 100.00, 'completed')
ON CONFLICT (payment_uuid) DO NOTHING;
The ON CONFLICT DO NOTHING clause prevents duplicate inserts. The second attempt silently succeeds without inserting a second row.
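Because the duplicate insert succeeds silently, application code usually needs another way to tell whether a row was actually written. One option is to check the affected-row count. The sketch below assumes SQLite (version 3.24 or later, which supports this `ON CONFLICT` syntax) in place of the PostgreSQL table above, with simplified column types; `record_payment` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " id INTEGER PRIMARY KEY,"
    " payment_uuid TEXT NOT NULL UNIQUE,"  # idempotency key
    " amount NUMERIC NOT NULL,"
    " status TEXT NOT NULL)"
)


def record_payment(payment_uuid, amount):
    """Insert the payment; return True if a row was written, False if it was a duplicate."""
    with conn:
        cur = conn.execute(
            "INSERT INTO payments (payment_uuid, amount, status) "
            "VALUES (?, ?, 'completed') "
            "ON CONFLICT (payment_uuid) DO NOTHING",
            (payment_uuid, amount),
        )
    # rowcount is 1 for a fresh insert and 0 when the conflict clause skipped it.
    return cur.rowcount == 1


first = record_payment("6a5d8f2e-9b3c-4a7d-8e1f-2c3d4e5f6a7b", 100.00)
second = record_payment("6a5d8f2e-9b3c-4a7d-8e1f-2c3d4e5f6a7b", 100.00)
row_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

Here `first` reports a new row, `second` reports a duplicate, and the table still holds a single payment.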
Message Queue Deduplication
Kafka and other message brokers can deliver the same message more than once, for example when a producer retries a send or a consumer crashes before acknowledging a message. Consumers must handle deduplication:
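# In-memory set for illustration; see the persistence note below.
processed_ids = set()


def process_message(message):
    if message.id in processed_ids:
        return  # Already processed, skip

    # Process the message (update_account is application-specific)
    update_account(message.account_id, message.amount)

    # Record as processed
    processed_ids.add(message.id)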
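The processed IDs must be stored persistently (database, Redis) so they survive consumer restarts. Ideally the ID is recorded in the same transaction as the state change; otherwise a crash between the two steps can still let a duplicate through. The deduplication window must cover the maximum redelivery interval.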
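A persistent variant of the consumer might look like the sketch below. It is an illustration under assumptions, not a reference implementation: it uses an in-memory SQLite database, and the tables `processed_messages` and `accounts` are hypothetical. The message ID and the balance update are written in one transaction, so a redelivered message fails the primary-key check and leaves the balance untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
    CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    INSERT INTO accounts VALUES ('acct-1', 0);
    """
)


def process_message(message_id, account_id, amount):
    """Apply the message once; return False if it was already processed."""
    try:
        with conn:  # one transaction: both statements commit or neither does
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # message_id already recorded: this is a redelivery, so skip it.
        return False


applied = process_message("m-1", "acct-1", 50)
redelivered = process_message("m-1", "acct-1", 50)
balance = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 'acct-1'"
).fetchone()[0]
```

Recording the ID first means the uniqueness check doubles as the duplicate test, so no separate lookup is needed.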
Strategies Comparison
| Strategy | Mechanism | Storage | Added latency | Use case |
|----------|-----------|---------|---------------|----------|
| Idempotency key | Client-generated UUID | Redis/DB | Low | Payment APIs, order creation |
| Unique constraint | Database constraint | Database | Very low | Inserts where duplicates must never occur |
| Idempotent operation | Operation is safe to repeat | None | None | SET operations, UPSERT |
| Message dedup | Consumer tracks processed IDs | Redis/DB | Low | Message queue consumers |
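The "Idempotent operation" row is easiest to see by contrast. In this small sketch (the function names and the dictionary-based account are hypothetical), setting a value is safe to repeat, while incrementing one is not:

```python
def set_email(account, email):
    """Idempotent: applying it twice leaves the same state as applying it once."""
    account["email"] = email


def add_credit(account, amount):
    """Not idempotent: every repeat changes the balance again."""
    account["balance"] += amount


account = {"email": "old@example.com", "balance": 0}

set_email(account, "new@example.com")
set_email(account, "new@example.com")  # harmless repeat

add_credit(account, 100)
add_credit(account, 100)  # a duplicate delivery doubles the credit
```

Operations of the second kind are the ones that need an idempotency key, a unique constraint, or consumer-side deduplication.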
Key Takeaways
- Duplicate transactions are caused by network retries, double submissions, and message redelivery.
- Idempotency keys let clients make the same request multiple times safely.
- Database unique constraints prevent duplicate inserts at the storage level.
- Under at-least-once delivery, message queue consumers must track processed message IDs.
- The most robust approach: idempotent operations + idempotency keys + database constraints.
- TTL-based cleanup prevents unbounded growth of stored keys.
Design principle: Every mutating API endpoint should accept an idempotency key. This simple addition prevents entire categories of bugs.