2026 · 04 · 16 · 2 min read · payments · distributed-systems

The Ghost in the Switch: Why payments engineers have been running sagas for 40 years

How legacy payment networks handle distributed consensus, MTI 0420 reversals, and the reality of Store-and-Forward queues.

Your terminal said declined. Your bank said approved. Both were right.

The merchant’s bank, the acquirer, talks to a payment switch. The switch talks to your card’s issuing bank. Each hop has a timer. When the issuer takes too long, the switch’s timer fires first. It declines back to the acquirer. The terminal moves on.

But the switch doesn’t.

This is the part most people don’t know exists — the ambiguity window. The switch timed out, but it doesn’t know what the issuer decided. The response could be lost. The “yes” might still be travelling. The request might never have landed.

And if the issuer eventually approves a transaction the customer was already told was declined, money moved without anyone’s permission.

Not a refund. A correction.

A reversal isn’t a new transaction. It’s a targeted instruction: undo that specific approval. In ISO 8583, that’s MTI 0420, a reversal advice.
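What makes the instruction "targeted" is that the reversal carries enough of the original request to identify it. Here is a minimal sketch, assuming plain dictionaries with hypothetical field names rather than any real ISO 8583 library. The original-data-elements layout (original MTI, STAN, transmission date and time, acquirer ID, forwarding ID) follows the common 1987-style convention for data element 90; networks differ, so treat it as an assumption.

```python
def build_reversal(original: dict) -> dict:
    """Derive a 0420 reversal advice that points back at one original request.

    Field names are hypothetical; a real switch maps them to numbered data elements.
    """
    # Assumed DE 90 layout: MTI(4) + STAN(6) + date/time(10) + acquirer(11) + forwarder(11)
    original_data_elements = (
        original["mti"]
        + original["stan"].zfill(6)
        + original["transmission_datetime"]  # MMDDhhmmss
        + original["acquirer_id"].zfill(11)
        + original["forwarding_id"].zfill(11)
    )
    return {
        "mti": "0420",
        "pan": original["pan"],
        "amount": original["amount"],
        "original_data_elements": original_data_elements,
    }


original_request = {
    "mti": "0200",
    "pan": "4111111111111111",  # well-known test card number
    "amount": "000000002500",
    "stan": "123",
    "transmission_datetime": "0416093015",
    "acquirer_id": "123456",
    "forwarding_id": "654321",
}
reversal = build_reversal(original_request)
```

The reversal never invents an amount or a card. Everything in it is copied or derived from the request it is undoing, which is what separates it from a refund.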

Distributed systems engineers call this a compensating transaction — the Saga pattern. Payments built it before microservices needed it. But sending the message isn’t the hard part. Making sure it lands is.

Timeout flow: Acquirer sends request (1) to Switch, Switch forwards to Issuer (2), Switch timeout fires and declines back (3), Issuer's late response arrives (4), Switch sends 0420 Reversal (5) — retried as 0421 via SAF if no 0430 ack received.
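The numbered flow above can be sketched as a small state machine. This is an illustration under assumptions, not how any particular switch is built: the class, state names, and method names are all hypothetical, and it follows the numbered steps literally by flagging a reversal when a late approval shows up.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TxnState(Enum):
    PENDING = auto()       # forwarded to the issuer, timer running
    COMPLETED = auto()     # issuer answered in time
    TIMED_OUT = auto()     # acquirer already told "declined", issuer outcome unknown
    REVERSAL_DUE = auto()  # late approval seen, must be undone


@dataclass
class Switch:
    states: dict = field(default_factory=dict)
    to_acquirer: list = field(default_factory=list)
    reversals_due: list = field(default_factory=list)

    def forward(self, txn_id: str) -> None:
        self.states[txn_id] = TxnState.PENDING

    def on_timeout(self, txn_id: str) -> None:
        # Step 3: the switch's timer fires first, so it declines back.
        if self.states.get(txn_id) is TxnState.PENDING:
            self.states[txn_id] = TxnState.TIMED_OUT
            self.to_acquirer.append((txn_id, "declined"))

    def on_issuer_response(self, txn_id: str, approved: bool) -> None:
        state = self.states.get(txn_id)
        if state is TxnState.PENDING:
            self.states[txn_id] = TxnState.COMPLETED
            self.to_acquirer.append((txn_id, "approved" if approved else "declined"))
        elif state is TxnState.TIMED_OUT and approved:
            # Steps 4 and 5: a late "yes" for a transaction already declined.
            self.states[txn_id] = TxnState.REVERSAL_DUE
            self.reversals_due.append(txn_id)


switch = Switch()
switch.forward("txn-1")
switch.on_timeout("txn-1")
switch.on_issuer_response("txn-1", approved=True)
```

After this run the acquirer has been told "declined" exactly once and "txn-1" is waiting for a reversal. The sketch leaves out the case where the issuer's response never arrives at all; a real switch needs a rule for that too.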

How the switch refuses to give up

The first reversal goes out direct. If the issuer acknowledges (0430), done.

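If it times out, the switch falls back to Store-and-Forward (SAF): a per-participant FIFO queue holding every undelivered reversal. Retries are sent as 0421 (Reversal Advice Repeat), which flags the message as a repeat of an advice already sent, so the issuer can reconcile it against the original instead of treating it as new.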

The retry budget is configurable — typically 5. After that, the reversal is dropped and filed as a dispute case. The system doesn’t pretend it always wins.
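The delivery rules above fit in a few lines. A minimal sketch, assuming a hypothetical `send` callable that returns True when a 0430 acknowledgement arrives before the timeout, and assuming the first direct attempt does not count against the retry budget:

```python
from typing import Callable


def deliver_reversal(send: Callable[[str], bool], retry_budget: int = 5) -> str:
    """Try one direct 0420, then up to retry_budget 0421 repeats.

    send(mti) is a stand-in for the network call; True means a 0430 ack came back.
    """
    if send("0420"):
        return "acknowledged"
    for _ in range(retry_budget):
        if send("0421"):
            return "acknowledged"
    # Budget exhausted: stop retrying and hand the case to a human process.
    return "dispute"


# An issuer that stays silent for the first three messages.
flaky_attempts = []

def flaky_send(mti: str) -> bool:
    flaky_attempts.append(mti)
    return len(flaky_attempts) > 3

flaky_result = deliver_reversal(flaky_send)

# An issuer that never answers.
dead_attempts = []

def dead_send(mti: str) -> bool:
    dead_attempts.append(mti)
    return False

dead_result = deliver_reversal(dead_send)
```

The flaky issuer sees one 0420 and three 0421s before acknowledging. The dead one sees one 0420 and five 0421s, and the reversal ends up as a dispute.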

So far, clean. Production isn’t.

The poison pill

A bad data element from one participant triggered a null pointer exception in our reversal handler. Each affected record waited out the full 60-second timeout and was retried 5 times. Five minutes per record: survivable, until you remember the queue is FIFO. One stuck record blocks every reversal behind it. The queue stopped being a queue and became a wall.
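To put numbers on the wall, here is a rough simulation of a strict FIFO drain using the 60-second timeout and 5 retries from the incident. The record IDs and the `drain` function are hypothetical, and healthy records are assumed to deliver instantly to keep the arithmetic visible.

```python
from collections import deque


def drain(records, timeout_s: int = 60, retries: int = 5) -> dict:
    """Return the second at which each healthy record gets delivered.

    records is a list of (record_id, is_poisoned) in queue order.
    A poisoned record burns timeout_s on every retry before it is dropped.
    """
    clock = 0
    delivered_at = {}
    pending = deque(records)
    while pending:
        record_id, is_poisoned = pending.popleft()
        if is_poisoned:
            clock += timeout_s * retries  # 300 seconds with the defaults
        else:
            delivered_at[record_id] = clock
    return delivered_at


delivery_times = drain([
    ("bad-1", True),
    ("bad-2", True),
    ("ok-1", False),
    ("ok-2", False),
])
```

Two poisoned records at the head hold every healthy reversal behind them for 600 seconds. A dozen would cost an hour.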

Queue saturation. A spike once piled up reversals faster than we could drain them. By the next day, the issuer had aged out the original transactions, so the reversals became invalid.
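The arithmetic behind that failure is simple: the reversal at the tail of the queue waits for everything ahead of it. A back-of-the-envelope sketch, where every number is invented for illustration (the backlog size, the drain rate, and the 24-hour aging window are not figures from the incident):

```python
def tail_wait_hours(backlog: int, drained_per_hour: float) -> float:
    """Hours until the last record currently in a FIFO queue is sent."""
    return backlog / drained_per_hour


# Illustrative numbers only.
ISSUER_AGING_WINDOW_HOURS = 24
wait = tail_wait_hours(backlog=120_000, drained_per_hour=4_000)
arrives_too_late = wait > ISSUER_AGING_WINDOW_HOURS
```

With these made-up figures the tail waits 30 hours against a 24-hour window, so the reversal lands after the issuer has stopped recognising the transaction it refers to.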

Eventual consistency only works if “eventual” doesn’t mean “tomorrow.”

The bottom line

If you’ve had a “pending” charge sit for days after a transaction was clearly declined — you’ve personally met a reversal that didn’t make it. Reversals are how payments reconcile two correct decisions made in different timelines. They’re why you don’t get double-charged when a terminal glitches, and why your money doesn’t quietly disappear when the network blinks.

Modern distributed systems rediscovered this pattern, gave it a catchy name, and put it on conference slides. Payments engineers have been running compensating transactions across hostile networks since before most of us wrote our first line of code.

Payments aren’t instant.

They’re just very, very fast at cleaning up their own mistakes.