2026 · 06 · 01 · 7 min read · infratesting

Stop Testing Timeouts with sleep(): The 3-Layer Local Simulation Stack

A delayed HTTP response is a clean timeout. In production, networks die dirty. Here is how to use blackhole IPs, mitmproxy, and reverse proxies to test true resilience.

Most of us learned to test timeouts the same way: we throw a sleep(30) into a fake handler, point our client at it, and watch the client give up. That works, sort of. It proves the client’s deadline fires. It also misses every interesting failure mode that a real network produces — half-open sockets, packets vanishing into a routing black hole, a reverse proxy returning truncated bodies, a TCP RST midway through a TLS handshake.

The trouble with sleep() is that it simulates a cooperative failure. The server is healthy enough to accept the connection, parse the request, and politely stall before answering. Production networks fail uncooperatively. The TCP SYN never gets a SYN-ACK back. The TLS handshake completes, then the connection just dies. A load balancer terminates the upstream socket but holds the client connection open for a few more seconds. None of that looks like sleep().

This post walks through three layers I use locally to simulate each shape of timeout: blackhole IPs for connection-level failure, mitmproxy for request and response-level failure, and reverse proxies for infrastructure-level failure. Together they cover the realistic shape of what a network does when it’s having a bad day.

Why sleep() lies to you

sleep() is a stand-in for latency, not failure. Latency is when the server eventually answers. Failure is when the server is no longer reachable, or reachable but broken, or reachable and lying to you. Your client’s timeout code has to handle all three, and they have different behavior at the socket level.

A quick taxonomy of what can actually go wrong:

Each one requires a different test environment. sleep() only ever produces the read-timeout case, and even then only the clean version of it.
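To make the socket-level difference concrete, here is a minimal sketch using only the Python standard library. The helper names (start_server, classify) and the loopback servers are illustrative assumptions, not part of any tool discussed in this post, and the SO_LINGER trick assumes a POSIX system.

```python
import socket
import struct
import threading


def start_server(behavior: str) -> tuple[int, threading.Event]:
    """Start a one-shot loopback server and return (port, stop_event).

    behavior "stall": accept the connection, then never answer.
    behavior "reset": accept the connection, then close it with a TCP RST.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    stop = threading.Event()

    def serve() -> None:
        conn, _ = listener.accept()
        if behavior == "reset":
            # SO_LINGER with a zero timeout makes close() send RST, not FIN.
            conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                            struct.pack("ii", 1, 0))
        else:
            stop.wait()  # hold the connection open without answering
        conn.close()
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return port, stop


def classify(host: str, port: int, connect_timeout: float,
             read_timeout: float) -> str:
    """Name the failure shape a client sees for one request."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return "connect-timeout"  # SYN sent, nothing came back
    except ConnectionRefusedError:
        return "refused"          # the host answered with a RST
    except OSError:
        return "connect-error"    # e.g. no route to host
    with sock:
        sock.settimeout(read_timeout)
        try:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = sock.recv(1024)
        except socket.timeout:
            return "read-timeout"  # connected, but the peer stays silent
        except (ConnectionResetError, BrokenPipeError):
            return "reset"         # connected, then the peer tore it down
        return "answered" if data else "closed"
```

A sleep()-based handler only ever lands in the "read-timeout" branch; the other branches need a peer that misbehaves at the TCP level.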

Layer 1: blackhole IPs for connection timeouts

A blackhole IP is an address that exists at the routing layer but never answers. Sending a SYN there is the cleanest possible way to test what your client does when a server is unreachable. There is no friendly TCP reset, no connection refused — just silence until your deadline fires.

The cheap version

The classic blackhole address is 10.255.255.1: it’s in private space, usually unrouted on a default LAN, and your kernel will keep retrying the SYN until a deadline fires. You can also try 192.0.2.1 from TEST-NET-1 (192.0.2.0/24), the block RFC 5737 reserves for documentation. Verify both addresses on your own network first, because VPNs and corporate routing can change the failure shape (for example, a router that answers with an ICMP unreachable makes the connect fail fast instead of hanging):

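# Gives up after ~5s with a connect timeout (curl exit code 28).
# Without --connect-timeout, the wait is bounded by the kernel's SYN retry budget.
curl --connect-timeout 5 http://10.255.255.1/

# TEST-NET-1 (RFC 5737), reserved for documentation and often unreachable on local networks
curl --connect-timeout 5 http://192.0.2.1/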

If your client has a connect timeout, you’ll see it fire after the configured interval (5 seconds in the curl commands above). If it doesn’t, and a surprising number of HTTP clients don’t set one by default, the call will hang until the kernel gives up: roughly two minutes on Linux with the default tcp_syn_retries of 6, and about 75 seconds on macOS and other BSD-derived stacks. That gap between “default kernel timeout” and “intended client timeout” is exactly the kind of thing this layer catches.
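One way to measure that gap from code rather than from curl is a small timing helper. This is a sketch under assumptions: timed_connect is a hypothetical name, and it uses only the standard library's socket.create_connection.

```python
import socket
import time
from typing import Optional


def timed_connect(host: str, port: int,
                  timeout: Optional[float]) -> tuple[str, float]:
    """Attempt a TCP connect and return (outcome, elapsed_seconds).

    timeout=None means no client-side deadline: the kernel's own SYN
    retry budget decides when the attempt finally fails.
    """
    started = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        outcome = "connect-timeout"
    except OSError as exc:
        # Refused, unreachable, or the kernel giving up on its own.
        outcome = f"error: {type(exc).__name__}"
    else:
        sock.close()
        outcome = "connected"
    return outcome, time.monotonic() - started
```

Calling timed_connect("10.255.255.1", 80, timeout=5.0) and then again with timeout=None shows both numbers side by side, provided the address really is a blackhole on your network.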

What you can actually learn

Scenario                       | What it tests               | Why it matters
-------------------------------|-----------------------------|-------------------------------------------------------------
curl against 10.255.255.1      | Connect timeout path        | Most clients have no default; this surfaces the bug.
Same, with TLS                 | TLS handshake never starts  | Some libraries log TLS errors that are really connect errors.
Same, behind a connection pool | Pooled-connect dial timeout | Pools often inherit a longer timeout than the request.

The point isn’t to be clever. The point is that the failure shape — SYN, no answer, ever — is exactly what happens when a remote host is iptables -j DROP’d, or when a security group revokes inbound TCP, or when an availability zone goes offline.

Layer 2: mitmproxy for request and response timeouts

Once the connection succeeds, the next class of failures lives between request bytes leaving your client and response bytes arriving. This is where you want mitmproxy, because it lets you intercept HTTP at the protocol layer and misbehave in specific, scriptable ways.

The pattern: run mitmproxy as an HTTP proxy on localhost:8080, point your client at it, and use an addon to inject the failure mode you care about.

A minimal stall addon

# stall.py — block every response for N seconds before forwarding
import asyncio
from mitmproxy import http

class Stall:
    def __init__(self, delay: float = 30.0):
        self.delay = delay

    async def response(self, flow: http.HTTPFlow) -> None:
        await asyncio.sleep(self.delay)

addons = [Stall(delay=30.0)]

Run it with mitmproxy -s stall.py. Now your client sees a fully successful TCP+TLS handshake, sends its request normally, and then waits. This is the read timeout case — the one sleep() was always trying to simulate, except now it lives at the right protocol layer and you can change it at runtime.
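Stalling every response is blunt. A variation that stalls only one upstream might look like the sketch below. The file name, the SelectiveStall class, and the host api.example.test are hypothetical; the mitmproxy import is guarded so the class can also be exercised with a fake flow object outside mitmproxy.

```python
# selective_stall.py: delay responses for a single host only (hypothetical example)
import asyncio
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # mitmproxy is only needed for the type hint
    from mitmproxy import http


class SelectiveStall:
    def __init__(self, host: str, delay: float = 30.0):
        self.host = host
        self.delay = delay
        self.stalled = 0  # number of responses held back so far

    async def response(self, flow: "http.HTTPFlow") -> None:
        # pretty_host prefers the Host header over the raw address.
        if flow.request.pretty_host != self.host:
            return  # every other host passes through untouched
        self.stalled += 1
        await asyncio.sleep(self.delay)


addons = [SelectiveStall(host="api.example.test", delay=30.0)]
```

Because only one host stalls, the same test run can check that calls to healthy dependencies keep working while the slow one times out.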

What mitmproxy is uniquely good at

Each of these maps to a real production failure I’ve seen, and none of them are reproducible with sleep().

Layer 3: reverse proxies for infrastructure timeouts

The third layer is the one most engineers skip, because it requires standing up real infrastructure. It’s also the one that catches the most expensive class of bugs: the failures that happen between your service and its upstream, in the proxy hop.

The setup: put nginx (or Envoy, or HAProxy) in front of your test server, give it deliberately aggressive timeouts, and see what the client sees.

A misbehaving nginx

# nginx-bad.conf: proxy timeouts shorter than backend response time
# (this server block belongs inside the http { } context of your nginx config)
server {
    listen 8081;

    location / {
        proxy_pass http://127.0.0.1:9000;
        proxy_connect_timeout 1s;
        proxy_send_timeout 2s;
        proxy_read_timeout 2s;

        # Refuse to buffer; surface upstream stalls immediately
        proxy_buffering off;
    }
}

Run your test backend on port 9000 and have it sleep(5) before responding. The client now sees something it almost never sees in a unit test: a 504 Gateway Timeout returned by the proxy at 2 seconds, while the backend is still running and will not finish its work until 5 seconds. The backend’s metrics will likely say “success,” the proxy’s metrics will say “timeout,” and your client’s logs will say “504.” Reconciling those three views is the actual engineering problem in a production outage.
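The slow backend for this setup can be a few lines of standard-library Python. This is a sketch, not a production server: make_slow_server and its parameters are hypothetical names, and the handler only answers GET.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_server(port: int = 9000, delay: float = 5.0) -> ThreadingHTTPServer:
    """Build an HTTP server whose GET handler waits `delay` seconds, then answers."""

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            time.sleep(delay)  # healthy backend, just slower than the proxy allows
            body = b"ok\n"
            try:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except (BrokenPipeError, ConnectionResetError):
                # The proxy may already have hung up after its read timeout.
                pass

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep the console quiet during tests

    return ThreadingHTTPServer(("127.0.0.1", port), SlowHandler)
```

Start it with make_slow_server().serve_forever(), then send requests to port 8081 so they pass through nginx rather than hitting the backend directly.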

What you’re really testing

There’s a hierarchy of timeouts in any real system. A request flows through:

  1. Client deadline. Whatever your code or framework sets.
  2. Connection pool / dialer timeout. Usually distinct from the request deadline.
  3. Proxy connect timeout. The proxy’s view of upstream reachability.
  4. Proxy read/write timeout. The proxy’s patience with a stalled backend.
  5. Backend handler deadline. Usually a context deadline propagated from somewhere upstream.

If any of these are misordered — for example, a client deadline longer than the proxy’s read timeout — the symptom is always weirder than it looks. The reverse-proxy layer is the only one that lets you actually see the misalignment, because it’s the only layer that has a different clock than your code does.
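A quick way to reason about the hierarchy before running anything is to ask which timer fires first for a given backend latency. The sketch below is a deliberate simplification: Timeouts and first_to_fire are hypothetical names, connect time is ignored, and the backend is assumed to send nothing until its full response is ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    client_deadline: float     # seconds the client waits for the whole request
    proxy_read_timeout: float  # seconds the proxy waits on a silent backend
    backend_deadline: float    # seconds the backend handler allows itself


def first_to_fire(t: Timeouts, backend_latency: float) -> tuple[str, float]:
    """Return (what ends the request, at which second) for one slow request."""
    events = [
        (t.client_deadline, "client deadline"),
        (t.proxy_read_timeout, "proxy read timeout (504)"),
        (t.backend_deadline, "backend handler deadline"),
        (backend_latency, "backend responds"),
    ]
    when, what = min(events)  # the earliest event decides what the caller sees
    return what, when


# The nginx example above: 2s proxy read timeout, backend answers after 5s.
print(first_to_fire(Timeouts(10.0, 2.0, 30.0), backend_latency=5.0))
```

Changing one number at a time shows how the visible symptom flips between a client-side timeout, a proxy 504, and a normal response.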


Putting it together

The layers compose. You can blackhole the upstream IP that nginx proxies to, and watch nginx’s own connect timeout fire — that’s a different failure than nginx returning 504 on a slow backend, and a different one again from nginx itself going away. You can run mitmproxy as the upstream behind nginx and inject partial responses, then see whether your client retries the proxy failure or the origin failure. You can chain all three for the kind of compound failure that, in production, only shows up at 3am during a partial provider outage.

A pragmatic rule of thumb:

None of these layers replaces the others. They each simulate a different shape of failure, and a real client has to survive all of them. The next time you reach for sleep(30) in a test, ask yourself which of the five timeout categories above you’re actually exercising — and which four you’re skipping.

The network doesn’t fail politely. Your tests shouldn’t either.