Latency isn't what you think it is

If you ask a retail quant “what’s your latency?” they’ll usually quote a mean: 2ms, 15ms, 80ms, whatever. That number is almost useless.

The exchange doesn’t schedule your orders around your mean latency. Your strategy doesn’t lose money at the mean. Your P&L is a function of the distribution — and more specifically, of the tail.

The three latencies

Your “round-trip latency” is actually a composition of at least three separate distributions:

Decision latency — time from receiving a market event to deciding to act.
Transit latency — time from your network card to the exchange matching engine and back.
Exchange processing latency — time the matching engine itself takes once your message arrives.

Each one has its own mean, variance, and tail shape. They add up, but they don’t add up cleanly. Spikes in one tend to correlate with spikes in another (e.g. busy market = more messages to process and more network contention).

Mean latency hides all of this. It tells you nothing about the world you actually trade in.
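To make the "don't add up cleanly" point concrete, here is a minimal simulation sketch. Every number in it (the component medians, the 5% spike probability, the 10x spike factor, the lognormal noise) is an assumption chosen for illustration, not a measurement. It compares the p99 of the total when the three components spike independently against the p99 when they all spike together on a shared "busy market" flag. The per-component distributions are identical in both runs; only the correlation changes.

```python
import random

# All constants below are illustrative assumptions, not measured values.
SPIKE_PROB = 0.05      # each component is in a spike 5% of the time
SPIKE_FACTOR = 10.0    # a spike makes the component 10x slower
MEDIANS_MS = {"decision": 1.0, "transit": 2.0, "exchange": 0.5}


def p99(samples):
    """Nearest-rank 99th percentile using integer arithmetic (needs >= 100 samples)."""
    ordered = sorted(samples)
    return ordered[(99 * len(ordered)) // 100 - 1]


def simulate(n, correlated, seed=7):
    """Return n simulated round-trip latencies in milliseconds."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        busy = rng.random() < SPIKE_PROB  # shared "busy market" flag
        total = 0.0
        for median in MEDIANS_MS.values():
            own_spike = rng.random() < SPIKE_PROB  # component-specific flag
            spiking = busy if correlated else own_spike
            noise = rng.lognormvariate(0.0, 0.2)  # multiplicative noise, median 1.0
            total += median * noise * (SPIKE_FACTOR if spiking else 1.0)
        totals.append(total)
    return totals


p99_independent = p99(simulate(20_000, correlated=False))
p99_correlated = p99(simulate(20_000, correlated=True))
print(f"p99 with independent spikes: {p99_independent:.1f} ms")
print(f"p99 with correlated spikes:  {p99_correlated:.1f} ms")
```

With these assumed parameters the correlated run should show a clearly worse p99, even though each component on its own looks the same in both runs. That is the part a per-component mean (or even a per-component p99) cannot tell you.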

The tail is the strategy

Consider a short-horizon signal with a 10ms half-life. Your expected return is a function of how often you act before the signal has decayed past the point where it beats costs. If your 50th-percentile latency is 4ms but your 99th-percentile is 80ms, then:

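Half the time you’re acting within 4ms, on a mostly-fresh signal (about 76% of it is left).
Once in every hundred events you’re acting 80ms or more after the event, eight half-lives in, with under 0.4% of the signal left. That is noise.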

That 1% is often where your P&L goes to die. Losses from late fills don’t just cancel out gains — they cost you the spread plus the adverse move plus the fee. A few of those in a bad minute can outweigh a whole morning of good fills.
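The arithmetic behind the example is short enough to write down. This sketch assumes the signal decays exponentially, which is what "half-life" implies; the function name signal_remaining is made up for illustration.

```python
def signal_remaining(latency_ms, half_life_ms):
    """Fraction of the original signal left after latency_ms,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (latency_ms / half_life_ms)


HALF_LIFE_MS = 10.0  # the half-life used in the example above

for label, latency_ms in (("p50", 4.0), ("p99", 80.0)):
    left = signal_remaining(latency_ms, HALF_LIFE_MS)
    print(f"{label}: {latency_ms:.0f} ms -> {left:.2%} of the signal left")
```

At the median you still hold roughly three quarters of the signal. At p99 you are eight half-lives in, and what is left is a fraction of a percent.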

Jitter eats you quietly

Jitter, the variability of latency, hurts most strategies more than a high mean does. Consistent slow is survivable: you just hold the signal longer, model more decay, and compete elsewhere. Inconsistent fast is deadly: you build your strategy assuming the mean and get executed at the tail just often enough to kill the expectancy.

The math isn’t subtle: with lognormal-ish latency distributions, p99 can easily be 20x the median (for an exact lognormal, that corresponds to a log-scale σ of about 1.3). A naive P&L simulator that plugs in the median overstates Sharpe, by a factor that depends on how tail-sensitive the signal is.
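For an exactly lognormal distribution, the ratio of a quantile to the median is exp(z_q * sigma), where z_q is the standard normal quantile and sigma is the standard deviation of log-latency. Real latency is only roughly lognormal, so treat the sketch below as a back-of-the-envelope check rather than a model. The function names are made up for illustration.

```python
import math
from statistics import NormalDist


def tail_to_median_ratio(sigma, q):
    """Quantile-q latency divided by median latency for a lognormal
    with log-scale standard deviation sigma."""
    return math.exp(NormalDist().inv_cdf(q) * sigma)


def sigma_for_ratio(ratio, q):
    """Log-scale sigma at which quantile q sits at `ratio` times the median."""
    return math.log(ratio) / NormalDist().inv_cdf(q)


print(f"sigma for p99 = 20x median: {sigma_for_ratio(20.0, 0.99):.2f}")
for sigma in (0.5, 1.0, 1.5):
    r99 = tail_to_median_ratio(sigma, 0.99)
    r999 = tail_to_median_ratio(sigma, 0.999)
    print(f"sigma={sigma}: p99/median={r99:.1f}, p99.9/median={r999:.1f}")
```

The useful takeaway is how quickly the ratio grows: it is exponential in sigma, so a modest increase in the spread of log-latency multiplies the tail.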

What “low latency” actually buys you

Real low-latency engineering (co-located hardware, kernel bypass, FPGA) is about compressing the distribution, not just shifting the mean. The goal is:

Lower median, yes.
Much lower p99 and p99.9.
Less correlation between decision-time spikes and network-time spikes, so they don’t compound.

For most retail-scale quant work, you’re not going to compete on nanoseconds. But you can still compress your distribution dramatically by:

Writing code whose worst case is bounded. No allocations on the hot path, no GC, no blocking calls, no dynamic dispatch.
Using fixed-size data structures for the book.
Avoiding shared state that needs locking across threads.
Bounding the time spent in each event handler with a watchdog.
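As one possible shape for the last item, here is a minimal sketch of a per-handler time budget. The class name BudgetedHandler and the budget values are hypothetical. It is a post-hoc check: it records when a handler ran over budget, it does not interrupt a handler that is still running. A real watchdog that preempts work needs a separate thread or process.

```python
import time


class BudgetedHandler:
    """Wrap an event handler and count calls that exceed a time budget."""

    def __init__(self, handler, budget_ns):
        self.handler = handler
        self.budget_ns = budget_ns
        self.calls = 0
        self.overruns = 0
        self.worst_ns = 0

    def __call__(self, event):
        start = time.perf_counter_ns()  # monotonic, nanosecond resolution
        result = self.handler(event)
        elapsed = time.perf_counter_ns() - start
        self.calls += 1
        if elapsed > self.worst_ns:
            self.worst_ns = elapsed
        if elapsed > self.budget_ns:
            self.overruns += 1  # treat as a bug: alert, log, or fail a test
        return result


# A trivial handler with a generous budget, and a deliberately slow one
# (2 ms of sleep against a 1 ms budget).
fast = BudgetedHandler(lambda event: event, budget_ns=10_000_000_000)
slow = BudgetedHandler(lambda event: time.sleep(0.002), budget_ns=1_000_000)

for i in range(3):
    fast(i)
    slow(i)

print(f"fast: {fast.overruns}/{fast.calls} overruns")
print(f"slow: {slow.overruns}/{slow.calls} overruns, worst {slow.worst_ns} ns")
```

Tracking the worst case alongside the overrun count matters, because the single slowest call is often the one that explains a bad fill.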

A Python strategy whose 99th percentile is 200ms is a fundamentally different animal from a Rust strategy whose 99th percentile is 800µs. Not because of language religion — because of what the tail of the distribution looks like.

Measuring it properly

A few rules:

Timestamp at the source. Don’t rely on the exchange’s timestamp, which comes from a different clock than yours; capture the moment the event hit your NIC.
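Use a steady monotonic clock. time.time() is not it: it reads the wall clock, which can jump when the system time is adjusted. Use CLOCK_MONOTONIC_RAW on Linux, mach_absolute_time on macOS, QueryPerformanceCounter on Windows. In Python, time.perf_counter_ns() and time.monotonic_ns() are the portable options.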
Record percentiles, not means. At minimum: p50, p90, p99, p99.9. Record them per hour, per session, per symbol.
Watch jitter in isolation. The spread p99 - p50 is a reasonable proxy.
Correlate with market state. Tail spikes at market open or during news are the ones that matter most.
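A minimal sketch of the rules above in Python. It times a stand-in handler with time.perf_counter_ns() (a monotonic clock) and reports nearest-rank percentiles plus the p99 - p50 spread. The function names and the use of hash as a dummy handler are placeholders. On Linux you could swap in time.clock_gettime_ns(time.CLOCK_MONOTONIC_RAW) if you specifically want the raw clock.

```python
import math
import time

QUANTILES = (("p50", 0.50), ("p90", 0.90), ("p99", 0.99), ("p99.9", 0.999))


def percentile(sorted_samples, q):
    """Nearest-rank percentile; q is a fraction in (0, 1]."""
    n = len(sorted_samples)
    # round() removes float noise such as 989.9999999999999 before ceil()
    rank = max(1, math.ceil(round(q * n, 9)))
    return sorted_samples[rank - 1]


def summarize(samples_ns):
    """Return p50/p90/p99/p99.9 and the p99 - p50 spread."""
    ordered = sorted(samples_ns)
    stats = {name: percentile(ordered, q) for name, q in QUANTILES}
    stats["jitter"] = stats["p99"] - stats["p50"]
    return stats


def timed(handler, event):
    """Run handler(event) and return the elapsed time in nanoseconds."""
    start = time.perf_counter_ns()
    handler(event)
    return time.perf_counter_ns() - start


# Stand-in workload: hashing a tuple plays the role of the event handler.
samples = [timed(hash, ("BTCUSDT", i)) for i in range(10_000)]
print(summarize(samples))
```

To follow the per-hour, per-session, per-symbol rule, keep one sample list per bucket and call summarize on each. With fewer than about a thousand samples in a bucket, p99.9 is just the maximum, so read it accordingly.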

If your observability stack can’t answer “what’s our p99 decision latency during the first 5 minutes of the NY session for BTCUSDT” in thirty seconds, that’s the project to do first.

Backtesting with realistic latency

Most backtests assume instant execution. A few more-careful ones add a constant delay. Both are wrong.

A backtest worth defending draws latencies from the actual measured distribution, not a constant. If you’ve logged real latency samples from production (and you should be), sample from them at replay time and apply them as action delays. Your “slippage” will start to look a lot more like your live experience.
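The replay idea can be sketched in a few lines. This assumes you have a list of logged latency samples and a list of (decision_time, order) pairs from the backtest. Both names, and the apply_latency function itself, are hypothetical.

```python
import random


def apply_latency(actions, latency_samples_ms, seed=0):
    """Delay each (decision_time_ms, order) pair by a latency drawn
    uniformly at random from the logged production samples."""
    rng = random.Random(seed)  # seeded so a backtest run is reproducible
    return [
        (decision_ms + rng.choice(latency_samples_ms), order)
        for decision_ms, order in actions
    ]


# Illustrative values only: integer milliseconds, a few logged samples.
logged_ms = [3, 4, 4, 5, 6, 80]
actions = [(0, "buy"), (10, "sell"), (25, "buy")]

for (decided, order), (executed, _) in zip(actions, apply_latency(actions, logged_ms)):
    print(f"{order}: decided at {decided} ms, executes at {executed} ms")
```

Drawing samples independently like this ignores the fact that tail latencies cluster during busy periods. A closer approximation would keep separate sample pools per market state (open, news, quiet) and draw from the pool that matches the replayed moment.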

The engineering discipline

Latency work is as much culture as code. A few habits that matter:

Budgeting: every hot-path function has a time budget. Exceeding it is a bug to be fixed, not an optimization to get to later.
Regression tests on p99: if your p99 doubles after a code change, fail the deploy. Average-case tests are insufficient.
Idle measurement: run the system with no market load and measure the floor. You can’t go below it in production.
Fast-path simplicity: if the hot path has conditionals for a rare case, the rare case is a different function, not a branch in the main one.
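The p99 regression rule is easy to turn into a deploy gate. The sketch below assumes you have latency samples from a baseline build and a candidate build under the same replayed load. The function names and the 2.0 threshold (the "doubles" rule above) are illustrative.

```python
import statistics


def p99(samples):
    """99th percentile: the last of the 99 cut points from statistics.quantiles."""
    return statistics.quantiles(samples, n=100)[98]


def p99_regressed(baseline, candidate, max_ratio=2.0):
    """True if the candidate's p99 is at least max_ratio times the baseline's."""
    return p99(candidate) >= max_ratio * p99(baseline)


# Illustrative data: the candidate has the same median but a heavier tail.
baseline_us = [100] * 980 + [400] * 20
candidate_us = [100] * 980 + [1200] * 20

if p99_regressed(baseline_us, candidate_us):
    print("FAIL: p99 regression, block the deploy")
else:
    print("OK: p99 within budget")
```

Both sample sets in this example have the same median, so a median-only check would pass the change. Only the tail check catches it.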

Closing

Latency is one of those topics where the community discussion outruns reality. Most of the “we need microseconds” talk you see online is from people whose strategies have signal half-lives measured in seconds. The honest question is:

What’s the tail of my latency distribution, what’s the decay curve of my signal, and at what percentile does one overtake the other?

When you can answer that, you know exactly how much latency engineering to do. Not less, not more.
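One way to put a number on that question, under two stated assumptions: the signal decays exponentially, and you can name the smallest fraction of the signal that still beats costs. Both function names are hypothetical.

```python
import math


def latency_deadline_ms(half_life_ms, min_fraction):
    """Longest latency at which at least min_fraction of the signal
    remains, assuming exponential decay."""
    return half_life_ms * math.log2(1.0 / min_fraction)


def fraction_in_time(latency_samples_ms, deadline_ms):
    """Share of logged latencies that met the deadline."""
    on_time = sum(1 for x in latency_samples_ms if x <= deadline_ms)
    return on_time / len(latency_samples_ms)


# Illustrative numbers: 10 ms half-life, and the trade only beats costs
# while at least 25% of the signal is left.
deadline = latency_deadline_ms(half_life_ms=10.0, min_fraction=0.25)
logged_ms = [4] * 90 + [80] * 10
share = fraction_in_time(logged_ms, deadline)
print(f"deadline: {deadline:.0f} ms, met on {share:.0%} of events")
```

The percentile at which your latency crosses the deadline is the one to engineer against. If the deadline already sits beyond your p99.9, further latency work is unlikely to pay for itself.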

For more on this, the MBO Research Framework page covers how to incorporate realistic execution into the research loop from day one.