Order Flow Imbalance — the signal everyone gets halfway right

Order Flow Imbalance (OFI) is the first real microstructure signal most quants meet. It’s simple enough to fit in a tweet and useful enough to survive decades of academic rediscovery. That combination is why it’s so often implemented badly.

This page is the version I’d want someone to hand me if I were starting over.

The idea in one sentence

Track how much passive liquidity is added vs. how much is consumed at the top of book over a short window, and use the difference as a short-horizon predictor of price direction.

More precisely, the classic Cont–Kukanov–Stoikov formulation (CKS, 2014) defines a per-event contribution $e_n$ at time $n$:

\[
e_n = \mathbb{1}\{P^b_n \ge P^b_{n-1}\}\, q^b_n - \mathbb{1}\{P^b_n \le P^b_{n-1}\}\, q^b_{n-1}
    - \mathbb{1}\{P^a_n \le P^a_{n-1}\}\, q^a_n + \mathbb{1}\{P^a_n \ge P^a_{n-1}\}\, q^a_{n-1}
\]

Here $P^b$, $q^b$, $P^a$, $q^a$ are the best bid/ask price and quantity, and the indicator terms separate true liquidity changes from pure price moves. Sum $e_n$ over a window and you have OFI.

If you’ve never seen this, do not be intimidated. The formula is a bookkeeping exercise, not magic. It’s asking: “did bid-side liquidity net grow or shrink, and did ask-side liquidity net grow or shrink, on each event?”
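
A worked example may help; the numbers are made up for illustration. Suppose the best bid stays at 100.00 while its size grows from 50 to 80, and the best ask stays at 100.01 while its size drops from 40 to 30. Both prices are unchanged, so all four indicators equal 1 and the contribution reduces to the two size changes:

```latex
e_n = \underbrace{(80 - 50)}_{\text{bid side}} + \underbrace{(40 - 30)}_{\text{ask side}} = +40
```

Bid liquidity grew by 30 and ask liquidity shrank by 10, and both count as buying pressure. If the bid had instead improved to a new, higher price with 20 resting there, only the first indicator would fire on the bid side and the bid contribution would be the whole new queue, +20.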

Why it works (a little)

Aggressive buyers consume the ask. That shrinks ask-side liquidity and, once a level is depleted, pushes the best ask (and with it the mid) upward. Aggressive sellers do the opposite. A persistent imbalance usually means one side’s information or inventory pressure is outrunning the other’s.

The signal isn’t a prediction of where price is going. It’s a measurement of flow pressure that tends to lead price by milliseconds to minutes, depending on the venue and asset.

The common mistakes

Before we talk about using OFI, a tour of ways people make it look better than it is.

1. Treating every event as independent

OFI computed naively treats each book update as a fresh, independent observation. It’s not. Updates cluster. A single aggressive market order can produce multiple book events. If you regress price changes on raw per-event OFI, your t-stats will be nonsense because your residuals are extremely autocorrelated.

Fix: use block bootstraps or Newey–West standard errors. Or, better, evaluate in an event-driven backtest where this kind of thing can’t hide.
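
As a rough illustration of the Newey–West option, here is a minimal pure-Python sketch for a single-regressor OLS with intercept. The function name newey_west_slope and the lags argument are my own choices, the Bartlett kernel is the usual default, and no small-sample correction is applied. In real work you would more likely reach for a statistics library.

```python
import math

def newey_west_slope(x, y, lags):
    """OLS slope of y on x (with intercept) and its Newey-West (HAC) standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xd = [xi - mx for xi in x]  # demeaning x and y absorbs the intercept
    yd = [yi - my for yi in y]
    sxx = sum(v * v for v in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    # Score contributions g_t = x_t * residual_t.
    g = [a * (b - beta * a) for a, b in zip(xd, yd)]
    s = sum(v * v for v in g)  # lag-0 term (heteroskedasticity-robust only)
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)  # Bartlett kernel weight
        s += 2.0 * w * sum(g[t] * g[t - lag] for t in range(lag, n))
    return beta, math.sqrt(max(s, 0.0)) / sxx
```

With lags=0 this collapses to a heteroskedasticity-robust standard error. Raising lags lets the estimate account for the clustering of book updates; how many lags to use is a judgment call that depends on how long your events stay correlated.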

2. Using only top-of-book OFI

Top-of-book OFI misses depth. A book with 10 lots at the best bid and 1,000 at each of the next three levels behaves very differently from a book with 10 lots at the best bid and nothing behind it. Multi-level OFI (often called MLOFI) weights contributions across more levels and captures the shape of the depth profile.
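
One way to sketch the multi-level version is to apply the same per-level bookkeeping at each depth level and take a weighted sum. The snapshot layout (dicts with 'bids' and 'asks' lists of (price, qty) tuples, best level first) and the weights are assumptions for illustration; in practice the per-level weights are often estimated by regression rather than fixed by hand.

```python
def level_ofi(prev_px, prev_qty, curr_px, curr_qty, side):
    """CKS-style contribution for one price level; side is 'bid' or 'ask'."""
    if side == "bid":
        if curr_px > prev_px:
            return curr_qty
        if curr_px == prev_px:
            return curr_qty - prev_qty
        return -prev_qty
    # Ask side mirrors the bid side with the sign flipped.
    if curr_px < prev_px:
        return -curr_qty
    if curr_px == prev_px:
        return prev_qty - curr_qty
    return prev_qty

def multi_level_ofi(prev, curr, weights):
    """Weighted sum of per-level contributions over the first len(weights) levels."""
    total = 0.0
    for m, w in enumerate(weights):
        pb, cb = prev["bids"][m], curr["bids"][m]
        pa, ca = prev["asks"][m], curr["asks"][m]
        total += w * (level_ofi(pb[0], pb[1], cb[0], cb[1], "bid")
                      + level_ofi(pa[0], pa[1], ca[0], ca[1], "ask"))
    return total
```

If 600 lots disappear from the second bid level while the top of book is unchanged, weights=[1.0] reports zero and weights=[1.0, 0.5] reports -300, which is the depth information the top-of-book signal throws away.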

3. Ignoring queue position and cancellation

OFI counts net liquidity change, but the same net change can come from different mixtures of arrivals and cancels. “+100 contracts at the bid” because five liquidity providers joined is a different world from “+100 contracts” because someone cancelled 900 and a single resting order for 1,000 showed up. MBO (market-by-order) data lets you separate these. Aggregated book data does not.
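
With order-level data the decomposition is simple bookkeeping. The sketch below assumes a hypothetical event format, a list of dicts with 'action' ('add' or 'cancel') and 'qty' for events at the best bid; real MBO feeds use their own field names and also carry modifies and fills, which are ignored here.

```python
def decompose_bid_flow(events):
    """Split the net bid-side size change into arrivals and cancels.

    events: list of dicts with hypothetical keys 'action' and 'qty'.
    """
    adds = [e["qty"] for e in events if e["action"] == "add"]
    cancels = [e["qty"] for e in events if e["action"] == "cancel"]
    return {
        "net": sum(adds) - sum(cancels),
        "adds": sum(adds),
        "cancels": sum(cancels),
        "n_adds": len(adds),
    }

# Two hypothetical windows with the same net change but different composition.
broad = [{"action": "add", "qty": 20} for _ in range(5)]
churn = [{"action": "cancel", "qty": 900}, {"action": "add", "qty": 1000}]
```

Both windows net to +100, but the first has five separate arrivals and no cancels, while the second is one large order replacing 900 lots that were pulled. Keeping the gross figures next to the net one is what lets you tell them apart downstream.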

4. Not normalizing

Raw OFI scales with volume. Compare morning vs. overnight, or liquid vs. illiquid names, and you’ll draw the wrong conclusions. Normalize by something regime-relevant: rolling average depth, recent volatility, or an EWMA of absolute OFI.
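
A minimal sketch of the last option, dividing by an EWMA of absolute OFI. The function name, the alpha default, and the choice to emit 0.0 until a scale exists are all assumptions of this example. The scale is updated only after each value is emitted, so the normalizer never looks at the observation it is scaling.

```python
def normalize_by_ewma_abs(ofi, alpha=0.05, eps=1e-12):
    """Divide each OFI value by an EWMA of |OFI| built from earlier values only."""
    out = []
    scale = None
    for v in ofi:
        # Emit using the scale known before this observation arrived.
        out.append(0.0 if scale is None else v / (scale + eps))
        # Then fold the new observation into the running scale.
        scale = abs(v) if scale is None else (1 - alpha) * scale + alpha * abs(v)
    return out
```

The output is approximately invariant to the overall size of the input: multiplying every OFI value by 100 leaves the normalized series essentially unchanged, which is the property you want when comparing liquid and illiquid names.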

5. Evaluating on returns you can’t capture

OFI often predicts mid-price moves on timescales shorter than your realistic execution latency. If you’re running this on retail infrastructure, a lot of the “alpha” you see in the fit is literally impossible to act on. Always evaluate against a return series that matches your execution stack.
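
One way to make this concrete is to start the forward-return clock only after your latency has elapsed. The sketch below is an assumption-laden illustration: the function name and arguments are hypothetical, times is a sorted list of quote timestamps in whatever unit you use, and it still measures mid-to-mid returns, so spread and fees are not included.

```python
import bisect

def delayed_forward_return(times, mids, t_signal, latency, horizon):
    """Mid return from (t_signal + latency) to (t_signal + latency + horizon).

    Uses the last mid observed at or before each timestamp; times must be sorted.
    """
    def mid_at(t):
        i = bisect.bisect_right(times, t) - 1
        if i < 0:
            raise ValueError("timestamp precedes the first quote")
        return mids[i]

    entry = mid_at(t_signal + latency)
    exit_ = mid_at(t_signal + latency + horizon)
    return exit_ / entry - 1.0
```

If the mid jumps one time unit after the signal, a zero-latency evaluation credits you with the whole move, while a latency of one unit credits you with none of it. Comparing the two numbers on your own data is a quick check of how much of the fit is reachable.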

A reference implementation sketch

Plain Python, not tuned for speed:

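def ofi_event(prev, curr):
    """Per-event OFI contribution from two consecutive top-of-book snapshots.

    prev and curr are mappings with keys bid_px, bid_qty, ask_px, ask_qty.
    """
    b_pp, b_pq = prev['bid_px'], prev['bid_qty']
    b_cp, b_cq = curr['bid_px'], curr['bid_qty']
    a_pp, a_pq = prev['ask_px'], prev['ask_qty']
    a_cp, a_cq = curr['ask_px'], curr['ask_qty']

    # Bid side: a higher bid adds the whole new queue, an unchanged bid adds
    # the size change, a lower bid removes the whole old queue.
    bid_contrib = (b_cq if b_cp > b_pp else
                   (b_cq - b_pq) if b_cp == b_pp else
                   -b_pq)

    # Ask side mirrors the bid side with the sign flipped: a lower ask is
    # selling pressure, a higher ask means the old queue was consumed or pulled.
    ask_contrib = (-a_cq if a_cp < a_pp else
                   (a_pq - a_cq) if a_cp == a_pp else
                    a_pq)

    return bid_contrib + ask_contrib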

Sum this over a window of N book updates, or a fixed wall-clock window, and you have a signal you can study.
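
For the event-count window, a running sum over a deque is enough. This is a minimal sketch: rolling_ofi is a name I made up, and it expects an iterable of per-event contributions such as the values returned by the function above.

```python
from collections import deque

def rolling_ofi(contribs, n):
    """Yield the sum of the most recent n per-event OFI contributions."""
    window = deque()
    total = 0.0
    for e in contribs:
        window.append(e)
        total += e
        if len(window) > n:
            # Drop the oldest contribution once the window is full.
            total -= window.popleft()
        yield total
```

For example, list(rolling_ofi([1, 2, 3, 4], 2)) gives [1.0, 3.0, 5.0, 7.0]. A wall-clock window works the same way, except that you evict entries by timestamp instead of by count.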

How to study it honestly

Partition by regime. OFI behaves differently in thin books, during news, and around settlement. A single whole-sample regression averages over things that shouldn’t be averaged.
Use out-of-sample splits that respect time. Shuffle-splits leak information.
Measure on net-of-cost returns. Spread-crossing and fees will eat a lot of the profile.
Plot the signal and price together. Look at it. Strong quant intuition is mostly pattern recognition you acquired by staring at plots.
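
The time-respecting split can be sketched as a walk-forward generator. The function name and the gap parameter are assumptions of this example; the gap drops samples between the training and test ranges so that overlapping signal or return windows cannot straddle the boundary.

```python
def walk_forward_splits(n, train, test, gap=0):
    """Yield (train_range, test_range) pairs over n time-ordered samples."""
    start = 0
    while start + train + gap + test <= n:
        train_range = range(start, start + train)
        test_range = range(start + train + gap, start + train + gap + test)
        yield train_range, test_range
        start += test  # slide forward by one test block
```

Every test index is strictly later than every training index in the same split, which is the property a shuffled split lacks. Size the gap to at least the longest window used to build the signal or the forward return.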

When OFI is not enough

OFI is a flow measurement, not a model of information. In markets where a meaningful fraction of volume is informed (crypto around liquidations, equities around news), a pure flow signal will occasionally get run over by an informed cohort that doesn’t care about the book’s local balance. That’s where VPIN, Kyle’s lambda, and trade-imbalance-adjusted models start earning their keep.

The MBO Research Framework page walks through how to combine these in a defensible way.

Takeaway

OFI is not the final word on alpha. It’s also not a toy. Implement it carefully, evaluate it honestly, and treat it as one input to a broader ensemble. That’s how it gets used in practice on the live HUD, and it’s how you should think about it in your own work.