research May 16, 2026

Locking the first three raw atoms

A 1,000,000-event evidence pass on four candidate raw microstructure atoms — what locked, what got dropped, and what survives into Stage 2.

#raw atoms #microstructure #evidence pass #stage-1

This is an evidence pass. The unit of work is one question, one sample contract, and one result — pass or fail.

The question

At event t, does raw atom x_t contain forward information about mid[t+h] − mid[t] for h = 1..200 events?

That’s it. Nothing about P&L, stops, take-profits, fees, fills, latency, or execution policy. Just the predictive content of the atom itself.

What a “raw atom” is

A raw atom is the simplest possible measurement of one mechanism in the order book at one event. Not a smoothed feature. Not a model. Not a signal. The atom either has predictive content on its own — or it doesn’t, and no amount of downstream engineering will rescue it.

The four candidates in this pass:

atom | family | what it measures
raw_trade_ofi | trade flow | Signed size of the trade at event t.
raw_microprice_dev | touch state | (micro − mid) / tick: touch depth bias in ticks.
raw_ofi_l1 | book OFI | Order-flow imbalance at the top of book between consecutive snapshots.
raw_near_minus_deep_ofi | depth shape | Near-book OFI (L1–L3) minus deep-book OFI (L8–L10).
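The table does not give exact formulas, so the Go sketch below fills them in with common textbook definitions. Treat each one as an assumption:

- the size-weighted microprice;
- the Cont–Kukanov–Stoikov order-flow imbalance at the touch;
- the same OFI applied level by level for the depth-shape atom;
- a trade sign taken from the aggressor side.

It is also assumed that raw_trade_ofi is zero at events that are not trades. `Book`, `Level`, and all function names are hypothetical.

```go
package main

import "fmt"

// Level is one price level on one side of the book.
type Level struct{ Px, Sz float64 }

// Book is a hypothetical MBP-10 snapshot; index 0 is the touch (L1).
type Book struct{ Bid, Ask [10]Level }

// rawTradeOFI signs a trade's size by aggressor side: +size for a buy, -size for a sell.
func rawTradeOFI(size float64, buyAggressor bool) float64 {
	if buyAggressor {
		return size
	}
	return -size
}

// rawMicropriceDev returns (microprice - mid) / tick, where the microprice
// weights each touch price by the size resting on the opposite side.
func rawMicropriceDev(b Book, tick float64) float64 {
	bid, ask := b.Bid[0], b.Ask[0]
	if bid.Sz+ask.Sz == 0 {
		return 0 // empty touch: no lean to measure
	}
	micro := (bid.Px*ask.Sz + ask.Px*bid.Sz) / (bid.Sz + ask.Sz)
	mid := (bid.Px + ask.Px) / 2
	return (micro - mid) / tick
}

// levelOFI is the Cont–Kukanov–Stoikov order-flow imbalance at level i
// between two consecutive snapshots.
func levelOFI(prev, cur Book, i int) float64 {
	e := 0.0
	pb, cb := prev.Bid[i], cur.Bid[i]
	if cb.Px >= pb.Px {
		e += cb.Sz // bid held or improved: current bid size adds buy pressure
	}
	if cb.Px <= pb.Px {
		e -= pb.Sz // bid held or dropped: previous bid size is taken away
	}
	pa, ca := prev.Ask[i], cur.Ask[i]
	if ca.Px <= pa.Px {
		e -= ca.Sz // ask held or improved: current ask size adds sell pressure
	}
	if ca.Px >= pa.Px {
		e += pa.Sz // ask held or raised: previous ask size is taken away
	}
	return e
}

// rawOFIL1 is the top-of-book OFI.
func rawOFIL1(prev, cur Book) float64 { return levelOFI(prev, cur, 0) }

// rawNearMinusDeepOFI sums OFI over L1–L3 and subtracts the sum over L8–L10.
func rawNearMinusDeepOFI(prev, cur Book) float64 {
	near, deep := 0.0, 0.0
	for i := 0; i < 3; i++ {
		near += levelOFI(prev, cur, i)
	}
	for i := 7; i < 10; i++ {
		deep += levelOFI(prev, cur, i)
	}
	return near - deep
}

func main() {
	var prev Book
	prev.Bid[0] = Level{100.00, 30}
	prev.Ask[0] = Level{100.25, 10}
	cur := prev        // arrays copy by value
	cur.Bid[0].Sz = 45 // 15 contracts join the bid at an unchanged price

	fmt.Println(rawTradeOFI(5, false))          // -5
	fmt.Println(rawMicropriceDev(prev, 0.25))   // 0.25
	fmt.Println(rawOFIL1(prev, cur))            // 15
	fmt.Println(rawNearMinusDeepOFI(prev, cur)) // 15
}
```

The example book has 30 contracts on the bid and 10 on the offer. There the microprice leans a quarter tick above the mid, and 15 contracts joining an unchanged bid register as +15 of top-of-book OFI.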

Sample contract

field | value
Instrument id | 42001149
Tick size | 0.25
Files | First 10 sorted DBN files from Z:\MBP-10-NEW\data
Sample | 100,000 valid events × 10 files = 1,000,000 events
Day buckets | 10
Horizons | h = 1..200 events

How the verdict is decided

Five metrics feed the verdict. Four of them carry an explicit bar in the LOCK rule; horizon survival is reported alongside them without a stated threshold.

  • Rank IC — Spearman correlation between the atom and the forward mid move. The signal-or-noise number.
  • Decile monotonicity — sort the atom into deciles, look at the forward return in each bucket. A locked atom moves monotonically from bottom decile to top.
  • Daily stability — share of day buckets where the IC has the correct sign.
  • Horizon survival — how long the atom’s IC stays positive past its peak.
  • Residual rank IC — IC after controlling for the other locked atoms. Says whether this atom is its own axis or a duplicate.
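The first and third metrics can be sketched in Go under a few assumptions:

- Rank IC is taken as Spearman correlation, computed as the Pearson correlation of tie-averaged ranks.
- "Correct sign" is read as positive for every atom, since all four atoms show positive IC in the scorecard.
- Unlabeled (NaN) events are dropped before these functions are called.

All names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ranks returns 1-based ranks; tied values share the mean of their positions.
func ranks(x []float64) []float64 {
	idx := make([]int, len(x))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return x[idx[a]] < x[idx[b]] })
	r := make([]float64, len(x))
	for i := 0; i < len(idx); {
		j := i
		for j+1 < len(idx) && x[idx[j+1]] == x[idx[i]] {
			j++
		}
		avg := float64(i+j)/2 + 1 // mean of 1-based positions i+1..j+1
		for k := i; k <= j; k++ {
			r[idx[k]] = avg
		}
		i = j + 1
	}
	return r
}

// pearson is the sample correlation; NaN if either input is constant.
func pearson(a, b []float64) float64 {
	var ma, mb float64
	for i := range a {
		ma += a[i]
		mb += b[i]
	}
	ma /= float64(len(a))
	mb /= float64(len(b))
	var sab, saa, sbb float64
	for i := range a {
		da, db := a[i]-ma, b[i]-mb
		sab += da * db
		saa += da * da
		sbb += db * db
	}
	return sab / math.Sqrt(saa*sbb)
}

// rankIC is Spearman's correlation: Pearson on tie-averaged ranks.
func rankIC(x, y []float64) float64 { return pearson(ranks(x), ranks(y)) }

// dayStability is the share of day buckets whose rank IC is positive.
// A bucket with a constant atom or label yields NaN and counts as a miss.
func dayStability(x, y []float64, day []int, nDays int) float64 {
	xs := make([][]float64, nDays)
	ys := make([][]float64, nDays)
	for i, d := range day {
		xs[d] = append(xs[d], x[i])
		ys[d] = append(ys[d], y[i])
	}
	won := 0
	for d := 0; d < nDays; d++ {
		if rankIC(xs[d], ys[d]) > 0 {
			won++
		}
	}
	return float64(won) / float64(nDays)
}

func main() {
	x := []float64{1, 2, 3, 1, 2, 3}
	y := []float64{0.25, 0.5, 0.5, 0.5, 0.25, 0}
	day := []int{0, 0, 0, 1, 1, 1}
	fmt.Println(rankIC(x, y), dayStability(x, y, day, 2))
}
```

Averaging tied ranks matters here: at short horizons, many forward mid moves are exactly zero.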

A LOCK requires: peak rank IC ≥ 0.10, ≥ 80% correct-sign day buckets, mostly monotone decile shape, nontrivial uniqueness. Less than that and the atom gets WATCH or DROP.
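The LOCK rule can be written as a predicate over one scorecard row. The two numeric bars (peak rank IC ≥ 0.10 and ≥ 80% correct-sign days) come straight from the rule. The post does not quantify "mostly monotone" or "nontrivial uniqueness", so `minMonotone` and `minResidual` below are hypothetical placeholders. The WATCH/DROP split is not modeled.

```go
package main

import "fmt"

// Score holds one scorecard row.
type Score struct {
	Atom       string
	PeakIC     float64
	DaysWon    int
	Days       int
	Monotone   float64
	ResidualIC float64
}

const (
	minPeakIC   = 0.10 // from the LOCK rule
	minDayShare = 0.80 // from the LOCK rule
	minMonotone = 0.50 // HYPOTHETICAL stand-in for "mostly monotone"
	minResidual = 0.05 // HYPOTHETICAL stand-in for "nontrivial uniqueness"
)

// scorecard copies the values published in the scorecard table.
var scorecard = []Score{
	{"raw_trade_ofi", 0.285, 10, 10, 1.00, 0.289},
	{"raw_microprice_dev", 0.227, 10, 10, 1.00, 0.240},
	{"raw_ofi_l1", 0.196, 10, 10, 0.80, 0.216},
	{"raw_near_minus_deep_ofi", 0.140, 10, 10, 0.14, 0.089},
}

// locks reports whether a row clears every LOCK bar.
func locks(s Score) bool {
	return s.PeakIC >= minPeakIC &&
		float64(s.DaysWon)/float64(s.Days) >= minDayShare &&
		s.Monotone >= minMonotone &&
		s.ResidualIC >= minResidual
}

func main() {
	for _, s := range scorecard {
		fmt.Printf("%-24s lock=%v\n", s.Atom, locks(s))
	}
}
```

With these placeholders, the three locked rows pass. raw_near_minus_deep_ofi fails on monotonicity alone; its residual IC of 0.089 would clear a 0.05 floor.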

What the data said

Scorecard, sorted by peak rank IC:

atom | peak IC | peak h | IC(20) | IC(50) | IC(100) | IC(200) | days won | monotone | residual IC | verdict
raw_trade_ofi | 0.285 | 1 | 0.161 | 0.112 | 0.082 | 0.061 | 10 / 10 | 1.00 | 0.289 | LOCK
raw_microprice_dev | 0.227 | 3 | 0.125 | 0.071 | 0.052 | 0.038 | 10 / 10 | 1.00 | 0.240 | LOCK
raw_ofi_l1 | 0.196 | 4 | 0.137 | 0.095 | 0.067 | 0.048 | 10 / 10 | 0.80 | 0.216 | LOCK
raw_near_minus_deep_ofi | 0.140 | 4 | 0.091 | 0.061 | 0.043 | 0.030 | 10 / 10 | 0.14 | 0.089 | DROP

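Pairwise orthogonality (Spearman, full sample). The first three rows pair the locked atoms with each other; the last row pairs raw_microprice_dev with the dropped atom:

pair | Spearman
raw_ofi_l1 vs raw_trade_ofi | −0.00
raw_ofi_l1 vs raw_microprice_dev | −0.04
raw_trade_ofi vs raw_microprice_dev | +0.02
raw_microprice_dev vs raw_near_minus_deep_ofi | +0.22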
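Pairwise correlations show overlap one pair at a time. The residual rank IC metric asks whether anything survives once all the other atoms are controlled for together. The Go sketch below implements one plausible reading:

1. Rank-transform every series.
2. Remove from the candidate's ranks their least-squares fit on the other atoms' ranks. This is done with Gram–Schmidt on centered vectors, which matches OLS with an intercept.
3. Correlate the residual with the forward-move ranks.

This is a semi-partial correlation. The post does not say whether the forward move is residualized too, so treat the definition as an assumption. Function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ranks returns 1-based ranks; tied values share the mean of their positions.
func ranks(x []float64) []float64 {
	idx := make([]int, len(x))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return x[idx[a]] < x[idx[b]] })
	r := make([]float64, len(x))
	for i := 0; i < len(idx); {
		j := i
		for j+1 < len(idx) && x[idx[j+1]] == x[idx[i]] {
			j++
		}
		for k := i; k <= j; k++ {
			r[idx[k]] = float64(i+j)/2 + 1
		}
		i = j + 1
	}
	return r
}

// centered returns v minus its mean (this absorbs the regression intercept).
func centered(v []float64) []float64 {
	m := 0.0
	for _, x := range v {
		m += x
	}
	m /= float64(len(v))
	out := make([]float64, len(v))
	for i, x := range v {
		out[i] = x - m
	}
	return out
}

func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// subProj removes from v, in place, its component along the unit vector q.
func subProj(v, q []float64) {
	p := dot(v, q)
	for i := range v {
		v[i] -= p * q[i]
	}
}

// residualize returns x minus its least-squares fit on the controls plus an
// intercept, via modified Gram–Schmidt on centered vectors.
func residualize(x []float64, controls ...[]float64) []float64 {
	r := centered(x)
	var basis [][]float64
	for _, c := range controls {
		v := centered(c)
		for _, q := range basis {
			subProj(v, q) // orthogonalize against earlier controls
		}
		norm := math.Sqrt(dot(v, v))
		if norm < 1e-12 {
			continue // collinear with earlier controls: adds nothing
		}
		for i := range v {
			v[i] /= norm
		}
		basis = append(basis, v)
	}
	for _, q := range basis {
		subProj(r, q)
	}
	return r
}

// residualRankIC correlates the residualized atom ranks with the forward-move ranks.
func residualRankIC(x, y []float64, controls ...[]float64) float64 {
	rc := make([][]float64, len(controls))
	for i, c := range controls {
		rc[i] = ranks(c)
	}
	res := residualize(ranks(x), rc...)
	cy := centered(ranks(y))
	return dot(res, cy) / math.Sqrt(dot(res, res)*dot(cy, cy))
}

func main() {
	fmt.Println(residualRankIC([]float64{1, 2, 3, 4, 5, 6}, []float64{0, 0.25, 0.25, 0.5, 0.25, 0.75}, []float64{1, 1, 2, 2, 3, 3}))
}
```

With no controls, the function reduces to plain rank IC, which makes a quick sanity check.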

What this means in plain language

raw_trade_ofi is the strongest immediate sensor: peak IC of 0.285 at the very next event (h = 1). It’s a signed trade; when someone hits the bid, the mid tends to move down next. The atom keeps meaningful information out through 200 events, where IC is still 0.061. Days won: 10 of 10. Residual IC (0.289) is essentially equal to its raw peak IC, meaning the other atoms don’t explain it. Locked.

raw_microprice_dev is the touch-state read. It says where the fair value is leaning inside the spread, in ticks. Peak IC of 0.227 around event 3 — slightly delayed because the touch needs an event or two to resolve into a price move. Monotone deciles, all 10 days won, clean residual. Locked.

raw_ofi_l1 is the slower book-flow read: top-of-book order-flow imbalance. Peak IC is 0.196 at event 4. Relative to that peak, it has the longest half-life of the three: IC is still 0.067 at h = 100 and 0.048 at h = 200. Monotonicity score is 0.80 (one violation in the deciles), below perfect but still clearing the bar. Residual IC of 0.216 sits above its raw peak of 0.196, so the other two locked atoms don’t explain it. Locked.
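The half-life comparison can be checked with arithmetic on the scorecard alone. This Go snippet divides each tabulated IC by the atom's peak IC. It uses only the values published in the scorecard table.

```go
package main

import "fmt"

type curve struct {
	atom string
	peak float64
	ic   [4]float64 // IC at h = 20, 50, 100, 200, copied from the scorecard
}

var locked = []curve{
	{"raw_trade_ofi", 0.285, [4]float64{0.161, 0.112, 0.082, 0.061}},
	{"raw_microprice_dev", 0.227, [4]float64{0.125, 0.071, 0.052, 0.038}},
	{"raw_ofi_l1", 0.196, [4]float64{0.137, 0.095, 0.067, 0.048}},
}

// retained returns IC(h) / peak IC for each tabulated horizon.
func retained(c curve) [4]float64 {
	var r [4]float64
	for i, v := range c.ic {
		r[i] = v / c.peak
	}
	return r
}

func main() {
	for _, c := range locked {
		r := retained(c)
		fmt.Printf("%-20s h=20 %.2f  h=50 %.2f  h=100 %.2f  h=200 %.2f\n", c.atom, r[0], r[1], r[2], r[3])
	}
}
```

raw_ofi_l1 keeps about 70%, 48%, 34% and 24% of its peak at h = 20, 50, 100 and 200. The corresponding shares are 56%, 39%, 29% and 21% for raw_trade_ofi, and 55%, 31%, 23% and 17% for raw_microprice_dev. raw_trade_ofi still has the highest absolute IC at every tabulated horizon, which is the sense in which it has the longest tail.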

raw_near_minus_deep_ofi looks predictive but fails the contract. Peak IC of 0.14 and 10 days won — both real. But the decile shape isn’t monotone. Monotonicity score is 0.14 — the extreme positive tail actually reverts against the middle positive bins. That sign flip in the deciles is a contract violation: you can’t pass a non-monotone raw atom forward to Stage 2 and expect a clean transform to behave. Dropped.
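The decile check and the tail reversal it caught can be sketched in Go, with some caveats. The post does not define its monotonicity score, so `upStepShare` (the share of the nine adjacent decile steps that rise) is a hypothetical score. It is not claimed to reproduce the 0.80 or 0.14 above. `tailReverts` assumes the atom is roughly centered on zero, so that deciles 6 through 9 are the "middle positive bins". All names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// decileMeans sorts events by atom value and returns the mean forward move
// in each of 10 equal-count buckets (needs at least 10 events).
func decileMeans(x, y []float64) [10]float64 {
	idx := make([]int, len(x))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return x[idx[a]] < x[idx[b]] })
	var sum [10]float64
	var cnt [10]int
	for pos, i := range idx {
		d := pos * 10 / len(idx) // bucket 0 holds the lowest tenth
		sum[d] += y[i]
		cnt[d]++
	}
	var m [10]float64
	for d := range m {
		if cnt[d] > 0 {
			m[d] = sum[d] / float64(cnt[d])
		}
	}
	return m
}

// upStepShare is a HYPOTHETICAL monotonicity score: the share of the nine
// adjacent decile steps whose mean forward move rises.
func upStepShare(m [10]float64) float64 {
	up := 0
	for d := 1; d < 10; d++ {
		if m[d] > m[d-1] {
			up++
		}
	}
	return float64(up) / 9
}

// tailReverts reports whether the top decile's mean falls below the best of
// deciles 6–9 (indices 5..8), i.e. the extreme positive tail reverts.
func tailReverts(m [10]float64) bool {
	best := m[5]
	for d := 6; d < 9; d++ {
		if m[d] > best {
			best = m[d]
		}
	}
	return m[9] < best
}

func main() {
	x := make([]float64, 100)
	y := make([]float64, 100)
	for i := range x {
		x[i] = float64(i)
		y[i] = float64(i) / 100
	}
	for i := 90; i < 100; i++ {
		y[i] = -0.1 // make the extreme positive tail revert
	}
	m := decileMeans(x, y)
	fmt.Println(upStepShare(m), tailReverts(m))
}
```

When many atom values are tied at a decile boundary, the sketch splits them by sort order, which is arbitrary.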

Verdicts

atom | verdict | reason
raw_trade_ofi | LOCK | Strongest peak, longest tail, near-zero correlation with the other locked atoms.
raw_microprice_dev | LOCK | Clean monotone touch-state read at h = 3.
raw_ofi_l1 | LOCK | Slower book-flow axis, longest half-life relative to its peak.
raw_near_minus_deep_ofi | DROP | Non-monotone deciles. Reformulate before promoting.

What this changes

The locked set becomes the Stage 1 raw atom list. Stage 2 transform research starts from these three, not from indicators. The first transforms to test are the obvious ones:

  • EWMA(raw_trade_ofi, L = 3..20) — turning event-level pressure into memory.
  • EWMA(raw_microprice_dev, L = 3..20) — touch-state persistence.
  • rolling_sum(raw_ofi_l1, L = 3..20) — book-flow accumulation.
  • signed_persistence(raw_ofi_l1, L = 3..10) — sign streaks of book flow.
  • agreement_state(raw_trade_ofi, raw_microprice_dev) — when aggressive flow and the touch fair value agree.
  • agreement_state(raw_ofi_l1, raw_microprice_dev) — book flow + touch state.
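The transforms listed above are not defined precisely in the post. The Go sketch below picks one reasonable definition for each, and each is an assumption:

- EWMA uses the span convention alpha = 2/(L+1).
- rolling_sum adds the last L events.
- signed_persistence is the signed length of the current same-sign streak, capped at L.
- agreement_state is the shared sign of two atoms, or zero when they disagree.

All function names are hypothetical.

```go
package main

import "fmt"

// ewma uses the span convention alpha = 2/(L+1), seeded with the first value.
func ewma(x []float64, L int) []float64 {
	alpha := 2 / float64(L+1)
	out := make([]float64, len(x))
	s := 0.0
	for i, v := range x {
		if i == 0 {
			s = v
		} else {
			s = alpha*v + (1-alpha)*s
		}
		out[i] = s
	}
	return out
}

// rollingSum adds the last L values (fewer during warm-up) with a running sum.
func rollingSum(x []float64, L int) []float64 {
	out := make([]float64, len(x))
	s := 0.0
	for i, v := range x {
		s += v
		if i >= L {
			s -= x[i-L] // drop the value leaving the window
		}
		out[i] = s
	}
	return out
}

func sign(v float64) int {
	switch {
	case v > 0:
		return 1
	case v < 0:
		return -1
	}
	return 0
}

// signedPersistence is the signed length of the current same-sign streak,
// capped at L; a zero value resets it.
func signedPersistence(x []float64, L int) []int {
	out := make([]int, len(x))
	run := 0
	for i, v := range x {
		s := sign(v)
		if s != 0 && i > 0 && sign(x[i-1]) == s {
			if run*s < L {
				run += s // extend the streak up to the cap
			}
		} else {
			run = s // new streak, or 0 on a zero value
		}
		out[i] = run
	}
	return out
}

// agreementState is the shared sign of a and b, or 0 when they disagree or either is zero.
func agreementState(a, b float64) int {
	if sa := sign(a); sa != 0 && sa == sign(b) {
		return sa
	}
	return 0
}

func main() {
	ofi := []float64{2, 1, 3, -4, 0, 5}
	fmt.Println(ewma(ofi, 3), rollingSum(ofi, 3), signedPersistence(ofi, 3))
	fmt.Println(agreementState(1, 0.5), agreementState(-2, 0.25))
}
```

Each transform takes the raw atom series in event order. That lets each one be swept over its L range without touching the atom definition.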

raw_near_minus_deep_ofi doesn’t disappear — it gets reformulated. Likely candidates: clip the extreme tails, separate near-depth from deep-depth into two atoms, or convert it into a signed-persistence shape state instead of using the raw difference directly.
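The first reformulation option, clipping the extreme tails, is a winsorize step. A minimal Go sketch follows. The quantile is taken at index floor(q·(n−1)) of a sorted copy, which is one of several quantile conventions. The cut points are fit on the same slice only for brevity; a real run would fit them on an earlier window to avoid look-ahead. Function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// quantile returns the value at index floor(q*(n-1)) of a sorted copy of x.
func quantile(x []float64, q float64) float64 {
	s := append([]float64(nil), x...)
	sort.Float64s(s)
	return s[int(q*float64(len(s)-1))]
}

// clipTails winsorizes x to the [lo, hi] quantile range. Here the cut points
// come from x itself; fitting them on a prior window avoids look-ahead.
func clipTails(x []float64, lo, hi float64) []float64 {
	a, b := quantile(x, lo), quantile(x, hi)
	out := make([]float64, len(x))
	for i, v := range x {
		switch {
		case v < a:
			out[i] = a
		case v > b:
			out[i] = b
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	atom := []float64{-40, -3, -1, 0, 1, 2, 4, 60}
	fmt.Println(clipTails(atom, 0.2, 0.8))
}
```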

Reproduce

$files = (Get-ChildItem Z:\MBP-10-NEW\data -Filter *.dbn |
  Sort-Object Name | Select-Object -First 10 -ExpandProperty FullName) -join ','

go run . raw `
  -files $files `
  -instrument-id 42001149 `
  -tick-size 0.25 `
  -max-events 0 `
  -max-events-per-file 100000 `
  -out out\raw_atom_research_20260517_10d_100k

Primary outputs:

raw_atom_scorecard.csv
raw_atom_ic_curve.csv
raw_atom_monotonicity.csv
raw_atom_incremental_ic.csv
raw_atom_orthogonality.csv
raw_atom_health.csv
manifest.txt

Runtime: ~1m54s on the reference box.