This is an evidence pass. The unit of work is one question, one sample contract, and one result — pass or fail.
The question
At event t, does raw atom x_t contain forward information about
mid[t+h] − mid[t] for h = 1..200 events?
That’s it. Nothing about P&L, stops, take-profits, fees, fills, latency, or execution policy. Just the predictive content of the atom itself.
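The target in the question can be written down directly. The following is a minimal sketch, assuming the mid prices are already available as a list indexed by event; the function name `forward_mid_move` is hypothetical and not taken from the research tool.

```python
from typing import List, Optional


def forward_mid_move(mid: List[float], h: int) -> List[Optional[float]]:
    """Return mid[t+h] - mid[t] for every event t.

    Events whose horizon runs past the end of the sample get None,
    so they can be excluded from any later correlation.
    """
    if h < 1:
        raise ValueError("h must be >= 1")
    n = len(mid)
    return [mid[t + h] - mid[t] if t + h < n else None for t in range(n)]


# Toy mid series on a 0.25 tick grid (illustrative values only).
mid = [100.00, 100.25, 100.25, 100.00, 100.50]
moves = forward_mid_move(mid, h=2)
```

The horizon is counted in events, not in clock time, which matches the `h = 1..200 events` wording of the question.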
What a “raw atom” is
A raw atom is the simplest possible measurement of one mechanism in the order book at one event. Not a smoothed feature. Not a model. Not a signal. The atom either has predictive content on its own — or it doesn’t, and no amount of downstream engineering will rescue it.
The four candidates in this pass:
| atom | family | what it measures |
|---|---|---|
| raw_trade_ofi | trade flow | Signed size of the trade at event t. |
| raw_microprice_dev | touch state | (micro − mid) / tick: touch depth bias in ticks. |
| raw_ofi_l1 | book OFI | Order-flow imbalance of the top of book between snapshots. |
| raw_near_minus_deep_ofi | depth shape | Near-side OFI (L1–L3) minus far-side OFI (L8–L10). |
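To make two of these atoms concrete, here is a hedged sketch of how they might be computed from top-of-book snapshots. The formulas are assumptions: the size-weighted microprice and the Cont-style L1 order-flow imbalance are common definitions, but the source only states `(micro − mid) / tick` and does not spell out either one. The `L1` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class L1:
    """Top-of-book snapshot (hypothetical container for this sketch)."""
    bid_px: float
    bid_sz: float
    ask_px: float
    ask_sz: float


def microprice_dev(book: L1, tick: float) -> float:
    """(micro - mid) / tick, assuming a size-weighted microprice."""
    total = book.bid_sz + book.ask_sz
    if total <= 0:
        return 0.0  # empty touch: no lean either way
    mid = 0.5 * (book.bid_px + book.ask_px)
    # A heavy bid pulls the microprice toward the ask, and vice versa.
    micro = (book.bid_px * book.ask_sz + book.ask_px * book.bid_sz) / total
    return (micro - mid) / tick


def ofi_l1(prev: L1, cur: L1) -> float:
    """Top-of-book order-flow imbalance between two snapshots (assumed definition)."""
    bid_flow = 0.0
    if cur.bid_px >= prev.bid_px:
        bid_flow += cur.bid_sz   # bid held or improved: size now resting
    if cur.bid_px <= prev.bid_px:
        bid_flow -= prev.bid_sz  # bid held or retreated: size that was resting
    ask_flow = 0.0
    if cur.ask_px <= prev.ask_px:
        ask_flow += cur.ask_sz
    if cur.ask_px >= prev.ask_px:
        ask_flow -= prev.ask_sz
    return bid_flow - ask_flow


prev = L1(bid_px=100.00, bid_sz=10, ask_px=100.25, ask_sz=10)
cur = L1(bid_px=100.00, bid_sz=25, ask_px=100.25, ask_sz=10)
dev = microprice_dev(L1(100.00, 30, 100.25, 10), tick=0.25)
flow = ofi_l1(prev, cur)
```

In the example, 15 lots are added to an unchanged bid while the ask is untouched, so the imbalance is +15; a touch with three times more size on the bid leans a quarter tick toward the ask.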
Sample contract
| field | value |
|---|---|
| Instrument id | 42001149 |
| Tick size | 0.25 |
| Files | First 10 sorted DBN files from Z:\MBP-10-NEW\data |
| Sample | 100,000 valid events × 10 files = 1,000,000 events |
| Day buckets | 10 |
| Horizons | h = 1..200 events |
How the verdict is decided
Five metrics. Each has to clear a bar before the atom can lock.
- Rank IC — Spearman correlation between the atom and the forward mid move. The signal-or-noise number.
- Decile monotonicity — sort the atom into deciles, look at the forward return in each bucket. A locked atom moves monotonically from bottom decile to top.
- Daily stability — share of day buckets where the IC has the correct sign.
- Horizon survival — how long the atom’s IC stays positive past its peak.
- Residual rank IC — IC after controlling for the other locked atoms. Says whether this atom is its own axis or a duplicate.
A LOCK requires: peak rank IC ≥ 0.10, ≥ 80% correct-sign day buckets, mostly monotone decile shape, and nontrivial uniqueness as measured by residual rank IC. Anything short of that gets WATCH or DROP.
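The rank IC and daily stability metrics above can be sketched in plain Python. This is an illustration under assumptions, not the tool's implementation: ties get average ranks, rank IC is the Pearson correlation of those ranks, and a day "wins" when its IC has the expected sign. All function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant series carries no rank information
    return cov / sqrt(va * vb)


def rank_ic(atom, fwd):
    """Spearman correlation between the atom and the forward mid move."""
    return pearson(average_ranks(atom), average_ranks(fwd))


def daily_stability(day_ids, atom, fwd, expected_sign=1):
    """Share of day buckets whose rank IC has the expected sign."""
    buckets = {}
    for d, x, y in zip(day_ids, atom, fwd):
        xs, ys = buckets.setdefault(d, ([], []))
        xs.append(x)
        ys.append(y)
    wins = sum(1 for xs, ys in buckets.values()
               if rank_ic(xs, ys) * expected_sign > 0)
    return wins / len(buckets)


ic = rank_ic([1, 2, 3, 4, 5], [0.1, 0.3, 0.2, 0.5, 0.4])
share = daily_stability([0, 0, 0, 1, 1, 1],
                        [1, 2, 3, 1, 2, 3],
                        [1, 2, 3, 3, 2, 1])
```

Under these definitions, "10 / 10 days won" corresponds to a stability share of 1.0, and the 80% bar corresponds to at least 8 of the 10 day buckets.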
What the data said
Scorecard, sorted by peak rank IC:
| atom | peak IC | peak h | IC(20) | IC(50) | IC(100) | IC(200) | days won | monotone | residual IC | verdict |
|---|---|---|---|---|---|---|---|---|---|---|
| raw_trade_ofi | 0.285 | 1 | 0.161 | 0.112 | 0.082 | 0.061 | 10 / 10 | 1.00 | 0.289 | LOCK |
| raw_microprice_dev | 0.227 | 3 | 0.125 | 0.071 | 0.052 | 0.038 | 10 / 10 | 1.00 | 0.240 | LOCK |
| raw_ofi_l1 | 0.196 | 4 | 0.137 | 0.095 | 0.067 | 0.048 | 10 / 10 | 0.80 | 0.216 | LOCK |
| raw_near_minus_deep_ofi | 0.140 | 4 | 0.091 | 0.061 | 0.043 | 0.030 | 10 / 10 | 0.14 | 0.089 | DROP |
Orthogonality between atoms (Spearman, full sample). The first three rows cover the locked set; the last row pairs a locked atom with the dropped one:
| pair | spearman |
|---|---|
| raw_ofi_l1 vs raw_trade_ofi | −0.00 |
| raw_ofi_l1 vs raw_microprice_dev | −0.04 |
| raw_trade_ofi vs raw_microprice_dev | +0.02 |
| raw_microprice_dev vs raw_near_minus_deep_ofi | +0.22 |
What this means in plain language
raw_trade_ofi is the strongest immediate sensor. Peak IC of 0.285 at the very next event. It’s a signed trade — when someone hits the bid, the bid is about to move. The atom keeps meaningful information out through 200 events, with IC = 0.061 still well above noise. Days won: 10 of 10. Residual IC is essentially equal to its raw IC, meaning the other atoms don’t explain it. Locked.
raw_microprice_dev is the touch-state read. It says where the fair value is leaning inside the spread, in ticks. Peak IC of 0.227 around event 3 — slightly delayed because the touch needs an event or two to resolve into a price move. Monotone deciles, all 10 days won, clean residual. Locked.
raw_ofi_l1 is the slower book-flow read: top-of-book order-flow imbalance. Peak IC of 0.196 around event 4, but relative to its peak it decays the slowest of the three: IC is still 0.067 at event 100 and 0.048 at event 200. Monotonicity score is 0.80 (one violation in deciles), which is below perfect but still clears the bar. Residual IC of 0.216 is the smallest in the locked set, yet it still exceeds the atom’s raw peak IC, so the other atoms don’t explain it. Locked.
raw_near_minus_deep_ofi looks predictive but fails the contract. Peak IC of 0.14 and 10 days won — both real. But the decile shape isn’t monotone. Monotonicity score is 0.14 — the extreme positive tail actually reverts against the middle positive bins. That sign flip in the deciles is a contract violation: you can’t pass a non-monotone raw atom forward to Stage 2 and expect a clean transform to behave. Dropped.
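The decile check can be illustrated with a small sketch. The source does not define how its monotonicity score is computed, so the score below (share of adjacent decile steps that increase) is an assumption and may not reproduce reported values such as 0.80 or 0.14. Function names are hypothetical.

```python
def decile_means(atom, fwd, buckets=10):
    """Sort events by atom value, cut into equal-count buckets,
    and return the mean forward move in each bucket."""
    pairs = sorted(zip(atom, fwd), key=lambda p: p[0])
    n = len(pairs)
    means = []
    for b in range(buckets):
        lo, hi = b * n // buckets, (b + 1) * n // buckets
        chunk = [y for _, y in pairs[lo:hi]]
        if chunk:
            means.append(sum(chunk) / len(chunk))
    return means


def adjacent_increase_share(means):
    """Assumed monotonicity score: fraction of bucket-to-bucket steps that rise."""
    steps = list(zip(means, means[1:]))
    if not steps:
        return 0.0
    return sum(1 for a, b in steps if b > a) / len(steps)


# Toy data: forward moves rise with the atom, except the top decile reverts.
atom = list(range(20))
fwd = [float(i) for i in range(18)] + [5.0, 5.0]
means = decile_means(atom, fwd)
score = adjacent_increase_share(means)
```

In this toy case the top bucket falls back below the buckets beneath it, which is the same shape of failure described for `raw_near_minus_deep_ofi`: the rank IC can still be positive while the extreme tail points the wrong way.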
Verdicts
| atom | verdict | reason |
|---|---|---|
| raw_trade_ofi | LOCK | Strongest peak, longest tail, fully orthogonal. |
| raw_microprice_dev | LOCK | Clean monotone touch-state read at h=3. |
| raw_ofi_l1 | LOCK | Slower book-flow axis, longest half-life. |
| raw_near_minus_deep_ofi | DROP | Non-monotone deciles. Reformulate before promoting. |
What this changes
The locked set becomes the Stage 1 raw atom list. Stage 2 transform research starts from these three, not from indicators. The first transforms to test are the obvious ones:
- EWMA(raw_trade_ofi, L = 3..20): turning event-level pressure into memory.
- EWMA(raw_microprice_dev, L = 3..20): touch-state persistence.
- rolling_sum(raw_ofi_l1, L = 3..20): book-flow accumulation.
- signed_persistence(raw_ofi_l1, L = 3..10): sign streaks of book flow.
- agreement_state(raw_trade_ofi, raw_microprice_dev): when aggressive flow and the touch fair value agree.
- agreement_state(raw_ofi_l1, raw_microprice_dev): book flow + touch state.
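As a rough illustration of the two simplest transforms named above, here is a sketch of an event-indexed EWMA and a rolling sum. The source does not say what `L` means for the EWMA; this sketch assumes it is a span with `alpha = 2 / (L + 1)`, and seeds the average with the first observation. Both choices are assumptions.

```python
def ewma(xs, span):
    """Exponentially weighted mean over events, assuming alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    out, state = [], None
    for x in xs:
        # Seed with the first value, then blend each new event in.
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        out.append(state)
    return out


def rolling_sum(xs, window):
    """Sum of the last `window` events, using a partial window at the start."""
    out, total = [], 0.0
    for i, x in enumerate(xs):
        total += x
        if i >= window:
            total -= xs[i - window]  # drop the event that left the window
        out.append(total)
    return out


smoothed = ewma([0.0, 4.0, 4.0], span=3)
accumulated = rolling_sum([1, 2, 3, 4], window=2)
```

Both transforms use only events at or before t, so they do not leak forward information into the atom they are built from.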
raw_near_minus_deep_ofi doesn’t disappear — it gets reformulated. Likely candidates: clip the extreme tails, separate near-depth from deep-depth into two atoms, or convert it into a signed-persistence shape state instead of using the raw difference directly.
Reproduce
$files = (Get-ChildItem Z:\MBP-10-NEW\data -Filter *.dbn |
Sort-Object Name | Select-Object -First 10 -ExpandProperty FullName) -join ','
go run . raw `
-files $files `
-instrument-id 42001149 `
-tick-size 0.25 `
-max-events 0 `
-max-events-per-file 100000 `
-out out\raw_atom_research_20260517_10d_100k
Primary outputs:
raw_atom_scorecard.csv
raw_atom_ic_curve.csv
raw_atom_monotonicity.csv
raw_atom_incremental_ic.csv
raw_atom_orthogonality.csv
raw_atom_health.csv
manifest.txt
Runtime: ~1m54s on the reference box.