Locking the first three raw atoms settled the seed list: raw_trade_ofi, raw_microprice_dev, raw_ofi_l1. The question this pass asks is whether the trade event can be split into better raw trade sensors than signed size alone.
The question
Can trade events be decomposed into raw trade-aggression atoms that carry information not already captured by raw_trade_ofi?
Hypothesis
raw_trade_ofi is the strongest known raw atom, but it might mix several forms of aggression into one signed-size number. A trade that consumes a large share of the visible queue, trades through the touch, or sweeps multiple book levels could carry information the signed size alone doesn’t.
What’s being tested
Four new candidate atoms, all signed:
| atom | what it measures |
|---|---|
| raw_trade_size_norm_l1 | signed trade size ÷ opposite top queue size |
| raw_trade_through_ticks | signed ticks of price penetration through the previous best bid/ask |
| raw_trade_sweep_depth | signed count of previous book levels the trade crossed |
| raw_trade_consumed_l1_frac | capped signed fraction of the opposite top queue the trade consumed |
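A minimal Go sketch of how the four atoms could be computed from one trade and the book snapshot just before it. Several pieces are assumptions for illustration and are not the study's code:

- the types `Level`, `Book`, `Trade`, and `TradeAtoms`
- the sign convention (+1 for buyer-initiated)
- the cap of 1 on the consumed fraction
- counting the touch itself as one swept level

Counting the touch is consistent with raw_trade_sweep_depth being nonzero on every trade event in the coverage table.

```go
package main

import (
	"fmt"
	"math"
)

// Level is one price level of the pre-trade MBP-10 snapshot (hypothetical type).
type Level struct {
	Price float64
	Size  float64
}

// Book is the previous snapshot; index 0 is the touch on each side.
type Book struct {
	Bids []Level // best (highest) bid first
	Asks []Level // best (lowest) ask first
}

// Trade is one print. Side is +1 for buyer-initiated, -1 for seller-initiated (assumed convention).
type Trade struct {
	Price float64
	Size  float64
	Side  int
}

// TradeAtoms holds the four candidate atoms for a single trade event.
type TradeAtoms struct {
	SizeNormL1     float64 // raw_trade_size_norm_l1
	ThroughTicks   float64 // raw_trade_through_ticks
	SweepDepth     float64 // raw_trade_sweep_depth
	ConsumedL1Frac float64 // raw_trade_consumed_l1_frac
}

// computeTradeAtoms derives the atoms from a trade and the book *before* it.
func computeTradeAtoms(prev Book, tr Trade, tick float64) TradeAtoms {
	var a TradeAtoms
	s := float64(tr.Side)
	// The aggressor trades against the opposite side of the book.
	opp := prev.Asks
	if tr.Side < 0 {
		opp = prev.Bids
	}
	if tr.Side == 0 || len(opp) == 0 {
		return a
	}
	top := opp[0]
	if top.Size > 0 {
		ratio := tr.Size / top.Size
		a.SizeNormL1 = s * ratio
		a.ConsumedL1Frac = s * math.Min(ratio, 1) // capped: at most the whole queue
	}
	// Ticks of penetration beyond the previous best; zero for prints at the touch.
	if through := s * (tr.Price - top.Price) / tick; through > 0 {
		a.ThroughTicks = s * math.Round(through)
	}
	// Previous levels at or inside the print price (the touch counts as one level).
	depth := 0
	for _, lv := range opp {
		if s*(tr.Price-lv.Price) >= 0 {
			depth++
		}
	}
	a.SweepDepth = s * float64(depth)
	return a
}

func main() {
	prev := Book{
		Bids: []Level{{99.75, 12}, {99.50, 30}},
		Asks: []Level{{100.00, 5}, {100.25, 8}, {100.50, 20}},
	}
	// A 20-lot buy printing at 100.25, one 0.25 tick through the previous best ask.
	fmt.Printf("%+v\n", computeTradeAtoms(prev, Trade{Price: 100.25, Size: 20, Side: 1}, 0.25))
}
```

With these definitions, a print at the touch gives ThroughTicks = 0 but SweepDepth = ±1. That matches the pattern of through ticks being sparse and sweep depth firing on every trade.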
Controls (the existing locked set):
- raw_trade_ofi
- raw_microprice_dev
- raw_ofi_l1
- raw_near_minus_deep_ofi
Sample contract
| field | value |
|---|---|
| Instrument id | 42001149 |
| Tick size | 0.25 |
| Files | First 10 sorted DBN files from Z:\MBP-10-NEW\data |
| Sample | 1,000,000 events across 10 day buckets |
| Horizons | h = 1..200 events |
What separates a LOCK from a DROP here
For this study, the decisive bar is uniqueness against raw_trade_ofi.
If the new atom’s residual rank IC after controlling for the existing locked set is near zero, then it isn’t a new sensor — it’s a different encoding of the same trade-flow axis. Only an atom that survives the residual test gets considered for LOCK.
What the data said
Scorecard:
| atom | peak IC | peak h | IC(20) | IC(50) | IC(200) | corr to controls | residual IC (h=1) | days won | verdict |
|---|---|---|---|---|---|---|---|---|---|
| raw_trade_ofi | 0.285 | 1 | 0.161 | 0.112 | 0.061 | 0.022 | 0.289 | 10/10 | LOCK |
| raw_trade_consumed_l1_frac | 0.288 | 1 | 0.161 | 0.112 | 0.061 | 1.000 | 0.120 | 10/10 | DROP |
| raw_trade_size_norm_l1 | 0.287 | 1 | 0.161 | 0.112 | 0.061 | 1.000 | 0.115 | 10/10 | DROP |
| raw_trade_sweep_depth | 0.285 | 1 | 0.161 | 0.112 | 0.061 | 1.000 | −0.014 | 10/10 | DROP |
| raw_trade_through_ticks | 0.141 | 1 | 0.098 | 0.070 | 0.038 | 0.437 | 0.018 | 10/10 | WATCH |
Residual IC by horizon — what’s left after controlling for the locked set:
| atom | h=1 | h=5 | h=20 | h=50 | h=100 | h=200 |
|---|---|---|---|---|---|---|
| raw_trade_size_norm_l1 | 0.115 | 0.030 | 0.010 | 0.005 | 0.001 | −0.000 |
| raw_trade_consumed_l1_frac | 0.120 | 0.027 | 0.006 | 0.002 | −0.000 | −0.001 |
| raw_trade_sweep_depth | −0.014 | −0.021 | −0.015 | −0.009 | −0.007 | −0.004 |
| raw_trade_through_ticks | 0.018 | 0.032 | 0.031 | 0.023 | 0.016 | 0.013 |
Trade-event coverage (how often the atom is nonzero):
| atom | nonzero rate |
|---|---|
| raw_trade_ofi | 5.62% |
| raw_trade_size_norm_l1 | 5.62% |
| raw_trade_sweep_depth | 5.62% |
| raw_trade_consumed_l1_frac | 5.62% |
| raw_trade_through_ticks | 1.06% |
What this means in plain language
Three of the four “new” atoms are duplicates. raw_trade_size_norm_l1, raw_trade_consumed_l1_frac, and raw_trade_sweep_depth have IC curves nearly identical to raw_trade_ofi (the same IC(20), IC(50), and IC(200) to three decimals), and their Spearman correlation to raw_trade_ofi rounds to 1.000. After controls, the residual IC of size_norm and consumed_frac starts at 0.115 and 0.120 at h = 1 but falls to 0.010 or less by h = 20; sweep depth has essentially none to begin with. In this corpus, the rank ordering of trade events is dominated by the signed trade itself. Normalising by queue size or counting levels swept doesn't create a new axis. Different math, same measurement.
raw_trade_sweep_depth is worse than a duplicate. Its residual IC is negative at every checked horizon, bottoming at −0.021 at h = 5. After controlling for the locked set, what's left of sweep depth predicts the wrong direction. That's a duplicate plus a small wrong-way residual, not a new sensor.
raw_trade_through_ticks is the interesting one. It fires on only 1.06% of all events, because it is nonzero only when a trade actually penetrates the touch. That is roughly one trade event in five (1.06% ÷ 5.62% ≈ 19%). Its peak IC of 0.141 is about half that of raw_trade_ofi (0.285). But its correlation to raw_trade_ofi is only 0.44 (vs ~1.00 for the others), and its residual IC stays positive from h = 1 through h = 200. The residual is stronger at h = 5 (0.032) and h = 20 (0.031) than at h = 1 (0.018): through-touch aggression resolves a few events out, not instantly. It doesn't beat raw_trade_ofi, but it isn't raw_trade_ofi either.
Verdicts
| atom | verdict | reason |
|---|---|---|
| raw_trade_ofi | LOCK | Still the locked anchor of the trade-flow axis. |
| raw_trade_size_norm_l1 | DROP | Spearman 1.00 to raw_trade_ofi. Duplicate encoding. |
| raw_trade_consumed_l1_frac | DROP | Spearman 1.00 to raw_trade_ofi. Duplicate encoding. |
| raw_trade_sweep_depth | DROP | Duplicate, plus negative residual IC. |
| raw_trade_through_ticks | WATCH | Sparse (1% of events), partly independent, stable. Promote to Stage 2. |
What this changes
The Stage 1 trade-aggression slot stays as just raw_trade_ofi. There is no second trade-aggression atom worth locking — yet.
raw_trade_through_ticks moves to Stage 2 as a sparse aggression state. The transforms most worth testing first:
- agreement_state(raw_trade_through_ticks, raw_microprice_dev): through-touch aggression when the touch fair value agrees with it.
- agreement_state(raw_trade_through_ticks, raw_ofi_l1): through-touch aggression plus book flow.
- signed_persistence(raw_trade_through_ticks, L = 3..10): sign streaks of through-touch trades.
- EWMA(raw_trade_through_ticks, L = 3..20): pressure persistence on the sparse axis.
- through_ticks_when_spread_tight: conditional state.
- through_ticks_when_microprice_disagrees: the contrarian read.
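The Stage 2 transforms are named but not defined in this note. Below are hypothetical Go definitions of three of them, to make the candidates concrete; all three definitions are assumptions:

- `agreementState` keeps the through-ticks value only when the partner atom has the same nonzero sign.
- `signedPersistence` carries a signed run length, capped at L, across non-firing events.
- `ewma` uses span L with alpha = 2/(L+1) and starts at zero.

```go
package main

import "fmt"

func sign(v float64) float64 {
	switch {
	case v > 0:
		return 1
	case v < 0:
		return -1
	}
	return 0
}

// agreementState keeps a[t] only when b[t] has the same nonzero sign; otherwise 0.
func agreementState(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for t := range a {
		if sa := sign(a[t]); sa != 0 && sa == sign(b[t]) {
			out[t] = a[t]
		}
	}
	return out
}

// signedPersistence tracks the current run of same-signed firing events (capped at maxL)
// and carries the signed run length forward across events where x is zero.
func signedPersistence(x []float64, maxL int) []float64 {
	out := make([]float64, len(x))
	runSign, run := 0.0, 0
	for t, v := range x {
		if s := sign(v); s != 0 {
			if s == runSign {
				if run < maxL {
					run++
				}
			} else {
				runSign, run = s, 1
			}
		}
		out[t] = runSign * float64(run)
	}
	return out
}

// ewma is an exponentially weighted mean with span L (alpha = 2/(L+1)), started at 0.
func ewma(x []float64, L int) []float64 {
	alpha := 2 / float64(L+1)
	out := make([]float64, len(x))
	s := 0.0
	for t, v := range x {
		s = alpha*v + (1-alpha)*s
		out[t] = s
	}
	return out
}

func main() {
	through := []float64{0, 1, 0, 2, -1, 0}        // sparse through-ticks stream
	micro := []float64{0.5, 0.5, 0.5, -0.5, -0.5, 0} // stand-in for raw_microprice_dev
	fmt.Println(agreementState(through, micro))
	fmt.Println(signedPersistence(through, 3))
	fmt.Println(ewma(through, 3))
}
```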
The decisive test for Stage 2 is whether through-touch aggression plus microprice agreement produces a stronger, less-redundant version of the existing trade_microprice_agreement_3 composite.
Reproduce
```powershell
$files = (Get-ChildItem Z:\MBP-10-NEW\data -Filter *.dbn |
    Sort-Object Name | Select-Object -First 10 -ExpandProperty FullName) -join ','
go run . raw `
  -files $files `
  -instrument-id 42001149 `
  -tick-size 0.25 `
  -max-events 0 `
  -max-events-per-file 100000 `
  -out out\trade_atom_research_20260517_10d_100k
```
Primary outputs:
- raw_atom_scorecard.csv
- raw_atom_ic_curve.csv
- raw_atom_incremental_ic.csv
- raw_atom_orthogonality.csv
- raw_atom_health.csv
- manifest.txt