Four trade-aggression atoms — and three of them collapsed

Locking the first three raw atoms settled the seed list: raw_trade_ofi, raw_microprice_dev, raw_ofi_l1. The question this pass asks is whether the trade event can be split into better raw trade sensors than signed size alone.

The question

Can trade events be decomposed into raw trade-aggression atoms that carry information not already captured by raw_trade_ofi?

Hypothesis

raw_trade_ofi is the strongest known raw atom, but it might mix several forms of aggression into one signed-size number. A trade that consumes a large share of the visible queue, trades through the touch, or sweeps multiple book levels could carry information the signed size alone doesn’t.

What’s being tested

Four new candidate atoms, all signed:

| atom | what it measures |
| --- | --- |
| `raw_trade_size_norm_l1` | signed trade size ÷ opposite top queue size |
| `raw_trade_through_ticks` | signed ticks of price penetration through the previous best bid/ask |
| `raw_trade_sweep_depth` | signed count of previous book levels the trade crossed |
| `raw_trade_consumed_l1_frac` | capped signed fraction of the opposite top queue the trade consumed |
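The definitions in the table above can be sketched in Go as follows. This is a minimal illustration, not the study's implementation: the `Level`, `Trade`, and `Atoms` types and the `TradeAtoms` function are hypothetical names, the cap on the consumed fraction is assumed to be 1, and a trade at the touch is assumed to count as one level of sweep depth (which would be consistent with sweep depth being nonzero on every trade event in the coverage table).

```go
package main

import (
	"fmt"
	"math"
)

// Level is one price level of the pre-trade book (hypothetical type).
type Level struct {
	Price float64
	Size  float64
}

// Trade is one trade event. Side is +1 for a buy aggressor and -1 for a
// sell aggressor (an assumed sign convention).
type Trade struct {
	Price float64
	Size  float64
	Side  int
}

// Atoms holds the four candidate atom values for a single trade.
type Atoms struct {
	SizeNormL1     float64
	ThroughTicks   float64
	SweepDepth     float64
	ConsumedL1Frac float64
}

// TradeAtoms computes the four signed atoms from a trade and the opposite
// side of the book as it stood before the trade. opp[0] is the previous
// best ask for a buy, or the previous best bid for a sell.
func TradeAtoms(t Trade, opp []Level, tick float64) Atoms {
	var a Atoms
	if len(opp) == 0 || opp[0].Size <= 0 || tick <= 0 || t.Side == 0 {
		return a // nothing to normalise against
	}
	s := float64(t.Side)
	ratio := t.Size / opp[0].Size
	a.SizeNormL1 = s * ratio
	a.ConsumedL1Frac = s * math.Min(ratio, 1.0) // assumed cap of 1

	// Penetration beyond the previous touch, in ticks. A trade at the
	// touch leaves this atom at zero.
	through := s * (t.Price - opp[0].Price) / tick
	if through > 0 {
		a.ThroughTicks = s * math.Round(through)
	}

	// Count previous opposite-side levels at or inside the trade price.
	levels := 0
	for _, lv := range opp {
		if s*(t.Price-lv.Price) >= 0 {
			levels++
		}
	}
	a.SweepDepth = s * float64(levels)
	return a
}

func main() {
	// Illustrative numbers only: a buy of 30 lots printing two ticks
	// above the previous best ask, with a tick size of 0.25.
	asks := []Level{{100.00, 20}, {100.25, 15}, {100.50, 40}, {100.75, 10}}
	fmt.Printf("%+v\n", TradeAtoms(Trade{Price: 100.50, Size: 30, Side: 1}, asks, 0.25))
}
```

Under these assumptions, three of the four atoms are nonzero on every trade and share the sign of the trade, while the through-ticks atom is zero unless the trade prints beyond the previous touch.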

Controls (the existing locked set):

raw_trade_ofi
raw_microprice_dev
raw_ofi_l1
raw_near_minus_deep_ofi

Sample contract

| field | value |
| --- | --- |
| Instrument id | `42001149` |
| Tick size | `0.25` |
| Files | First 10 sorted DBN files from `Z:\MBP-10-NEW\data` |
| Sample | 1,000,000 events across 10 day buckets |
| Horizons | `h = 1..200` events |

What separates a `LOCK` from a `DROP` here

For this study, the decisive bar is uniqueness against raw_trade_ofi.

If the new atom’s residual rank IC after controlling for the existing locked set is near zero, then it isn’t a new sensor — it’s a different encoding of the same trade-flow axis. Only an atom that survives the residual test gets considered for LOCK.
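One way to picture the residual test is the Go sketch below. It is a simplified stand-in, not the study's method: it controls for a single series instead of the whole locked set, it residualises rank-transformed values with a one-variable least-squares fit, and all function names (`ranks`, `RankIC`, `ResidualRankIC`) are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ranks returns 1-based average ranks, so tied values share their mean rank.
func ranks(x []float64) []float64 {
	idx := make([]int, len(x))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return x[idx[a]] < x[idx[b]] })
	r := make([]float64, len(x))
	for i := 0; i < len(idx); {
		j := i
		for j+1 < len(idx) && x[idx[j+1]] == x[idx[i]] {
			j++
		}
		avg := float64(i+j)/2 + 1
		for k := i; k <= j; k++ {
			r[idx[k]] = avg
		}
		i = j + 1
	}
	return r
}

func mean(x []float64) float64 {
	if len(x) == 0 {
		return 0
	}
	s := 0.0
	for _, v := range x {
		s += v
	}
	return s / float64(len(x))
}

// pearson returns the linear correlation of x and y, or 0 if either is constant.
func pearson(x, y []float64) float64 {
	mx, my := mean(x), mean(y)
	var sxy, sxx, syy float64
	for i := range x {
		dx, dy := x[i]-mx, y[i]-my
		sxy += dx * dy
		sxx += dx * dx
		syy += dy * dy
	}
	if sxx == 0 || syy == 0 {
		return 0
	}
	return sxy / math.Sqrt(sxx*syy)
}

// RankIC is the Spearman correlation between a signal and forward returns.
func RankIC(signal, fwd []float64) float64 {
	return pearson(ranks(signal), ranks(fwd))
}

// ResidualRankIC regresses the ranked atom on the ranked control and
// scores what is left over against the forward returns.
func ResidualRankIC(atom, control, fwd []float64) float64 {
	ra, rc := ranks(atom), ranks(control)
	ma, mc := mean(ra), mean(rc)
	var sxy, sxx float64
	for i := range ra {
		sxy += (rc[i] - mc) * (ra[i] - ma)
		sxx += (rc[i] - mc) * (rc[i] - mc)
	}
	beta := 0.0
	if sxx > 0 {
		beta = sxy / sxx
	}
	resid := make([]float64, len(ra))
	for i := range ra {
		resid[i] = (ra[i] - ma) - beta*(rc[i]-mc)
	}
	return RankIC(resid, fwd)
}

func main() {
	// Toy data: the atom is a rescaled copy of the control, so nothing
	// should survive the residual step.
	control := []float64{3, -1, 4, -2, 5, -6}
	atom := []float64{6, -2, 8, -4, 10, -12}
	fwd := []float64{0.5, -0.25, 0.75, -0.5, 1.0, -1.0}
	fmt.Println("raw rank IC:     ", RankIC(atom, fwd))
	fmt.Println("residual rank IC:", ResidualRankIC(atom, control, fwd))
}
```

An atom that is any monotone re-encoding of the control has identical ranks, so its residual is flat and the residual rank IC is zero regardless of how strong its raw IC looks.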

What the data said

Scorecard:

| atom | peak IC | peak h | IC(20) | IC(50) | IC(200) | corr to controls | residual IC | days won | verdict |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `raw_trade_ofi` | 0.285 | 1 | 0.161 | 0.112 | 0.061 | 0.022 | 0.289 | 10/10 | LOCK |
| `raw_trade_consumed_l1_frac` | 0.288 | 1 | 0.161 | 0.112 | 0.061 | 1.000 | 0.120 | 10/10 | DROP |
| `raw_trade_size_norm_l1` | 0.287 | 1 | 0.161 | 0.112 | 0.061 | 1.000 | 0.115 | 10/10 | DROP |
| `raw_trade_sweep_depth` | 0.285 | 1 | 0.161 | 0.112 | 0.061 | 1.000 | −0.014 | 10/10 | DROP |
| `raw_trade_through_ticks` | 0.141 | 1 | 0.098 | 0.070 | 0.038 | 0.437 | 0.018 | 10/10 | WATCH |

Residual IC by horizon — what’s left after controlling for the locked set:

| atom | h=1 | h=5 | h=20 | h=50 | h=100 | h=200 |
| --- | --- | --- | --- | --- | --- | --- |
| `raw_trade_size_norm_l1` | 0.115 | 0.030 | 0.010 | 0.005 | 0.001 | −0.000 |
| `raw_trade_consumed_l1_frac` | 0.120 | 0.027 | 0.006 | 0.002 | −0.000 | −0.001 |
| `raw_trade_sweep_depth` | −0.014 | −0.021 | −0.015 | −0.009 | −0.007 | −0.004 |
| `raw_trade_through_ticks` | 0.018 | 0.032 | 0.031 | 0.023 | 0.016 | 0.013 |

Trade-event coverage (how often the atom is nonzero):

| atom | nonzero rate |
| --- | --- |
| `raw_trade_ofi` | 5.62% |
| `raw_trade_size_norm_l1` | 5.62% |
| `raw_trade_sweep_depth` | 5.62% |
| `raw_trade_consumed_l1_frac` | 5.62% |
| `raw_trade_through_ticks` | 1.06% |

What this means in plain language

Three of the four “new” atoms are duplicates. raw_trade_size_norm_l1, raw_trade_consumed_l1_frac, and raw_trade_sweep_depth all have IC curves nearly identical to raw_trade_ofi, and their Spearman correlation to raw_trade_ofi rounds to 1.000. For the two size-based atoms, the residual IC after controls is 0.115 to 0.120 at h = 1, falls to about 0.03 by h = 5, and is at or below 0.010 from h = 20 onward; sweep depth has no positive residual at any horizon. In this corpus, the rank ordering of trade events is dominated by the signed trade itself: normalising by queue size or counting levels swept doesn’t create a new axis. Different math, same measurement.

raw_trade_sweep_depth is worse than a duplicate. Its residual IC is negative at every checked horizon, ranging between −0.021 and −0.004. After controlling for the locked set, which includes raw_trade_ofi, what’s left of sweep depth predicts the wrong direction. That’s a duplicate plus noise.

raw_trade_through_ticks is the interesting one. It fires on about 1% of all events (1.06%), and only when a trade actually penetrates the touch. Its peak IC of 0.141 is about half that of raw_trade_ofi (0.285), but its correlation to raw_trade_ofi is only 0.44 (vs ~1.00 for the others), and its residual IC stays positive from h = 1 through h = 200. The residual is stronger at h = 5 (0.032) than at h = 1 (0.018), which suggests through-touch aggression resolves a few events out rather than instantly. It doesn’t beat raw_trade_ofi, but it isn’t raw_trade_ofi either.

Verdicts

| atom | verdict | reason |
| --- | --- | --- |
| `raw_trade_ofi` | LOCK | Still the locked anchor of the trade-flow axis. |
| `raw_trade_size_norm_l1` | DROP | Spearman 1.00 to `raw_trade_ofi`. Duplicate encoding. |
| `raw_trade_consumed_l1_frac` | DROP | Spearman 1.00 to `raw_trade_ofi`. Duplicate encoding. |
| `raw_trade_sweep_depth` | DROP | Duplicate, plus negative residual IC. |
| `raw_trade_through_ticks` | WATCH | Sparse (1% of events), partly independent, stable. Promote to Stage 2. |

What this changes

The Stage 1 trade-aggression slot stays as just raw_trade_ofi. There is no second trade-aggression atom worth locking — yet.

raw_trade_through_ticks moves to Stage 2 as a sparse aggression state. The transforms most worth testing first:

agreement_state(raw_trade_through_ticks, raw_microprice_dev) — through-touch aggression when the touch fair value agrees with it.
agreement_state(raw_trade_through_ticks, raw_ofi_l1) — through-touch aggression plus book flow.
signed_persistence(raw_trade_through_ticks, L = 3..10) — sign streaks of through-touch trades.
EWMA(raw_trade_through_ticks, L = 3..20) — pressure persistence on the sparse axis.
through_ticks_when_spread_tight — conditional state.
through_ticks_when_microprice_disagrees — the contrarian read.
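The first few transforms in the list above could look something like the Go sketch below. The note names these transforms but does not define them, so every definition here is an assumption made for illustration: `AgreementState` returns the shared sign of two inputs or zero, `SignedPersistence` tracks a signed streak over nonzero observations only (skipping the zero events between through-touch trades) and caps it at L, and `EWMA` uses the common span convention alpha = 2 / (L + 1).

```go
package main

import "fmt"

// sign returns +1, -1, or 0.
func sign(x float64) float64 {
	switch {
	case x > 0:
		return 1
	case x < 0:
		return -1
	}
	return 0
}

// AgreementState returns the shared sign when both inputs are nonzero
// and point the same way, and 0 otherwise (assumed definition).
func AgreementState(a, b float64) float64 {
	if sa := sign(a); sa != 0 && sa == sign(b) {
		return sa
	}
	return 0
}

// SignedPersistence returns, per event, the signed length of the current
// streak of same-signed nonzero values, capped at maxLen. Zero values
// neither extend nor reset the streak (assumed, because the atom is sparse).
func SignedPersistence(x []float64, maxLen int) []float64 {
	out := make([]float64, len(x))
	limit := float64(maxLen)
	run := 0.0
	for i, v := range x {
		if s := sign(v); s != 0 {
			if sign(run) == s {
				run += s
			} else {
				run = s
			}
			if run > limit {
				run = limit
			}
			if run < -limit {
				run = -limit
			}
		}
		out[i] = run
	}
	return out
}

// EWMA returns an exponentially weighted moving average seeded at zero,
// with alpha = 2 / (span + 1).
func EWMA(x []float64, span int) []float64 {
	alpha := 2.0 / float64(span+1)
	out := make([]float64, len(x))
	prev := 0.0
	for i, v := range x {
		prev = alpha*v + (1-alpha)*prev
		out[i] = prev
	}
	return out
}

func main() {
	// Illustrative through-ticks series: mostly zero, occasionally signed.
	through := []float64{2, 0, 1, -1, 0, -3, -1}
	fmt.Println(SignedPersistence(through, 3))
	fmt.Println(EWMA(through, 3))
	fmt.Println(AgreementState(through[0], 0.125))
}
```

Whether zeros should reset a streak, and whether the EWMA should decay on every event or only on trade events, are open design choices that change how quickly the sparse signal fades.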

The decisive test for Stage 2 is whether through-touch aggression plus microprice agreement produces a stronger, less-redundant version of the existing trade_microprice_agreement_3 composite.

Reproduce

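# Take the first 10 DBN files in name order and join them into one comma-separated list.
$files = (Get-ChildItem Z:\MBP-10-NEW\data -Filter *.dbn |
  Sort-Object Name | Select-Object -First 10 -ExpandProperty FullName) -join ','

# 10 files at up to 100,000 events each gives the 1,000,000-event sample.
go run . raw `
  -files $files `
  -instrument-id 42001149 `
  -tick-size 0.25 `
  -max-events 0 `
  -max-events-per-file 100000 `
  -out out\trade_atom_research_20260517_10d_100k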

Primary outputs:

raw_atom_scorecard.csv
raw_atom_ic_curve.csv
raw_atom_incremental_ic.csv
raw_atom_orthogonality.csv
raw_atom_health.csv
manifest.txt