val_bpb
1.0556
Architecture
Transformer
Optimizer
—
Artifact Size
15,946,755 bytes
Training Techniques
Test-Time Training
score-first TTT
parameters: {"num_phases":1,"prefix_docs":2500}
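The "score-first" ordering can be illustrated with a toy sketch: each document is scored with the model state from before any adaptation on that document, and only then is the model updated on it. Everything below is a hypothetical stand-in (the `ToyUnigram` model, `score_first_eval`, and the add-one smoothing are not from the submission), and the sketch does not model `num_phases` or `prefix_docs`, whose exact semantics are not stated here.

```python
import math
from collections import Counter
from typing import List


class ToyUnigram:
    """Hypothetical stand-in for a model that can be adapted at test time."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.counts: Counter = Counter()
        self.total = 0

    def nll_bits(self, doc: List[int]) -> float:
        # Negative log-likelihood in bits under add-one smoothing.
        denom = self.total + self.vocab_size
        return sum(-math.log2((self.counts[t] + 1) / denom) for t in doc)

    def adapt(self, doc: List[int]) -> None:
        # The "training" step: absorb the document into the counts.
        self.counts.update(doc)
        self.total += len(doc)


def score_first_eval(model: ToyUnigram, docs: List[List[int]]) -> List[float]:
    """Score each document BEFORE adapting on it, so no score sees its own tokens."""
    scores = []
    for doc in docs:
        scores.append(model.nll_bits(doc))  # 1) score with the pre-update state
        model.adapt(doc)                    # 2) only then adapt on the same doc
    return scores


scores = score_first_eval(ToyUnigram(vocab_size=2), [[0, 0, 0, 0], [0, 0, 0, 0]])
```

In this toy run the second document scores lower than the first because the model has adapted on the first one, while the first document's score is unaffected by its own contents.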
Architecture
SmearGate
Inherited BOS-fixed SmearGate branch from PR #2135; hard-disabled in this submission via alias_dampening_active=false.
parameters: null
weight tying
Alias donor tokens are fused into single token rows with norm-matched warm initialization for the new alias embeddings.
parameters: null
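One possible reading of "norm-matched warm initialization" is sketched below: the new alias row points in the direction of the sum of the two constituent rows and is rescaled to a chosen target norm. This is an assumption, not the submission's confirmed recipe; the function name `norm_matched_init`, the choice of the summed direction, and the way `target_norm` is picked are all hypothetical.

```python
import math
from typing import List


def norm(v: List[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def norm_matched_init(row_a: List[float], row_b: List[float],
                      target_norm: float) -> List[float]:
    """Warm-start an alias embedding from its two constituent token rows.

    Assumed scheme: take the direction of row_a + row_b, then rescale so the
    result has exactly target_norm (for example, the typical row norm of the
    embedding table).
    """
    direction = [a + b for a, b in zip(row_a, row_b)]
    d = norm(direction)
    if d == 0.0:
        # Degenerate case: the rows cancel, so there is no direction to keep.
        return [0.0] * len(direction)
    scale = target_norm / d
    return [x * scale for x in direction]


# Toy 2-dimensional rows; real embedding rows would be model-width vectors.
alias_row = norm_matched_init([3.0, 0.0], [0.0, 4.0], target_norm=2.5)
```

Matching the norm keeps the new row at the same scale as the existing rows, so downstream layers see an input of familiar magnitude from the first step.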
Other
other
MP3 marker-pair fusion via stream-edit on already-tokenized CaseOps shards, fusing the 2-grams [▁,TITLE], [▁,ALLCAPS], and [▁,CAPNEXT] into alias donor tokens.
parameters: {"num_pairs":3}
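The stream edit described above can be sketched as a single left-to-right pass over an already-tokenized ID sequence, replacing each listed 2-gram with one alias token and leaving the tokenizer untouched. All token IDs and the names `fuse_marker_pairs` and `PAIRS` are hypothetical placeholders; the real marker and donor IDs are not given here.

```python
from typing import Dict, List, Tuple

# Hypothetical token IDs for the space marker and the three case markers.
SPACE, TITLE, ALLCAPS, CAPNEXT = 10, 11, 12, 13
# Hypothetical alias donor token IDs that stand in for the fused pairs.
ALIAS_TITLE, ALIAS_ALLCAPS, ALIAS_CAPNEXT = 900, 901, 902

PAIRS: Dict[Tuple[int, int], int] = {
    (SPACE, TITLE): ALIAS_TITLE,
    (SPACE, ALLCAPS): ALIAS_ALLCAPS,
    (SPACE, CAPNEXT): ALIAS_CAPNEXT,
}


def fuse_marker_pairs(tokens: List[int],
                      pair_to_alias: Dict[Tuple[int, int], int]) -> List[int]:
    """Greedy left-to-right replacement of listed 2-grams with alias tokens."""
    out: List[int] = []
    i = 0
    n = len(tokens)
    while i < n:
        if i + 1 < n:
            alias = pair_to_alias.get((tokens[i], tokens[i + 1]))
            if alias is not None:
                out.append(alias)  # emit one token in place of the pair
                i += 2
                continue
        out.append(tokens[i])      # any other token passes through unchanged
        i += 1
    return out


fused = fuse_marker_pairs(
    [SPACE, TITLE, 42, SPACE, 43, SPACE, ALLCAPS, 44], PAIRS
)
# fused == [900, 42, 10, 43, 901, 44]
```

A space marker that is not followed by one of the three case markers is left as-is, so only the listed pairs shorten the stream.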
Sequence Length
sequence_length
train_length: null
eval_length: 2560
Novel Contributions
- Adds MP3 marker-pair fusion on top of PR #2135
- Fuses three marker 2-grams into alias donor tokens without changing the tokenizer
- Uses norm-matched warm initialization for alias rows
- Demonstrates additive composition of the fusion lever with the PR #2135 stack
- Documents a non-record, post-deadline reference submission