PR #2158

open

Non-record: PR #2135 + MP3 marker-pair fusion

val_bpb: 1.0556
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 15,946,755 bytes

Training Techniques

Test-Time Training
score-first TTT
parameters: {"num_phases":1,"prefix_docs":2500}
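As a rough illustration of the score-first idea, the sketch below assumes the term means that every document is scored with the current model state before the model is allowed to adapt on it, so no token is ever evaluated by a model that has already trained on it. The adapting "model" here is a toy add-one-smoothed unigram counter, not the submission's Transformer, and `score_first_eval` is a hypothetical name. The `num_phases` and `prefix_docs` parameters are not modeled.

```python
import math
from collections import Counter
from typing import List


def score_first_eval(docs: List[List[int]], vocab_size: int) -> float:
    """Return average bits per token under a score-then-adapt loop.

    Toy stand-in: an add-one-smoothed unigram model whose counts play
    the role of adaptable weights (an assumption for illustration).
    """
    counts: Counter = Counter()
    total = 0          # tokens the model has adapted on so far
    total_bits = 0.0   # accumulated code length of scored tokens
    n_tokens = 0

    for doc in docs:
        # 1) Score the whole document with the state frozen BEFORE adapting.
        for tok in doc:
            p = (counts[tok] + 1) / (total + vocab_size)
            total_bits += -math.log2(p)
        n_tokens += len(doc)

        # 2) Only now adapt on the document that was just scored.
        counts.update(doc)
        total += len(doc)

    return total_bits / n_tokens if n_tokens else 0.0


# Two tiny documents over a 2-token vocabulary.
bits_mixed = score_first_eval([[0, 1], [0, 0]], vocab_size=2)
bits_repeat = score_first_eval([[0, 0], [0, 0]], vocab_size=2)
```

In this toy setup the repetitive stream scores lower than the mixed one, because adapting on the first document helps with the second while the first document is still scored by the unadapted state.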
Architecture
SmearGate
Inherited BOS-fixed SmearGate branch from PR #2135; hard-disabled in this submission via alias_dampening_active=false.
parameters: null
weight tying
Alias donor tokens are fused into single token rows with norm-matched warm initialization for the new alias embeddings.
parameters: null
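The exact initialization recipe is not given here, so the following is only a minimal sketch of one plausible reading of "norm-matched warm initialization": start the new alias row from the mean of the two rows it replaces (the warm start), then rescale it so its L2 norm equals a target norm. The averaging rule, the choice of target norm, and the names `l2_norm` and `norm_matched_init` are all assumptions for illustration.

```python
import math
from typing import List


def l2_norm(v: List[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def norm_matched_init(row_a: List[float], row_b: List[float],
                      target_norm: float) -> List[float]:
    """Warm-start an alias row from two source rows, then match a norm.

    Assumption: the warm start is the element-wise mean of the rows.
    """
    mixed = [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
    norm = l2_norm(mixed)
    if norm == 0.0:
        return mixed  # nothing to rescale; leave the zero vector as-is
    scale = target_norm / norm
    return [x * scale for x in mixed]


# Hypothetical 2-d embedding rows for a space token and a marker token.
space_row = [3.0, 0.0]
marker_row = [0.0, 4.0]

# Assumption: match the mean norm of the two source rows.
target = (l2_norm(space_row) + l2_norm(marker_row)) / 2.0
alias_row = norm_matched_init(space_row, marker_row, target)
```

Matching the norm keeps the new row on the same scale as the rows it stands in for, so the first forward passes do not see an unusually large or small embedding.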
Other
other
MP3 marker-pair fusion via stream-edit on already-tokenized CaseOps shards, fusing the 2-grams [▁,TITLE], [▁,ALLCAPS], and [▁,CAPNEXT] into alias donor tokens.
parameters: {"num_pairs":3}
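The stream edit described above can be sketched as a single left-to-right pass over a token-id list that replaces each listed 2-gram with one alias id and leaves everything else untouched. All token ids below are hypothetical placeholders, and `fuse_marker_pairs` is an illustrative name, not the submission's actual function.

```python
from typing import Dict, List, Tuple

# Hypothetical token ids; the real CaseOps ids are not given here.
SPACE, TITLE, ALLCAPS, CAPNEXT = 10, 11, 12, 13
ALIAS_TITLE, ALIAS_ALLCAPS, ALIAS_CAPNEXT = 900, 901, 902

PAIR_TO_ALIAS: Dict[Tuple[int, int], int] = {
    (SPACE, TITLE): ALIAS_TITLE,
    (SPACE, ALLCAPS): ALIAS_ALLCAPS,
    (SPACE, CAPNEXT): ALIAS_CAPNEXT,
}


def fuse_marker_pairs(tokens: List[int],
                      pair_to_alias: Dict[Tuple[int, int], int]) -> List[int]:
    """Replace each listed adjacent pair with its alias id in one pass."""
    out: List[int] = []
    i = 0
    n = len(tokens)
    while i < n:
        if i + 1 < n:
            alias = pair_to_alias.get((tokens[i], tokens[i + 1]))
            if alias is not None:
                out.append(alias)  # emit one token for the fused pair
                i += 2             # skip both members of the pair
                continue
        out.append(tokens[i])      # no match: copy the token through
        i += 1
    return out


# A marker that is not preceded by SPACE (the trailing TITLE) is not fused.
shard = [SPACE, TITLE, 42, SPACE, 43, SPACE, ALLCAPS, 44, TITLE]
fused = fuse_marker_pairs(shard, PAIR_TO_ALIAS)
```

Because the pass operates on already-tokenized ids, the tokenizer itself stays unchanged; only the shard contents get shorter wherever a pair is fused.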
Sequence Length
sequence_length
train_length: null
eval_length: 2560

Novel Contributions

  • Adds MP3 marker-pair fusion on top of PR #2135
  • Fuses three marker 2-grams into alias donor tokens without changing the tokenizer
  • Uses norm-matched warm initialization for alias rows
  • Demonstrates that marker-pair fusion composes additively with the PR #2135 stack
  • Documents a non-record, post-deadline reference submission