PR #2109
open
Track 2: PR #1855 + MP3 marker-pair fusion + alias smear boundary (val_bpb 1.05917838, 3-seed avg)
by izlley
val_bpb
1.0592
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,907,150 bytes
Training Techniques
Quantization
GPTQ
bits: 6
scope: model weights
mixed int7/int8
bits: null
scope: embeddings and attention gate
Architecture
SmearGate
Disables previous-position smear immediately after alias tokens via an alias boundary mask.
parameters: {"alias_prev_smear_scale":0}
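The boundary rule above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes SmearGate adds a gated copy of the previous position's embedding to the current one, and the function name, the per-position scalar gate, and the alias id set are all hypothetical. Only the `alias_prev_smear_scale` value of 0 comes from the listing.

```python
def smear_with_alias_boundary(x, gate, token_ids, alias_ids,
                              alias_prev_smear_scale=0.0):
    """Additive previous-position smear with an alias boundary mask.

    x         : list of embedding vectors (lists of floats), one per position
    gate      : per-position scalar gate values (assumed form of the gate)
    token_ids : token id at each position
    alias_ids : set of ids treated as alias tokens (hypothetical ids)
    """
    if not x:
        return []
    out = [list(x[0])]  # position 0 has no previous position to smear from
    for t in range(1, len(x)):
        # Boundary mask: scale the smear when the previous token is an alias.
        after_alias = token_ids[t - 1] in alias_ids
        scale = alias_prev_smear_scale if after_alias else 1.0
        g = gate[t] * scale
        out.append([cur + g * prev for cur, prev in zip(x[t], x[t - 1])])
    return out
```

With a scale of 0, the position right after an alias token keeps its own embedding unchanged, while every other position is smeared as usual.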
Gated Attention
SparseAttnGate / attention gating used in the inherited stack.
parameters: {"scale":0.5,"window":12}
XSA
Inherited XSA-based architecture from PR #1855.
parameters: {"layers":11}
weight tying
Not explicitly stated as used in the PR.
Test-Time Training
full TTT
parameters: {"phased":true,"chunk_size":48,"lora_rank":80}
Regularization
logit softcap
parameters: {"enabled":true}
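For reference, a logit softcap is commonly written as cap * tanh(logits / cap), which bounds every logit to the range [-cap, cap] while staying close to the identity for small values. The sketch below assumes that form; the listing only records that the softcap is enabled, so the cap value of 30.0 is a placeholder, not a value from the PR.

```python
import math


def softcap(logits, cap=30.0):
    """Smoothly bound logits to [-cap, cap] (cap=30.0 is a placeholder)."""
    return [cap * math.tanh(z / cap) for z in logits]
```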
Compression
lrzip+brotli
level: null
LR Schedule
warmdown
parameters: {"warmdown_frac":0.85,"warmup_steps":20}
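One plausible reading of these two parameters is a linear warmup over the first 20 steps, a constant plateau, and a linear decay to zero over the final 85% of training. The sketch below assumes exactly that shape; the function name, the linear ramps, and the decay-to-zero endpoint are assumptions, since the listing gives only the two numbers.

```python
def lr_multiplier(step, total_steps, warmup_steps=20, warmdown_frac=0.85):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear warmup, reaching 1.0 on the last warmup step.
        return (step + 1) / warmup_steps
    warmdown_start = round(total_steps * (1.0 - warmdown_frac))
    if step < warmdown_start:
        return 1.0  # plateau between warmup and warmdown
    # Linear warmdown to zero at total_steps.
    span = max(1, total_steps - warmdown_start)
    return max(0.0, (total_steps - step) / span)
```

Under these assumptions a 1000-step run would warm up for 20 steps, hold until step 150, and decay linearly from there.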
Optimizer
Muon
weight_decay: 0.5
momentum: 0.9
other_params: {"beta2":0.99}
Initialization
resid mix
Alias rows are warm-initialized as a weighted composite of constituent marker embeddings.
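A weighted composite of this kind might look like the sketch below. The helper name, the dict-based embedding table, and the weights are hypothetical; the listing does not state which weights the PR uses.

```python
def warm_init_alias_row(embedding, constituent_ids, weights):
    """Build an alias row as a weighted sum of constituent embedding rows.

    embedding       : mapping from token id to its embedding row (list of floats)
    constituent_ids : ids of the marker tokens the alias replaces
    weights         : one weight per constituent (illustrative values only)
    """
    dim = len(embedding[constituent_ids[0]])
    row = [0.0] * dim
    for tok, w in zip(constituent_ids, weights):
        for i in range(dim):
            row[i] += w * embedding[tok][i]
    return row
```

For example, `warm_init_alias_row(emb, [space_id, title_id], [0.5, 0.5])` would start the alias row at the average of its two constituents, where `space_id` and `title_id` are stand-in names.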
Other
other
MP3 marker-pair fusion: fuses [▁, TITLE], [▁, ALLCAPS], and [▁, CAPNEXT] into alias donor tokens to reduce token count.
parameters: {"token_saving_percent":8.47}
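The fusion step can be illustrated with a greedy left-to-right pass over the token stream. This is a sketch only: the string tokens, the alias names in `PAIR_TO_ALIAS`, and the function name are hypothetical stand-ins, and the PR presumably operates on token ids rather than strings.

```python
# Hypothetical alias names for the three fused marker bigrams.
PAIR_TO_ALIAS = {
    ("▁", "TITLE"): "▁TITLE",
    ("▁", "ALLCAPS"): "▁ALLCAPS",
    ("▁", "CAPNEXT"): "▁CAPNEXT",
}


def fuse_marker_pairs(tokens, pair_to_alias=PAIR_TO_ALIAS):
    """Replace each listed adjacent pair with its single alias token."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in pair_to_alias:
            out.append(pair_to_alias[pair])
            i += 2  # consume both tokens of the fused pair
        else:
            out.append(tokens[i])
            i += 1
    return out


def token_saving_percent(before, after):
    """Percentage reduction in token count from `before` to `after`."""
    return 100.0 * (1.0 - len(after) / len(before))
```

A plain "▁" that is not followed by one of the three markers is left untouched, so ordinary word boundaries stay in the stream.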
Novel Contributions
- MP3 marker-pair fusion that merges three frequent marker bigrams into alias donor tokens
- Alias smear boundary that turns off SmearGate's previous-position contribution immediately after alias tokens
- Warm-initialization of alias rows from constituent marker embeddings
- Token-stream reduction of 8.47% while preserving the downstream word boundary signal
- Extension of the PR #1855 stack with phased TTT and alias-aware evaluation/training handling