PR #2149
open
Non-record: SP8192 + RandProj384 tied embeddings + Pairwise-QK Muon -- Single-seed negative result
by YaseenHQ
val_bpb
1.1269
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,438,770 bytes
Training Techniques
Architecture
weight tying
Embeddings are tied through a random projection into a lower-dimensional space (RandProj384).
parameters: {"projection_dim":384}
attention modifications
Pairwise-head Muon orthogonalization applied to Q/K projections (PairMuonQK).
parameters: null
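A minimal sketch of one way the RandProj384 weight tying listed above could work; it is not the PR's actual code. It assumes a trainable table of shape (vocab, 384) that a fixed random matrix maps up to the model width, with the same factorized matrix reused for the output logits. The vocabulary size of 8192 is inferred from "SP8192", and `D_MODEL`, `embed`, and `logits` are hypothetical names chosen for illustration.

```python
import numpy as np

# Assumptions: VOCAB is inferred from "SP8192"; D_MODEL is a hypothetical width.
VOCAB, PROJ_DIM, D_MODEL = 8192, 384, 512

rng = np.random.default_rng(0)

# Trainable low-dimensional table: one 384-dim vector per token.
table = (rng.standard_normal((VOCAB, PROJ_DIM)) * 0.02).astype(np.float32)

# Fixed (non-trained) random projection from 384 dims up to the model width.
proj = (rng.standard_normal((PROJ_DIM, D_MODEL)) / np.sqrt(PROJ_DIM)).astype(np.float32)


def embed(token_ids):
    """Input side: look up the 384-dim rows, then project to D_MODEL."""
    return table[token_ids] @ proj


def logits(hidden):
    """Output side: reuse the same table and projection (weight tying)."""
    return (hidden @ proj.T) @ table.T


tokens = np.array([1, 2, 3])
hidden = embed(tokens)   # shape (3, D_MODEL)
scores = logits(hidden)  # shape (3, VOCAB)
```

Under these assumptions the effective embedding matrix is `table @ proj`, so only the (vocab, 384) table contributes trained embedding parameters.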
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"pairwise_head_qk_orthogonalization":true}
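The sketch below illustrates one possible reading of "pairwise-head" orthogonalization; the summary does not specify the mechanism. It assumes the Q or K update matrix is split into blocks of two adjacent heads and that each block is orthogonalized separately with the quintic Newton-Schulz iteration commonly used in Muon implementations. The function names, the grouping of adjacent heads, and all shapes are assumptions.

```python
import numpy as np


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g with the quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients used in common Muon code
    x = g / (np.linalg.norm(g) + eps)  # Frobenius norm bounds singular values by 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                        # iterate on the wide orientation
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * (gram @ gram)) @ x
    return x.T if transposed else x


def pairwise_head_orthogonalize(update, n_heads):
    """Orthogonalize a (n_heads * head_dim, d_model) update per pair of heads.

    Assumption: heads are paired as (0, 1), (2, 3), ... along the row axis.
    """
    rows, d_model = update.shape
    assert n_heads % 2 == 0 and rows % n_heads == 0
    head_dim = rows // n_heads
    blocks = update.reshape(n_heads // 2, 2 * head_dim, d_model)
    out = np.stack([newton_schulz(block) for block in blocks])
    return out.reshape(rows, d_model)


rng = np.random.default_rng(0)
q_update = rng.standard_normal((4 * 8, 32))  # 4 heads, head_dim 8, d_model 32
q_ortho = pairwise_head_orthogonalize(q_update, n_heads=4)
```

After a few iterations the singular values of each two-head block sit near 1 rather than exactly at 1, which is the usual behavior of this iteration.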
Weight Averaging
EMA
parameters: null
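For reference, a generic exponential moving average of weights can be sketched as follows. The summary lists no EMA parameters, so the decay value and the dictionary-of-arrays layout are assumptions for illustration only.

```python
import numpy as np


def ema_update(ema_params, params, decay=0.999):
    """Blend the current weights into the running average, in place.

    decay is a hypothetical value; the source does not report one.
    """
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


ema = {"w": np.zeros(2)}
ema_update(ema, {"w": np.ones(2)}, decay=0.9)
```

The averaged weights, rather than the raw final weights, would then typically be used for evaluation.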
Novel Contributions
- Random-projection tied embeddings (RandProj384)
- Pairwise-head Muon orthogonalization for Q/K (PairMuonQK)
- Negative-result submission showing these ideas regressed validation quality
- Legal sub-16MB artifact produced within the training cap
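The size claim in the last item can be checked with simple arithmetic. The summary does not say whether "16MB" means decimal megabytes or mebibytes, so this sketch computes the headroom under both readings; the constant names are illustrative.

```python
ARTIFACT_BYTES = 15_438_770       # artifact size reported in the summary

LIMIT_DECIMAL = 16 * 1000 ** 2    # 16 MB  = 16,000,000 bytes
LIMIT_BINARY = 16 * 1024 ** 2     # 16 MiB = 16,777,216 bytes

headroom_decimal = LIMIT_DECIMAL - ARTIFACT_BYTES
headroom_binary = LIMIT_BINARY - ARTIFACT_BYTES

print(f"headroom under 16 MB:  {headroom_decimal:,} bytes")
print(f"headroom under 16 MiB: {headroom_binary:,} bytes")
```

The artifact fits under either interpretation of the limit.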