PR #1645
Status: open
Add in-progress non-record submission for Legal TTT + Muon+ + QK Gain 4.0
by scottcui-georgian
val_bpb
1.1131
Architecture
Transformer
Optimizer
Muon
Artifact Size
17.35 MB
Training Techniques
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"epochs":3}
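Score-first TTT is usually taken to mean that each evaluation chunk is scored with the current weights before the model is allowed to train on it, so no token is graded by a model that has already seen it. The sketch below shows only that ordering, using a hypothetical toy linear regressor in place of the transformer; `score_first_ttt` and the data are illustrative, while `learning_rate=0.005` and `epochs=3` come from the submission's parameters.

```python
import numpy as np

def score_first_ttt(chunks, w, lr=0.005, epochs=3):
    """Score-first test-time training on a toy linear model (illustrative only).

    chunks: list of (X, y) pairs in evaluation order (hypothetical data).
    w: initial weight vector; it is copied, never modified in place.
    Returns per-chunk losses, each computed BEFORE training on that chunk.
    """
    w = w.copy()
    losses = []
    for X, y in chunks:
        # 1) Score: weights at this point have never been trained on this chunk.
        resid = X @ w - y
        losses.append(float(np.mean(resid ** 2)))
        # 2) Adapt: only after scoring, run a few SGD epochs on the same chunk.
        for _ in range(epochs):
            resid = X @ w - y
            grad = 2.0 * X.T @ resid / len(y)
            w -= lr * grad
    return losses, w
```

Swapping steps 1 and 2 would let the model train on tokens before being graded on them, which is the leakage the score-first ordering avoids.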
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"variant":"Muon+","row_wise_normalization":true}
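Muon orthogonalizes the momentum update of each weight matrix with a quintic Newton-Schulz iteration. The submission describes Muon+ only as adding row-wise normalization, so the sketch below makes an explicit assumption: after orthogonalization, each output row of the update is rescaled to unit L2 norm. The helper names and the `lr`/`momentum` values are hypothetical (the submission lists momentum as null).

```python
import numpy as np

def newton_schulz5(G, steps=5, eps=1e-7):
    # Quintic Newton-Schulz iteration (Muon's coefficients) to approximately orthogonalize G.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # Frobenius normalization keeps singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def muon_plus_direction(update, eps=1e-8):
    O = newton_schulz5(update)
    # ASSUMPTION: "row-wise normalization" rescales each output row to unit L2 norm.
    return O / (np.linalg.norm(O, axis=1, keepdims=True) + eps)

def muon_plus_step(W, grad, buf, lr=0.02, momentum=0.95):
    # Hypothetical lr/momentum values; not taken from the submission.
    buf = momentum * buf + grad
    update = grad + momentum * buf  # Nesterov-style momentum, as in Muon
    return W - lr * muon_plus_direction(update), buf
```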
Architecture
BigramHash
Bigram hash embedding configuration carried over from the base submission.
parameters: {"hash_size":2048,"embedding_dim":128}
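A bigram hash embedding maps each (previous token, current token) pair to one of `hash_size` buckets and looks up a learned vector for that bucket. The sketch uses the submission's `hash_size=2048` and `embedding_dim=128`; the hash function, the padding id for the first position, and the helper names are hypothetical. How the result is combined with the token embeddings (for example a projection to the model width) is not stated in the submission and is left out.

```python
import numpy as np

HASH_SIZE = 2048  # hash_size from the submission
EMBED_DIM = 128   # embedding_dim from the submission

def bigram_buckets(token_ids, hash_size=HASH_SIZE):
    ids = np.asarray(token_ids, dtype=np.int64)
    # ASSUMPTION: position 0 has no predecessor, so it is paired with id 0.
    prev = np.concatenate([np.zeros(1, dtype=np.int64), ids[:-1]])
    # Hypothetical hash: mix both ids with large odd constants, then reduce.
    mixed = (prev * 1000003) ^ (ids * 999983)
    return mixed % hash_size

def bigram_hash_embed(token_ids, table):
    # table: (hash_size, embedding_dim), one learned row per hash bucket.
    return table[bigram_buckets(token_ids, table.shape[0])]

rng = np.random.default_rng(0)
table = rng.normal(scale=0.02, size=(HASH_SIZE, EMBED_DIM))
emb = bigram_hash_embed([5, 17, 5, 17], table)  # shape (4, 128)
```

Positions 1 and 3 both see the bigram (5, 17), so they receive the same vector; unrelated bigrams can also collide in a bucket, which is the price of the small table.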
XSA
XSA-all-11 architecture component carried over from the base submission.
parameters: null
Weight Averaging
EMA + SWA
parameters: null
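EMA keeps an exponentially decaying average of the weights, while SWA keeps an equal-weight average of periodic snapshots. The submission does not give the decay, the snapshot schedule, or how the two averages are combined, so the class below only tracks both; `WeightAverager` and `ema_decay=0.997` are hypothetical.

```python
import numpy as np

class WeightAverager:
    """Tracks an EMA and an SWA (uniform running mean) of a dict of weight arrays."""

    def __init__(self, params, ema_decay=0.997):
        self.decay = ema_decay  # hypothetical value; not given in the submission
        self.ema = {k: v.astype(np.float64).copy() for k, v in params.items()}
        self.swa = {k: np.zeros_like(v, dtype=np.float64) for k, v in params.items()}
        self.swa_count = 0

    def update_ema(self, params):
        # Every step: ema <- decay * ema + (1 - decay) * current weights.
        for k, v in params.items():
            self.ema[k] = self.decay * self.ema[k] + (1.0 - self.decay) * v

    def add_swa_snapshot(self, params):
        # Every N steps: incremental form of the equal-weight mean of all snapshots.
        self.swa_count += 1
        for k, v in params.items():
            self.swa[k] += (v - self.swa[k]) / self.swa_count
```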
Quantization
GPTQ
bits: 6
scope: model weights
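GPTQ quantizes a weight matrix one input column at a time and pushes each column's rounding error onto the columns not yet quantized, weighted by the inverse Hessian of the layer's calibration inputs. Below is a minimal sketch without blocking or activation ordering, assuming a symmetric per-row 6-bit grid of integers in [-31, 31]; the damping factor, grid choice, and function names are assumptions rather than the submission's settings.

```python
import numpy as np

def quantize_column(w_col, scale, qmax=31):
    # Symmetric 6-bit grid (ASSUMPTION: codes in [-31, 31], code -32 unused).
    return np.clip(np.round(w_col / scale), -qmax, qmax) * scale

def gptq_quantize(W, X, damp=0.01, qmax=31):
    """Minimal GPTQ for a linear layer y = X @ W.T.

    W: (d_out, d_in) weights; X: (n, d_in) calibration inputs.
    Returns dequantized weights Q and the per-row scales.
    """
    W = W.astype(np.float64).copy()
    d_in = W.shape[1]
    H = 2.0 * X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)  # dampening for numerical stability
    # Upper Cholesky factor U of H^-1 (H^-1 = U.T @ U).
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    scale = np.max(np.abs(W), axis=1) / qmax  # per-output-row scale from the original weights
    scale[scale == 0] = 1.0
    Q = np.zeros_like(W)
    for i in range(d_in):
        Q[:, i] = quantize_column(W[:, i], scale, qmax)
        err = (W[:, i] - Q[:, i]) / U[i, i]
        # Compensate this column's rounding error in the remaining columns.
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q, scale
```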
Compression
lzma
level: null
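The quantized codes are then compressed with lzma to shrink the artifact. Since the submission records no compression level, this sketch passes `preset=None`, which selects the standard library's default preset (6); storing one 6-bit code per int8 byte is an assumption, and the helper names are hypothetical.

```python
import lzma
import numpy as np

def compress_codes(codes, preset=None):
    # One int8 byte per 6-bit code (not bit-packed); lzma removes much of the slack.
    raw = np.asarray(codes, dtype=np.int8).tobytes()
    # preset=None means lzma's default preset (6).
    return lzma.compress(raw, preset=preset)

def decompress_codes(blob, shape):
    return np.frombuffer(lzma.decompress(blob), dtype=np.int8).reshape(shape)
```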
Initialization
QK gain init
QK gain initialization raised to 4.0.
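A common form of QK gain, assumed here rather than confirmed by the submission, is a learnable per-head scalar that multiplies RMS-normalized queries before the dot product with RMS-normalized keys. After RMS normalization each vector has norm about sqrt(d), so the scaled logit q·k/sqrt(d) is bounded by sqrt(d); a gain g raises that bound to g·sqrt(d), and a larger initial value such as 4.0 lets attention start out sharper. `qk_logits` and `rms_norm` are hypothetical helpers for one head.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def qk_logits(q, k, q_gain=4.0):
    """Attention logits for one head with QK-norm and a query gain.

    q, k: (seq, head_dim). q_gain would be a learnable per-head parameter
    in a real model; 4.0 mirrors the submission's initialization.
    """
    q = rms_norm(q) * q_gain
    k = rms_norm(k)
    return (q @ k.T) / np.sqrt(q.shape[-1])
```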
Regularization
weight decay
parameters: null
LR Schedule
warmdown
parameters: {"warmdown_steps":3500}
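A warmdown schedule holds the learning rate constant and then decays it linearly to zero over the final `warmdown_steps` steps. The sketch below assumes a linear shape and no warmup phase; `warmdown_steps=3500` is from the submission, while the total step count is a hypothetical input.

```python
def lr_multiplier(step, total_steps, warmdown_steps=3500):
    # Constant 1.0 until the last `warmdown_steps` steps, then linear decay to 0.
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    return max((total_steps - step) / warmdown_steps, 0.0)
```

For a hypothetical 20000-step run, the multiplier stays at 1.0 through step 16500, reaches 0.5 at step 18250, and hits 0.0 at step 20000.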
Novel Contributions
- Legal score-first test-time training
- Muon+ row-wise update normalization
- Higher QK gain initialization at 4.0
- Autoresearch-generated submission packaged with arc
- Documented in-progress non-record submission with reproducible experiment structure