PR #1894
open
[Non-record] SP8192 + MuonEq-R + Loop@0.42 + RECUR_AB + QAT-lite + Compact Artifact - Val 1.09960971
by ChideraIbe123
val_bpb
1.0996
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,974,435 bytes
Training Techniques
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"MuonEq-R":true}
Architecture
depth recurrence
Wallclock-aware recurrent looping activated during training/evaluation.
parameters: {"enable_looping_at":0.42}
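The summary does not say what `enable_looping_at` measures. One plausible reading is that 0.42 is the fraction of the wallclock budget after which the recurrent loop is switched on. The sketch below illustrates only that reading; the function names, the 600-second budget in the usage note, and the pass counts are hypothetical and not taken from the PR.

```python
def looping_enabled(elapsed_s: float, budget_s: float,
                    enable_looping_at: float = 0.42) -> bool:
    """Return True once the elapsed share of the wallclock budget reaches the threshold.

    Assumption: the threshold is a fraction of total wallclock time,
    not a fraction of optimizer steps.
    """
    if budget_s <= 0:
        raise ValueError("budget_s must be positive")
    return (elapsed_s / budget_s) >= enable_looping_at


def passes_through_block(elapsed_s: float, budget_s: float,
                         loops_when_enabled: int = 2) -> int:
    """Number of times the recurrent block is applied at this point in the run.

    Before the threshold the block runs once (no recurrence); afterwards it
    runs `loops_when_enabled` times. Both counts are illustrative.
    """
    return loops_when_enabled if looping_enabled(elapsed_s, budget_s) else 1
```

With a hypothetical 600-second budget, `passes_through_block(100.0, 600.0)` stays at a single pass, while `passes_through_block(300.0, 600.0)` switches to the looped configuration.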
other
Learned recurrent alpha/beta blending for recurrence.
parameters: {"method":"RECUR_AB"}
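The exact form of RECUR_AB is not given here. A minimal sketch, assuming it mixes the recurrent block's output with the incoming hidden state through two scalars, `alpha` and `beta`, could look like the following. In a real model both scalars would be trainable parameters and `block` would be a transformer layer; here they are plain floats and a toy function, and all names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def blend_recurrence(h: Vector,
                     block: Callable[[Vector], Vector],
                     alpha: float,
                     beta: float,
                     n_loops: int) -> Vector:
    """Apply `block` n_loops times, blending each output with its input.

    Assumed update rule (not confirmed by the PR):
        h <- alpha * block(h) + beta * h
    """
    for _ in range(n_loops):
        update = block(h)
        h = [alpha * u + beta * x for u, x in zip(update, h)]
    return h


# Toy stand-in for a transformer block: doubles every element.
def double(v: Vector) -> Vector:
    return [2.0 * x for x in v]
```

Setting `alpha=0.0, beta=1.0` makes every loop an identity step, which is one reason a learned blend of this kind can be introduced partway through training without an abrupt change in the model's output.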
Quantization
late QAT
bits: 6
scope: q/k projections
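For illustration, the 6-bit setting can be pictured as fake quantization: weights are rounded to a 6-bit grid in the forward pass while full-precision copies keep being trained. The sketch below assumes symmetric per-row scaling to the signed range [-31, 31]; the PR summary does not state the scaling scheme, and `fake_quantize` is a hypothetical name. In a "late" QAT setup this would be applied to the q/k projection weights only during the final part of training.

```python
from typing import List


def fake_quantize(row: List[float], bits: int = 6) -> List[float]:
    """Round a weight row to a symmetric signed grid and map it back to floats.

    Assumption: one scale per row, chosen so the largest magnitude maps to
    the top quantization level (31 for 6 bits).
    """
    qmax = 2 ** (bits - 1) - 1          # 31 for 6 bits
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return list(row)                # all-zero row: nothing to quantize
    scale = max_abs / qmax
    quantized = []
    for w in row:
        level = round(w / scale)                # nearest integer level
        level = max(-qmax, min(qmax, level))    # clamp to the 6-bit range
        quantized.append(level * scale)         # dequantize back to float
    return quantized
```

Each dequantized value differs from its original by at most half a quantization step, which is the error the model learns to tolerate during the QAT phase.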
Compression
lzma
level: null
brotli
level: null
Test-Time Training
score-first TTT
parameters: null
Sequence Length
sequence_length
train_length: 8192
eval_length: 8192
Novel Contributions
- MuonEq-R integration
- wallclock-aware recurrence scheduling with looping at 0.42
- RECUR_AB learned recurrent alpha/beta blending
- targeted late QAT-lite on q/k projections
- compact artifact engineering with compressed control tensors and GPTQ scale storage
- LZMA code wrapper
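As a rough illustration of the last item, an "LZMA code wrapper" can be read as shipping the training script in compressed form behind a small stub that decompresses and runs it. The sketch below assumes that reading; `wrap_source` is a hypothetical helper, and the base85 encoding is just one convenient way to embed binary data in a text file, not necessarily what the PR does.

```python
import base64
import lzma


def wrap_source(source: str) -> str:
    """Return a small Python stub that decompresses and executes `source`.

    The payload is LZMA-compressed at the highest preset and base85-encoded
    so that it can sit inside an ordinary string literal.
    """
    compressed = lzma.compress(source.encode("utf-8"), preset=9)
    payload = base64.b85encode(compressed).decode("ascii")
    return (
        "import base64, lzma\n"
        f"exec(lzma.decompress(base64.b85decode({payload!r})).decode('utf-8'))\n"
    )
```

For a very short script the stub is larger than the original, since the LZMA container and the base85 encoding both add overhead. The approach is only worthwhile when the source is long and repetitive enough for compression to outweigh that overhead.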