PR #1894

open

[Non-record] SP8192 + MuonEq-R + Loop@0.42 + RECUR_AB + QAT-lite + Compact Artifact - Val 1.09960971

by ChideraIbe123
val_bpb
1.0996
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,974,435 bytes

Training Techniques

Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"MuonEq-R":true}
Architecture
depth recurrence
Wallclock-aware recurrent looping activated during training/evaluation.
parameters: {"enable_looping_at":0.42}
other
Learned recurrent alpha/beta blending for recurrence.
parameters: {"method":"RECUR_AB"}
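A minimal sketch of how the two architecture entries above might fit together. It assumes that `enable_looping_at` is a fraction of the wallclock budget and that RECUR_AB blends the block output with the carried state as `alpha * block(h) + beta * h`. Neither detail is stated in the PR summary. The names `looping_enabled`, `recurrent_pass`, `n_loops`, and the toy block are hypothetical, and `alpha` and `beta` are plain floats here rather than learned parameters.

```python
def looping_enabled(elapsed_s, budget_s, enable_looping_at=0.42):
    """True once the elapsed fraction of the wallclock budget reaches the threshold.

    Assumption: 0.42 is a fraction of the time budget, not a step count.
    """
    return elapsed_s / budget_s >= enable_looping_at


def recurrent_pass(h, block, alpha, beta, n_loops, looping):
    """Apply `block` once, or `n_loops` times with alpha/beta blending when looping is on."""
    if not looping:
        return block(h)
    for _ in range(n_loops):
        update = block(h)
        # Blend the block output with the carried state, element by element.
        h = [alpha * u + beta * x for u, x in zip(update, h)]
    return h


def halve(h):
    """Toy stand-in for a transformer block."""
    return [0.5 * x for x in h]


if __name__ == "__main__":
    on = looping_enabled(elapsed_s=300, budget_s=600)
    print(on, recurrent_pass([1.0, 2.0], halve, 0.5, 0.5, n_loops=2, looping=on))
```

Whether the real blend is scalar, per-channel, or per-loop is not specified, so the scalar form here is only the simplest case.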
Quantization
late QAT
bits: 6
scope: q/k projections
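To illustrate what 6-bit quantization-aware training simulates in the forward pass, here is a sketch of symmetric per-row fake quantization. The per-row scaling, the symmetric grid, and the function name `fake_quantize` are assumptions, since the PR summary gives only the bit width and the q/k scope. In a real training loop the rounding step is usually bypassed in the backward pass with a straight-through estimator, which this pure-Python sketch does not show.

```python
def fake_quantize(row, bits=6):
    """Round a row of weights to a signed `bits`-bit grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1              # 31 for 6 bits
    max_abs = max(abs(v) for v in row)
    if max_abs == 0.0:
        return list(row)                    # nothing to scale
    scale = max_abs / qmax                  # one scale per row (assumed granularity)
    out = []
    for v in row:
        q = max(-qmax, min(qmax, round(v / scale)))   # integer code in [-31, 31]
        out.append(q * scale)               # dequantized value seen by the forward pass
    return out


if __name__ == "__main__":
    print(fake_quantize([0.31, -0.155, 0.0, 0.1]))
```

Each output differs from its input by at most half a quantization step, which is the error the model learns to tolerate before the final export.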
Compression
lzma
level: null
brotli
level: null
Test-Time Training
score-first TTT
parameters: null
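The ordering implied by "score-first" can be sketched with a toy model. The assumption is that each chunk is scored before the model adapts on it, so no chunk is evaluated with weights that have already seen it. The PR lists no parameters, so the chunking, the unigram count "model", and the name `score_first_ttt` are all hypothetical stand-ins for the real network and its gradient updates.

```python
import math


def score_first_ttt(chunks, vocab_size=256):
    """Return total bits over all chunks, scoring each chunk before adapting on it."""
    counts = [1] * vocab_size               # Laplace-smoothed unigram counts (toy model)
    total = vocab_size
    total_bits = 0.0
    for chunk in chunks:
        # 1) Score: uses only statistics gathered from earlier chunks.
        for tok in chunk:
            total_bits += -math.log2(counts[tok] / total)
        # 2) Adapt: update the model on the chunk that was just scored.
        for tok in chunk:
            counts[tok] += 1
            total += 1
    return total_bits


if __name__ == "__main__":
    print(score_first_ttt([b"aaaa", b"aaaa"]), score_first_ttt([b"aaaa", b"bbbb"]))
```

The first chunk always costs the same regardless of its content, because nothing has been adapted yet; only later chunks benefit.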
Sequence Length
sequence_length
train_length: 8192
eval_length: 8192

Novel Contributions

  • MuonEq-R integration
  • wallclock-aware recurrence scheduling, with looping enabled at 0.42 (enable_looping_at)
  • RECUR_AB learned recurrent alpha/beta blending
  • targeted late QAT-lite on q/k projections
  • compact artifact engineering with compressed control tensors and GPTQ scale storage
  • LZMA code wrapper
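One way an LZMA code wrapper could work is sketched below: the training script is compressed, text-encoded, and embedded in a short stub that decompresses and executes it at load time. This is an illustration only. The Base85 encoding, the stub layout, and the name `wrap_source` are assumptions and are not taken from the PR.

```python
import base64
import lzma


def wrap_source(source: str) -> str:
    """Return a one-line Python stub that decompresses and executes `source`."""
    packed = base64.b85encode(lzma.compress(source.encode("utf-8"), preset=9))
    return (
        "import base64,lzma;"
        f"exec(lzma.decompress(base64.b85decode({packed!r})).decode('utf-8'))"
    )


if __name__ == "__main__":
    stub = wrap_source("def answer():\n    return 42\n")
    namespace = {}
    exec(stub, namespace)                   # running the stub defines answer()
    print(namespace["answer"]())
```

For very short sources the stub is larger than the original because of the LZMA container overhead; the saving only appears on longer scripts.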