PR #1706

open

Non-record: SP8192 + QK5.4 + Legal Score-First TTT(4) — val_bpb 1.08149 (seed 1337)

by aamodbhatt
val_bpb: 1.0815
Architecture: Transformer
Optimizer: Muon
Artifact Size: 15,994,638 bytes

Training Techniques

Quantization
GPTQ
bits: 5
scope: model weights
Optimizer
Muon
weight_decay: 0.1
momentum: null
other_params: {"QK_GAIN_INIT":5.4}
Test-Time Training
score-first TTT
parameters: {"epochs":4,"learning_rate":0.0055}
Evaluation
sliding window eval
parameters: null
Sequence Length
sequence_length
train_length: 8192
eval_length: null
Regularization
weight decay
parameters: {"value":0.1}
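
The score-first ordering listed under Test-Time Training can be illustrated with a toy sketch. This is not the PR's code: the unigram softmax model, the function name `score_first_ttt`, and the per-chunk update rule are assumptions made for illustration, and the metric here is bits per token rather than the submission's val_bpb. Only the defaults `epochs=4` and `lr=0.0055` come from the listed parameters. The point it shows is the ordering: each chunk is scored with the weights as they stood before the model has trained on that chunk, and only then adapted on.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def score_first_ttt(chunks, vocab_size, epochs=4, lr=0.0055):
    """Toy score-first TTT loop (hypothetical, unigram model only).

    chunks: list of non-empty lists of token ids in [0, vocab_size).
    Returns average bits per token over all chunks.
    """
    logits = [0.0] * vocab_size  # stand-in for the model weights
    total_bits = 0.0
    total_tokens = 0

    for chunk in chunks:
        # 1) Score first: the loss is recorded with weights that have
        #    not yet seen this chunk.
        probs = softmax(logits)
        total_bits += sum(-math.log2(probs[t]) for t in chunk)
        total_tokens += len(chunk)

        # 2) Then adapt: a few gradient steps on the chunk just scored.
        counts = [0] * vocab_size
        for t in chunk:
            counts[t] += 1
        for _ in range(epochs):
            probs = softmax(logits)
            for v in range(vocab_size):
                # Gradient of mean NLL w.r.t. logit v.
                grad = probs[v] - counts[v] / len(chunk)
                logits[v] -= lr * grad

    return total_bits / total_tokens
```

With a single chunk the result equals the untrained model's loss, since no adaptation has happened before scoring. Later chunks benefit only from updates made on earlier, already-scored chunks.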

Novel Contributions

  • Legal score-first TTT submission package
  • Score-first chunk ordering in the TTT path
  • 8xH100 strict compliance-capped run
  • QK 5.40 configuration with 4-epoch TTT
  • Artifact kept under the 16MB limit
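
As a quick check on the last point, the headroom under the size limit can be computed from the reported artifact size. The summary does not say whether "16MB" means 16,000,000 bytes or 16 MiB, so both readings are shown; the constant and function names are hypothetical.

```python
ARTIFACT_BYTES = 15_994_638          # reported Artifact Size

# Assumption: the summary does not define "16MB", so try both readings.
LIMIT_DECIMAL = 16_000_000           # 16 * 10**6 bytes
LIMIT_BINARY = 16 * 1024 * 1024      # 16 MiB = 16,777,216 bytes


def headroom(artifact_bytes, limit_bytes):
    """Bytes remaining under the limit (negative means over the limit)."""
    return limit_bytes - artifact_bytes


print(headroom(ARTIFACT_BYTES, LIMIT_DECIMAL))  # 5362
print(headroom(ARTIFACT_BYTES, LIMIT_BINARY))   # 782578
```

Under either reading the artifact fits; under the decimal reading the margin is only a few kilobytes.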