PR #1769

RECORD (open)

Record: SP8192 CaseOps stack retune (MLP clip 10→12) → 1.06453

by dexhunter
val_bpb: 1.0645
Architecture: Transformer
Optimizer:
Artifact Size: ~15.98 MB

Training Techniques

Quantization
  • GPTQ (bits: 6, scope: MLP)
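The int6 MLP quantization with outlier clipping (retuned from 10.0 to 12.0 in this record) can be sketched as follows. This is a minimal fake-quantization illustration, not the submission's actual GPTQ pipeline: the choice of applying the clip threshold as a multiple of the weight standard deviation, and all names here, are assumptions.

```python
import numpy as np

def quantize_int6_with_clip(w, clip=12.0):
    """Symmetric int6 fake-quantization with outlier clipping (sketch).

    Hypothetical: weights are clipped to +/- clip * std before the
    quantization scale is chosen, so a few outliers do not inflate
    the step size for the bulk of the distribution.
    """
    levels = 2 ** 6 // 2 - 1              # 31 positive levels for int6
    bound = clip * w.std()
    w_clipped = np.clip(w, -bound, bound)
    scale = bound / levels
    q = np.round(w_clipped / scale).astype(np.int8)  # fits in 6 bits
    return q * scale                      # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=(4096,)).astype(np.float32)
w[:4] = [15.0, -14.0, 13.0, -12.5]        # inject tail outliers
tail_err_10 = np.abs(quantize_int6_with_clip(w, 10.0)[:4] - w[:4]).max()
tail_err_12 = np.abs(quantize_int6_with_clip(w, 12.0)[:4] - w[:4]).max()
```

A wider clip (12 instead of 10) coarsens the step size for the bulk of the weights but truncates the outliers less, which is the "preserved tail mass" trade-off described in the contributions.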
Architecture
  • Gated Attention: attention with learned scalar out-gates per head and quantized gating enabled.
    parameters: {"init_std":0.005,"quant_gate":true}
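The per-head scalar out-gate can be sketched as below, assuming (this is not confirmed by the record) a sigmoid gate applied to each head's output before the output projection, with gate logits initialized at std 0.005 per the listed init_std; the quant_gate flag is not modeled here.

```python
import numpy as np

def gated_attention_out(head_outputs, gate_logits):
    """Scale each attention head's output by a learned scalar gate.

    Assumed shapes: head_outputs is (heads, seq, d_head) and
    gate_logits holds one learned scalar per head. The sigmoid
    gate form is an assumption, not the record's stated design.
    """
    gates = 1.0 / (1.0 + np.exp(-gate_logits))   # sigmoid, one per head
    return head_outputs * gates[:, None, None]

rng = np.random.default_rng(0)
heads, seq, d_head = 4, 8, 16
x = rng.normal(size=(heads, seq, d_head))
g = rng.normal(scale=0.005, size=(heads,))       # init_std = 0.005
y = gated_attention_out(x, g)
```

With logits initialized near zero, the gates start near 0.5, so early in training each head's output is roughly halved rather than abruptly masked.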
  • Depth recurrence: Loop 4-5 recurrent depth structure.
    parameters: {"loop_start":3,"loop_end":5,"num_loops":2}
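The depth-recurrent structure can be sketched as re-running a contiguous block of layers with shared weights. Interpreting loop_start/loop_end as a half-open 0-indexed range (so layers[3:5] corresponds to the "Loop 4-5" naming) is an assumption, as is the function below.

```python
def forward_with_depth_recurrence(x, layers, loop_start=3, loop_end=5, num_loops=2):
    """Apply layers[loop_start:loop_end] num_loops times, sharing weights.

    Hypothetical sketch of the record's Loop 4-5 structure; all
    other layers run exactly once.
    """
    for layer in layers[:loop_start]:
        x = layer(x)
    for _ in range(num_loops):
        for layer in layers[loop_start:loop_end]:
            x = layer(x)
    for layer in layers[loop_end:]:
        x = layer(x)
    return x

# Toy layers: each records its index so the execution order is visible.
trace = []
layers = [lambda x, i=i: trace.append(i) or x + 1 for i in range(6)]
out = forward_with_depth_recurrence(0, layers)
```

With six toy layers, the looped block (indices 3 and 4) executes twice, so eight layer applications happen in total while only six layers' worth of weights exist.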
  • Weight tying: the CaseOps + SP8192 stack uses the same tokenizer/model setup as the base submission; no explicit weight tying is described.
    parameters: null
Test-Time Training
  • Score-first TTT
    parameters: {"phases":3,"prefix_docs":2000}
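One plausible reading of "score-first TTT" with these parameters is: in each phase, score candidate documents with the current model, then adapt on the best-scoring prefix. The control flow below is a guess at that reading, not the submission's actual procedure; score_fn and adapt_fn are hypothetical stand-ins for the model's loss and update steps.

```python
def score_first_ttt(docs, score_fn, adapt_fn, phases=3, prefix_docs=2000):
    """Score-first test-time training (hypothetical sketch).

    Each phase re-scores all candidate documents (lower is better)
    and adapts the model on the prefix_docs lowest-scoring ones.
    """
    for _ in range(phases):
        ranked = sorted(docs, key=score_fn)   # score with current model
        adapt_fn(ranked[:prefix_docs])        # train on the best prefix

# Toy run: integers stand in for documents, identity for the score.
docs = list(range(10, 0, -1))
adapted = []
score_first_ttt(docs, score_fn=lambda d: d,
                adapt_fn=adapted.extend, phases=2, prefix_docs=3)
```

In a real setting the scores would change between phases as the model adapts; here they are static, so the same prefix is selected each phase.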
Sequence Length
  • sequence_length (train_length: null, eval_length: 2048)
Regularization
  • Logit softcap
    parameters: {"value":30}
  • Weight decay
    parameters: null
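The logit softcap with value 30 presumably follows the standard tanh form, which smoothly bounds logits to (-value, value) while staying near-identity for small inputs; that this is the exact form used here is an assumption.

```python
import numpy as np

def softcap(logits, value=30.0):
    """Tanh logit soft-capping: bounds outputs to (-value, value).

    Near-identity for |logits| << value; assumed to match the
    record's parameters {"value": 30}.
    """
    return value * np.tanh(logits / value)

z = np.array([-100.0, -5.0, 0.0, 5.0, 100.0])
capped = softcap(z)
```

Unlike a hard clip, the cap is differentiable everywhere, so gradients still flow (attenuated) through extreme logits.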

Novel Contributions

  • Retuned MLP GPTQ outlier clipping from 10.0 to 12.0
  • Preserved MLP tail mass during int6 calibration for 4x-width MLPs
  • Achieved 5-seed mean val_bpb of 1.06453
  • Maintained compliance with 16 MB artifact cap and 600s train/eval budgets
  • Reported 7-seed disclosure with 5 lowest-BPB seeds used for the official score