PR #1769
Record: SP8192 CaseOps stack retune (MLP clip 10→12) → 1.06453
by dexhunter
val_bpb: 1.0645
Architecture: Transformer
Optimizer: —
Artifact Size: ~15.98 MB
Training Techniques
Quantization
GPTQ
parameters: {"bits":6,"scope":"MLP"}
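Novel Contributions below mention retuning the MLP GPTQ outlier clip from 10.0 to 12.0. A minimal sketch of symmetric per-channel int6 quantization with such a clip multiplier follows; the clip semantics (a multiple of the per-channel std) are an assumption, and plain round-to-nearest stands in for GPTQ's Hessian-compensated updates.

```python
import torch

def quantize_int6_clipped(w: torch.Tensor, clip: float = 12.0):
    """Symmetric per-output-channel int6 quantization with outlier clipping.

    Sketch only: assumes weights are clipped at clip * per-channel std
    before scales are computed, so a larger clip keeps more tail mass
    at the cost of coarser resolution near zero. Round-to-nearest is
    used here instead of GPTQ's Hessian-based error compensation.
    """
    std = w.std(dim=1, keepdim=True)            # per-output-channel spread
    bound = clip * std                          # assumed clip semantics
    w_clipped = w.clamp(-bound, bound)
    qmax = 2 ** (6 - 1) - 1                     # symmetric int6 range: [-31, 31]
    scale = w_clipped.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.round(w_clipped / scale).clamp(-qmax, qmax).to(torch.int8)
    return q, scale                             # dequantize as q.float() * scale
```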
Architecture
Gated Attention
Attention with learned scalar out-gates per head and quantized gating enabled.
parameters: {"init_std":0.005,"quant_gate":true}
depth recurrence
Recurrent depth structure looping layers 4-5.
parameters: {"loop_start":3,"loop_end":5,"num_loops":2}
weight tying
The CaseOps + SP8192 stack uses the same tokenizer/model setup as the base submission; no explicit weight tying is described.
parameters: null
Test-Time Training
score-first TTT
parameters: {"phases":3,"prefix_docs":2000}
Sequence Length
sequence_length
parameters: {"train_length":null,"eval_length":2048}
Regularization
logit softcap
parameters: {"value":30}
weight decay
parameters: null
Novel Contributions
- Retuned MLP GPTQ outlier clipping from 10.0 to 12.0
- Preserved MLP tail mass during int6 calibration for 4x-width MLPs
- Achieved 5-seed mean val_bpb of 1.06453
- Maintained compliance with 16 MB artifact cap and 600s train/eval budgets
- Disclosed 7 seeds, with the 5 lowest-BPB seeds used for the official score (selection rule sketched below)
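The stated selection rule, made explicit as a one-liner (any example inputs would be illustrative; the PR's per-seed values are not listed here):

```python
def official_score(seed_bpbs: list[float]) -> float:
    # Mean of the 5 lowest val_bpb values among the 7 disclosed seeds.
    best_five = sorted(seed_bpbs)[:5]
    return sum(best_five) / len(best_five)
```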