PR #1750

open

Add W104 SP8192 LegalTTT record candidate

by teslaeco
val_bpb: 1.0809
Architecture: Transformer
Optimizer: (not listed)
Artifact Size: (not listed)

Training Techniques

Architecture
depth recurrence
3-layer recurrence used in the model stack
parameters: {"layers":3}
parallel residuals
Uses parallel residual connections in the stack
parameters: null
QK-Gain
Applies a QK-Gain of 5.25 in the attention stack
parameters: {"qk_gain":5.25}
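
The three architecture entries above can be illustrated with a small pure-Python sketch. It is not taken from the PR: the function names, the choice of which layers are looped, the loop count, and the reading of QK-Gain as a scalar multiplying the normalized query-key similarity are all assumptions made for illustration.

```python
import math

def recurrent_schedule(n_unique, loop_start, loop_len, n_loops):
    """Return layer indices in execution order, repeating one span of layers.

    Hypothetical helper: the PR only states that 3 layers are recurrent,
    not where the span starts or how many times it is repeated.
    """
    before = list(range(loop_start))
    span = list(range(loop_start, loop_start + loop_len))
    after = list(range(loop_start + loop_len, n_unique))
    return before + span * n_loops + after

def parallel_block(x, attn, mlp, norm):
    """Parallel residual: both branches read the same normalized input,
    and their outputs are added to the residual stream together."""
    h = norm(x)
    return [xi + ai + mi for xi, ai, mi in zip(x, attn(h), mlp(h))]

def qk_logit(q, k, gain):
    """Attention logit as gain * cosine(q, k).

    Assumption: QK-Gain scales the similarity of length-normalized q and k.
    """
    dot = sum(qi * ki for qi, ki in zip(q, k))
    q_norm = math.sqrt(sum(qi * qi for qi in q))
    k_norm = math.sqrt(sum(ki * ki for ki in k))
    return gain * dot / (q_norm * k_norm)

# Example: 6 unique layers, layers 2..4 run twice (values are illustrative).
schedule = recurrent_schedule(n_unique=6, loop_start=2, loop_len=3, n_loops=2)
```

With these illustrative arguments the looped span adds depth without adding parameters, since layers 2 to 4 are reused rather than duplicated.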
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"epochs":3}
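
As a rough sketch of what "score-first" ordering usually means, the toy loop below scores each chunk with the current weights and only afterwards adapts on that same chunk, so no value is scored by weights that were already trained on it. The scalar model, the squared-error loss, and the function name are hypothetical stand-ins; only the learning rate 0.005 and the 3 epochs come from the parameters listed above.

```python
def score_first_ttt(chunks, w=0.0, learning_rate=0.005, epochs=3):
    """Score each chunk, then adapt on it (toy scalar model).

    Toy setup: the model predicts every value as the scalar w and is
    scored with squared error. Chunks are assumed to be non-empty.
    """
    total_loss, count = 0.0, 0
    for chunk in chunks:
        # 1) Score first: these losses use weights that never saw this chunk.
        total_loss += sum((w - y) ** 2 for y in chunk)
        count += len(chunk)
        # 2) Only then take gradient steps on the chunk that was just scored.
        for _ in range(epochs):
            grad = sum(2.0 * (w - y) for y in chunk) / len(chunk)
            w -= learning_rate * grad
    return total_loss / count, w

# The first chunk is always scored by the unadapted weights.
first_loss, w_after = score_first_ttt([[1.0, 1.0]])
two_chunk_loss, _ = score_first_ttt([[1.0, 1.0], [1.0, 1.0]])
```

In this toy run the second chunk scores slightly better than the first because the weights have adapted on earlier, already-scored data, which is the only benefit this ordering allows.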
Sequence Length
sequence_length
train_length: 8192
eval_length: null

Novel Contributions

  • W104 SP8192 LegalTTT record candidate
  • 3-seed replay with mean val_bpb 1.08089556
  • SP8192 + 3-layer recurrence + parallel residuals + QK-Gain 5.25 + legal score-first TTT stack
  • Faithful source-visible replay configuration
  • No V7, V8, or V9 auxiliary data
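
For readers checking the headline number, the sketch below shows the standard conversion from summed cross-entropy in nats to bits per byte, plus a plain mean over seeds. The helper names are hypothetical and the per-seed values are not given in this summary, so none are supplied here; the only figures used are the reported mean 1.08089556 and the headline 1.0809.

```python
import math
from statistics import mean

def bits_per_byte(total_loss_nats, total_bytes):
    """Convert a summed cross-entropy (in nats) into bits per byte."""
    return total_loss_nats / (math.log(2) * total_bytes)

def mean_val_bpb(per_seed_bpb):
    """Average val_bpb over independent seeds (hypothetical helper)."""
    return mean(per_seed_bpb)

# The headline val_bpb is the reported 3-seed mean rounded to four decimals.
headline = round(1.08089556, 4)
```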