PR #1725

open

Add near-SOTA SP8192 LegalTTT 3-seed reproduction

by teslaeco
val_bpb
1.0813
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Architecture
depth recurrence
Uses a 3-layer recurrence stack as part of the SP8192 model setup.
parameters: {"layers":3}
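As a rough illustration of depth recurrence, the sketch below reuses one stack of three layers over several passes, so the effective depth grows without adding parameters. The toy scalar layers, the `make_layer` helper, and the pass count of 2 are assumptions for illustration only; the summary states only `layers: 3` and does not say how the recurrence is wired.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def make_layer(weight: float) -> Layer:
    """Hypothetical toy layer: a residual update with one scalar weight."""
    return lambda x: x + weight * x


def recurrent_forward(x: float, layers: List[Layer], num_passes: int) -> float:
    """Apply the same layer stack num_passes times; weights are shared across passes."""
    for _ in range(num_passes):
        for layer in layers:
            x = layer(x)
    return x


num_passes = 2  # assumed; not stated in the PR summary
layers = [make_layer(w) for w in (0.1, 0.2, 0.3)]  # 3 distinct sets of weights
out = recurrent_forward(1.0, layers, num_passes)

# 6 layer applications in total, but still only 3 sets of weights.
effective_depth = len(layers) * num_passes
```

The point of the design is that compute depth and parameter count are decoupled: the stored artifact holds three layers regardless of how many passes are run.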
parallel residuals
Includes parallel residual connections in the model stack.
parameters: null
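A minimal sketch of what parallel residuals usually mean, assuming the two branches are the attention and MLP sublayers (the summary does not say which sublayers are involved). In the sequential form the MLP sees the attention output; in the parallel form both branches read the same input and their outputs are summed into the residual stream. The scalar `attn` and `mlp` functions are stand-ins, and normalization is omitted for brevity.

```python
from typing import Callable

Sublayer = Callable[[float], float]


def sequential_block(x: float, attn: Sublayer, mlp: Sublayer) -> float:
    """Standard residual block: the MLP consumes the attention-updated stream."""
    x = x + attn(x)
    return x + mlp(x)


def parallel_block(x: float, attn: Sublayer, mlp: Sublayer) -> float:
    """Parallel residual block: both branches read the same input x."""
    return x + attn(x) + mlp(x)


# Toy stand-ins for the real sublayers (assumptions, not the PR's code).
attn = lambda h: 0.5 * h
mlp = lambda h: h * h

seq = sequential_block(2.0, attn, mlp)  # (2 + 1) + 3**2 = 12.0
par = parallel_block(2.0, attn, mlp)    # 2 + 1 + 2**2 = 7.0
```

Because the parallel branches do not depend on each other, they can be computed concurrently, which is the usual motivation for this layout.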
weight tying
Uses tied embeddings / embedding tying as part of the stack.
parameters: null
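Weight tying can be sketched as follows, assuming the common form in which the input embedding table is reused as the output projection. The tiny 3-token, 2-dimensional table and the helper names `embed` and `output_logits` are hypothetical and chosen only to make the sharing visible.

```python
from typing import List

Matrix = List[List[float]]


def embed(token_id: int, table: Matrix) -> List[float]:
    """Input side: look up the row for a token."""
    return table[token_id]


def output_logits(hidden: List[float], table: Matrix) -> List[float]:
    """Output side: score each token by the dot product with the SAME table rows."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in table]


# One shared table with vocab size 3 and model width 2 (toy values).
table: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

hidden = embed(2, table)               # [1.0, 1.0]
logits = output_logits(hidden, table)  # [1.0, 1.0, 2.0]
```

Only one vocab-by-width matrix is stored, which matters when the artifact size is constrained.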
Test-Time Training
Legal TTT
parameters: null
Sequence Length
sequence_length
train_length: 8192
eval_length: null

Novel Contributions

  • Independent 3-seed reproduction of the SP8192 + QK-Gain 5.25 + Legal TTT stack
  • Reports per-seed validation bpb along with mean and population standard deviation
  • Positions the run as a near-SOTA reproducibility submission rather than a new record
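The mean and population standard deviation mentioned above can be computed as in the sketch below. The per-seed values are placeholders chosen for easy arithmetic, not the PR's reported results. The sample standard deviation is shown only for contrast.

```python
import statistics

# Placeholder per-seed val_bpb values; NOT the numbers reported in the PR.
per_seed_bpb = [1.08, 1.09, 1.10]

mean_bpb = statistics.fmean(per_seed_bpb)

# Population standard deviation: divides the squared deviations by n.
pop_std = statistics.pstdev(per_seed_bpb)

# Sample standard deviation: divides by n - 1, so it is larger for small n.
sample_std = statistics.stdev(per_seed_bpb)

print(f"mean={mean_bpb:.4f} pop_std={pop_std:.4f} sample_std={sample_std:.4f}")
```

With only three seeds the choice matters: the sample estimate is larger than the population figure by a factor of sqrt(3/2), about 1.22, so stating which one is reported keeps comparisons across submissions consistent.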