PR #2041

open

Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean)

by jorge-asenjo
val_bpb
1.0569
Architecture
Transformer
Optimizer
Artifact Size
15,977,032 bytes

Training Techniques

Test-Time Training
LoRA TTT
parameters: {"rank":80,"learning_rate":0.75,"num_phases":1,"prefix_docs":1000,"score_before_update":true}
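The `score_before_update: true` setting means each evaluation chunk is scored before the model adapts on it. Below is a minimal sketch of that ordering only. The LoRA-adapted transformer (rank 80, lr 0.75) is replaced by a hypothetical toy adaptive unigram model, `ToyAdaptiveModel`, whose counts stand in for the adapter weights. All names are illustrative, not the PR's code.

```python
import math
from collections import Counter


class ToyAdaptiveModel:
    """Hypothetical stand-in for the LoRA-adapted model: a Laplace-smoothed unigram
    model whose counts play the role of the trainable adapter weights."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def prob(self, token):
        return (self.counts[token] + 1) / (self.total + self.vocab_size)

    def score(self, chunk):
        # Bits under the CURRENT state; scoring never mutates the model.
        return -sum(math.log2(self.prob(t)) for t in chunk)

    def update(self, chunk):
        # Adaptation step: only sees tokens that have already been scored.
        self.counts.update(chunk)
        self.total += len(chunk)


def score_then_update(model, chunks):
    """Every chunk is scored before the model trains on it."""
    total_bits = 0.0
    for chunk in chunks:
        total_bits += model.score(chunk)  # 1) score with current parameters
        model.update(chunk)               # 2) then adapt on the just-scored chunk
    return total_bits
```

Swapping the two calls would let each chunk be scored by a model that has already trained on it, which is exactly what scoring before the update rules out.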
Evaluation
inside-timer n-gram precompute
parameters: {"precompute_outside":0}
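With `precompute_outside: 0`, the n-gram precompute is charged to the timed evaluation rather than done beforehand for free. The sketch below shows only where the timer starts relative to the precompute. `build_bigram_table`, `timed_eval`, and `source_tokens` are hypothetical. The card does not say which token stream the PR's n-gram tables are built from, so the sketch keeps the source separate from the tokens being scored.

```python
import time
from collections import Counter, defaultdict


def build_bigram_table(source_tokens):
    """Hypothetical precompute: next-token counts keyed by the previous token."""
    table = defaultdict(Counter)
    for prev, nxt in zip(source_tokens, source_tokens[1:]):
        table[prev][nxt] += 1
    return table


def timed_eval(source_tokens, eval_tokens, score_fn, budget_s):
    start = time.perf_counter()                # timer starts BEFORE the precompute
    table = build_bigram_table(source_tokens)  # precompute_outside=0: cost counts toward eval
    result = score_fn(eval_tokens, table)
    elapsed = time.perf_counter() - start
    if elapsed > budget_s:
        raise RuntimeError(f"eval took {elapsed:.1f}s, over the {budget_s}s budget")
    return result, elapsed
```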
Architecture
LeakyReLU
LeakyReLU-squared activation (LeakyReLU with negative slope 0.3, then squared)
parameters: {"slope":0.3}
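A minimal scalar sketch, assuming "LeakyReLU squared" means squaring the LeakyReLU output. The function name is illustrative.

```python
def leaky_relu_squared(x, slope=0.3):
    """LeakyReLU with negative slope `slope`, then squared."""
    y = x if x >= 0 else slope * x
    return y * y
```

Unlike plain ReLU², a negative input still produces a small positive output (0.09·x² for slope 0.3), so the gradient does not vanish for x < 0.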
weight tying
The CaseOps SP8192 tokenizer/base stack from the V21 lineage uses tied input/output embeddings.
parameters: null
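Weight tying means the output head reuses the input embedding matrix, so the model stores one vocab × dim matrix instead of two. The toy class below is hypothetical and only illustrates the tying. It does not reflect the PR's model dimensions.

```python
import random


class TiedEmbeddingLM:
    """Toy model whose output head reuses the input embedding matrix E (vocab x dim)."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.embedding = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

    def embed(self, token):
        # Input side: row lookup in E.
        return self.embedding[token]

    def logits(self, hidden):
        # Output side: logits = hidden @ E^T, so no separate LM-head matrix is stored.
        return [sum(h * e for h, e in zip(hidden, row)) for row in self.embedding]
```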
Sequence Length
sequence_length
train_length: null
eval_length: 2560
LR Schedule
warmdown
parameters: {"warmdown_frac":0.85}
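A sketch of the LR multiplier under `warmdown_frac: 0.85`. It assumes the common form: constant LR, then linear decay to zero over the final 85% of steps. The card does not say whether the decay ends at zero or at a floor, or whether there is a warmup, so both are assumptions here.

```python
def lr_multiplier(step, total_steps, warmdown_frac=0.85):
    """Constant LR, then linear decay to zero over the last `warmdown_frac` of training."""
    warmdown_steps = max(1, round(total_steps * warmdown_frac))
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_steps)
```

With 100 total steps, the multiplier stays at 1.0 only through step 15 and then ramps down to 0.0 at step 100.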
Quantization
GPTQ-lite
bits: null
scope: artifact/model
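The card gives neither the bit width (`bits: null`) nor the simplifications that make this "GPTQ-lite". The sketch below therefore shows standard GPTQ-style error compensation in its plain column-by-column (OBS) form, without blocking, lazy batch updates, or the Cholesky reformulation. All function names, the per-row symmetric scale, and the `damp` value are assumptions.

```python
def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


def gptq_quantize(W, H, bits=4, damp=0.01):
    """Quantize W (rows = outputs, cols = inputs) one input column at a time,
    pushing each column's rounding error onto the not-yet-quantized columns
    via the inverse of H = X^T X built from the layer inputs."""
    n = len(H)
    mean_diag = sum(H[i][i] for i in range(n)) / n
    Hd = [[H[i][j] + (damp * mean_diag if i == j else 0.0) for j in range(n)] for i in range(n)]
    Hinv = invert(Hd)
    W = [list(row) for row in W]
    qmax = 2 ** (bits - 1) - 1
    scales = [(max(abs(w) for w in row) / qmax) or 1.0 for row in W]  # symmetric per-row scale
    Q = [[0.0] * n for _ in W]
    for i in range(n):
        d = Hinv[i][i]
        for r, row in enumerate(W):
            q = max(-qmax - 1, min(qmax, round(row[i] / scales[r]))) * scales[r]
            Q[r][i] = q
            err = (row[i] - q) / d
            for j in range(i + 1, n):
                row[j] -= err * Hinv[i][j]  # compensate on later columns
        # Drop column i from the inverse Hessian (Schur complement update).
        col = [Hinv[a][i] for a in range(n)]
        row_i = list(Hinv[i])
        for a in range(n):
            for b in range(n):
                Hinv[a][b] -= col[a] * row_i[b] / d
    return Q, scales
```

With an identity H, the off-diagonal inverse entries are zero, so no error is propagated and the result reduces to plain round-to-nearest.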
Other
other
N-gram tilt with closed-form normalized distribution over the SP8192 vocabulary
parameters: {"enabled":1}
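One way a tilt can have a closed-form normalizer is sketched below. It assumes a single n-gram-hinted token whose probability is multiplied by exp(β). Because only one entry changes, the partition function is Z = 1 + p_hint·(e^β − 1), and no sum over all 8192 vocabulary entries is needed. How the PR selects hinted tokens and sets β is not stated, so the single-hint form, `beta`, and both function names are hypothetical.

```python
import math


def ngram_tilt(probs, hinted_token, beta):
    """Multiply the n-gram-hinted token's probability by exp(beta), renormalize in closed form."""
    boost = math.exp(beta)
    # Z = sum_v p_v * w_v with w_hint = exp(beta), all other w_v = 1
    #   = 1 + p_hint * (exp(beta) - 1), assuming probs sums to 1.
    z = 1.0 + probs[hinted_token] * (boost - 1.0)
    tilted = [p / z for p in probs]
    tilted[hinted_token] = probs[hinted_token] * boost / z
    return tilted


def tilted_target_prob(p_target, p_hint, target_is_hint, beta):
    """Same result for a single scored token, O(1) instead of O(|V|)."""
    boost = math.exp(beta)
    z = 1.0 + p_hint * (boost - 1.0)
    return (p_target * boost if target_is_hint else p_target) / z
```

For example, with probabilities [0.5, 0.25, 0.25], token 1 hinted, and β = ln 2, Z = 1.25 and the tilted distribution is [0.4, 0.4, 0.2].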
other
AsymLogit rescale
parameters: null
other
AWQ-lite mixed-precision stack
parameters: null
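No parameters are given for the AWQ-lite stack. The sketch below shows only the core AWQ idea: scale up input channels that see large activations before quantizing, then fold the inverse scale back in. The `alpha` value, per-row round-to-nearest, and function names are assumptions. The grid search over `alpha` and any per-layer mixed-precision bit allocation are not modeled.

```python
def quantize_row(row, bits):
    """Symmetric round-to-nearest with one scale per output row."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in row) / qmax) or 1.0
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in row]


def awq_quantize(W, act_abs_mean, bits=4, alpha=0.5):
    """Activation-aware scaling: boost input channels with large mean |activation|
    before quantizing, which lowers their relative rounding error."""
    s = [max(a, 1e-8) ** alpha for a in act_abs_mean]  # per-input-channel scale
    scaled = [[w * sj for w, sj in zip(row, s)] for row in W]
    Wq = [quantize_row(row, bits) for row in scaled]
    # Dividing by s restores the original function: (W diag(s)) diag(s)^-1 = W.
    return [[w / sj for w, sj in zip(row, s)] for row in Wq]
```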

Novel Contributions

  • 3-seed mean full-validation result of 1.05692 BPB on track_10min_16mb
  • Inside-timer n-gram precompute: the n-gram precomputation runs inside the timed evaluation budget (precompute_outside=0)
  • Phased TTT with one phase and 1000 prefix docs
  • Reproduction of the eval-time recipe on the PR #1967 V21 base without Gated XSA
  • LQER top-1 and GPTQ reserve-time settings for artifact/compute efficiency