PR #2041
open
Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean)
by jorge-asenjo
val_bpb
1.0569
Architecture
Transformer
Optimizer
—
Artifact Size
15,977,032 bytes
Training Techniques
Test-Time Training
LoRA TTT
parameters: {"rank":80,"learning_rate":0.75,"num_phases":1,"prefix_docs":1000,"score_before_update":true}
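The `score_before_update: true` flag suggests that each chunk is scored with the adapter state left by earlier chunks, and only afterwards used for an update. Below is a minimal sketch of that ordering only. It assumes a toy add-one-smoothed unigram model in place of the rank-80 LoRA adapter; the `score_then_update` helper and the smoothing are hypothetical and not taken from the PR.

```python
import math
from collections import Counter

def score_then_update(chunks, vocab_size):
    """Total negative log-likelihood in bits under score-before-update.

    Toy stand-in (assumption): an add-one-smoothed unigram model
    replaces the LoRA adapter that the PR actually trains.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for chunk in chunks:
        # 1) Score the chunk using state from previous chunks only.
        for tok in chunk:
            p = (counts[tok] + 1) / (seen + vocab_size)
            total_bits += -math.log2(p)
        # 2) Only afterwards adapt on the chunk that was just scored.
        counts.update(chunk)
        seen += len(chunk)
    return total_bits
```

Because every token is scored before the model has trained on it, the reported loss never benefits from seeing the token it is predicting.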
Evaluation
inside-timer n-gram precompute
parameters: {"precompute_outside":0}
Architecture
LeakyReLU
LeakyReLU squared activation with slope 0.3
parameters: {"slope":0.3}
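For reference, a scalar sketch of a LeakyReLU-squared activation with slope 0.3. It assumes the square is applied directly to the LeakyReLU output, so negative inputs map to small positive values; the summary does not say whether the PR preserves the sign, and the function name is hypothetical.

```python
def leaky_relu_squared(x, slope=0.3):
    """LeakyReLU followed by squaring (assumed form, sign not preserved)."""
    y = x if x >= 0 else slope * x  # LeakyReLU with the given negative slope
    return y * y                    # square the activated value
```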
weight tying
The CaseOps SP8192 tokenizer/base stack includes a tied-embedding-style setup as part of the V21 lineage
parameters: null
Sequence Length
sequence_length
train_length: null
eval_length: 2560
LR Schedule
warmdown
parameters: {"warmdown_frac":0.85}
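A sketch of what `warmdown_frac: 0.85` could mean for the learning-rate multiplier. It assumes a constant rate followed by a linear decay to zero over the final 85% of steps; the decay shape, the absence of warmup, and the `lr_scale` helper are assumptions, not details from the PR.

```python
def lr_scale(step, total_steps, warmdown_frac=0.85):
    """Learning-rate multiplier: flat, then linear warmdown to zero (assumed)."""
    warmdown_steps = round(total_steps * warmdown_frac)
    warmdown_start = total_steps - warmdown_steps
    if warmdown_steps == 0 or step < warmdown_start:
        return 1.0  # constant phase before the warmdown begins
    # Linear decay from 1.0 at warmdown_start down to 0.0 at total_steps.
    return (total_steps - step) / warmdown_steps
```

With 100 total steps, the multiplier stays at 1.0 through step 15 and reaches 0.0 at step 100.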
Quantization
GPTQ-lite
bits: null
scope: artifact/model
Other
other
N-gram tilt with closed-form normalized distribution over the SP8192 vocabulary
parameters: {"enabled":1}
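One way a tilt can stay normalized in closed form, sketched under the assumption that the n-gram table proposes a single hinted token and that token's model probability is multiplied by `exp(beta)`. The normalizer is then `Z = 1 + p(hint) * (exp(beta) - 1)`, so no sum over the 8192-entry vocabulary is needed. The names `tilt`, `hint`, and `beta` are hypothetical; the summary does not give the PR's actual form.

```python
import math

def tilt(probs, hint, beta):
    """Boost probs[hint] by exp(beta) and renormalize in closed form (assumed form)."""
    boost = math.exp(beta)
    # Only one entry changes, so the new total mass is known without summing.
    z = 1.0 + probs[hint] * (boost - 1.0)
    return [(p * boost if i == hint else p) / z for i, p in enumerate(probs)]
```

Setting `beta` to 0 leaves the distribution unchanged, which makes the tilt easy to switch off.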
other
AsymLogit rescale
parameters: null
other
AWQ-lite mixed-precision stack
parameters: null
Novel Contributions
- 3-seed mean full-validation result of 1.05692 BPB on track_10min_16mb
- Inside-timer n-gram precompute during evaluation
- Phased TTT with one phase and 1000 prefix docs
- Reproduction of the eval-time recipe on the PR #1967 V21 base without Gated XSA
- LQER top-1 and GPTQ reserve-time settings for artifact/compute efficiency