val_bpb: 1.0759
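val_bpb is validation bits per byte. Assuming the usual convention (summed token cross-entropy converted from nats to bits, normalized by the raw byte count of the validation text):

$$\mathrm{bpb} = \frac{\sum_i \ell_i}{N_{\mathrm{bytes}} \ln 2}$$

where $\ell_i$ is the cross-entropy (in nats) assigned to token $i$ and $N_{\mathrm{bytes}}$ is the validation set's size in bytes.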
Architecture: Transformer
Optimizer: —
Artifact Size: 15,961,508 bytes
Training Techniques: Test-Time Training (score-first TTT)
  parameters: {"learning_rate":0.005,"epochs":4,"chunk_tokens":32768}
Quantization: GPTQ
  bits: 5
  scope: final-block attention K/V
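For illustration, a minimal round-to-nearest 5-bit fake-quantizer over the final block's K/V projection weights. Full GPTQ additionally minimizes layer output error with Hessian-based, column-by-column compensation; the simplified sketch below only shows the 5-bit grid itself. The attribute names (`block.attn.k_proj` etc.) are assumptions about the model layout.

```python
import torch

def fake_quantize_kv_int5(block):
    """Snap the attention K/V projection weights of `block` to a 5-bit grid."""
    for proj in (block.attn.k_proj, block.attn.v_proj):   # assumed layout
        w = proj.weight.data                              # (out, in)
        scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / 15
        q = torch.clamp(torch.round(w / scale), -16, 15)  # signed 5-bit codes
        proj.weight.data = q * scale                      # dequantized weights

# fake_quantize_kv_int5(model.blocks[-1])  # final transformer block only
```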
Compression: brotli
  level: 11
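The packing step is plain Brotli at maximum quality; `level: 11` is from the card and `lgwin=24` (Brotli's largest window) from the Novel Contributions list. File names are placeholders.

```python
import brotli

with open("artifact.bin", "rb") as f:
    raw = f.read()

packed = brotli.compress(raw, quality=11, lgwin=24)  # max quality, ~16 MiB window
with open("artifact.bin.br", "wb") as f:
    f.write(packed)

assert brotli.decompress(packed) == raw              # round-trip sanity check
```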
Sequence Length
  train_length: 32768
  eval_length: null
Architecture: depth recurrence (3-layer recurrence used in the SP8192 submission)
  parameters: {"layers":3}
Evaluation: sliding window eval
  parameters: null
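Sliding-window evaluation scores a long stream in overlapping windows so every token is predicted with substantial left context while being counted exactly once. The card's parameters are null, so window=32768 (the train length) and stride=16384 are assumptions, as is the `model(ids)` interface returning per-token cross-entropy in nats.

```python
import math
import torch

def sliding_window_bpb(model, ids, n_bytes, window=32768, stride=16384):
    """Bits per byte over `ids`, scoring each token exactly once."""
    total_nats, start = 0.0, 0
    while start < ids.numel():
        end = min(start + window, ids.numel())
        nats = model(ids[start:end].unsqueeze(0)).squeeze(0)  # per-token nats
        # Only tokens past the previous window's end are new; they get at
        # least `window - stride` tokens of context from the overlap.
        new = end - start if start == 0 else end - (start + window - stride)
        total_nats += nats[-new:].sum().item()
        if end == ids.numel():
            break
        start += stride
    return total_nats / (math.log(2) * n_bytes)  # nats -> bits, per raw byte
```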
Novel Contributions
- Size-clearing fallback candidate derived from PR #1812
- Official SP8192 FineWeb data/tokenizer with rule-compliant score-first TTT
- Brotli compression with lgwin=24
- TTT chunk size set to 32768
- Final-block attention K/V int5 GPTQ quantization
- Reports a completed seed-42 run after the seed-314 run was interrupted