PR #1931

open

Add SP8192 legal TTT size-cleared seed42 fallback

by jaydenpiao
val_bpb
1.0759
Architecture
Transformer
Optimizer
Artifact Size
15,961,508 bytes
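
For readers unfamiliar with the metric, the following is a minimal sketch of how a bits-per-byte score is conventionally derived, assuming val_bpb denotes validation bits per byte computed from a summed cross-entropy in nats. The function name and its arguments are hypothetical and are not taken from this PR's code.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    total_nll_nats: sum of per-token cross-entropy over the evaluation set.
    total_bytes:    number of raw UTF-8 bytes those tokens decode to.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Divide by ln(2) to turn nats into bits, then normalise by byte count.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalising by bytes rather than tokens keeps the score comparable across tokenizers with different vocabulary sizes.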

Training Techniques

Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"epochs":4,"chunk_tokens":32768}
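
The score-first ordering can be sketched as follows: each chunk is scored with the weights as they stood before the model trained on that chunk, and only then is the chunk used for adaptation. This is an illustrative toy under stated assumptions, not the PR's implementation: a unigram softmax model stands in for the transformer, and all function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def chunk_nll(logits, chunk):
    """Summed negative log-likelihood (nats) of a chunk under the toy model."""
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in chunk)


def sgd_step(logits, chunk, lr):
    """One SGD step on the chunk's mean NLL; returns new logits."""
    probs = softmax(logits)
    counts = [0] * len(logits)
    for t in chunk:
        counts[t] += 1
    n = len(chunk)
    # Gradient of mean NLL w.r.t. logits is (probs - empirical frequency).
    return [l - lr * (p - c / n) for l, p, c in zip(logits, probs, counts)]


def score_first_ttt(logits, tokens, chunk_tokens, lr, epochs):
    """Return (mean NLL per token, adapted logits) using score-first TTT."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    total_nll = 0.0
    for start in range(0, len(tokens), chunk_tokens):
        chunk = tokens[start:start + chunk_tokens]
        # Score first: these weights have never been trained on this chunk.
        total_nll += chunk_nll(logits, chunk)
        # Then adapt on the chunk that was just scored.
        for _ in range(epochs):
            logits = sgd_step(logits, chunk, lr)
    return total_nll / len(tokens), logits
```

With the parameters listed above, a call would pass lr=0.005, epochs=4, and chunk_tokens=32768. Because scoring always precedes the update, the first chunk is evaluated exactly as it would be without TTT.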
Quantization
GPTQ
bits: 5
scope: final block attention K/V
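
To illustrate what a 5-bit weight grid looks like, here is a minimal sketch of symmetric round-to-nearest quantization to signed int5 codes in the range [-16, 15]. Note that this is plain round-to-nearest, not GPTQ: GPTQ additionally uses second-order information to compensate rounding error across weights, which is omitted here. The per-row scale and the function names are assumptions for illustration only.

```python
def quantize_int5(row):
    """Quantize one weight row to signed 5-bit codes with a per-row scale."""
    max_abs = max(abs(w) for w in row)
    # Map the largest magnitude to code 15; fall back to 1.0 for all-zero rows.
    scale = max_abs / 15 if max_abs > 0 else 1.0
    codes = [max(-16, min(15, round(w / scale))) for w in row]
    return codes, scale


def dequantize_int5(codes, scale):
    """Reconstruct approximate weights from int5 codes."""
    return [c * scale for c in codes]
```

Restricting the lower bit width to a small scope such as the final block's attention K/V projections trades a little precision in those tensors for a smaller artifact.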
Compression
brotli
level: 11
Sequence Length
sequence_length
train_length: 32768
eval_length: null
Architecture
depth recurrence
3-layer recurrence used in the SP8192 submission
parameters: {"layers":3}
Evaluation
sliding window eval
parameters: null
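
Since no parameters are recorded for the sliding window evaluation, the following is only a generic sketch of the technique: overlapping windows slide over the token stream, and each window scores just the tokens not yet scored, so every token is scored once with as much left context as fits. The window and stride arguments and the function name are hypothetical.

```python
def sliding_windows(n_tokens, window, stride):
    """Return (context_start, score_start, end) spans covering n_tokens.

    Tokens in [score_start, end) are scored; tokens in
    [context_start, score_start) serve only as context.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    spans = []
    scored_upto = 0
    while scored_upto < n_tokens:
        if scored_upto == 0:
            # The first window has no earlier context, so score all of it.
            end = min(window, n_tokens)
        else:
            end = min(scored_upto + stride, n_tokens)
        context_start = max(0, end - window)
        spans.append((context_start, scored_upto, end))
        scored_upto = end
    return spans
```

A smaller stride gives each scored token more context at the cost of more forward passes.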

Novel Contributions

  • Size-clearing fallback candidate derived from PR #1812
  • Official SP8192 FineWeb data/tokenizer with legal score-first TTT
  • Brotli compression with lgwin=24
  • TTT chunk size set to 32768
  • Final-block attention K/V int5 GPTQ quantization
  • Reports a completed seed 42 run after seed 314 was interrupted