PR #1624

open

SAFE_SUBMISSION: run036-safe016 (1.05850 BPB)

by joshkmartinez
val_bpb: 1.0585
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,457,982 to 15,504,058 bytes
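
For readers unfamiliar with the val_bpb metric, here is a minimal sketch of how a bits-per-byte figure is commonly derived. It assumes BPB means bits per byte and that the evaluation produces a summed cross-entropy in nats. The function name and arguments are hypothetical and are not taken from this PR.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    Assumption: the loss is summed over every predicted token and
    total_bytes is the UTF-8 byte length of the evaluated text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)


# Illustrative input only: 8 bits of total loss spread over 8 bytes.
example_bpb = bits_per_byte(8 * math.log(2), 8)
```

Normalizing by bytes rather than tokens keeps scores comparable across tokenizers with different vocabulary sizes.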

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
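
To illustrate what a 6-bit weight grid looks like, the sketch below applies plain symmetric round-to-nearest quantization to one row of weights. This is deliberately not GPTQ: GPTQ additionally uses second-order information to compensate for rounding error, which is omitted here. The function names and the per-row scale are assumptions made for illustration.

```python
def quantize_int6(row):
    """Symmetric round-to-nearest quantization to signed 6-bit codes.

    NOT GPTQ: this only shows the int6 grid, without GPTQ's
    error-compensating weight updates.
    """
    qmax = 2 ** (6 - 1) - 1  # 31, so codes lie in [-31, 31]
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


row = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int6(row)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

With round-to-nearest, the per-weight error is bounded by half a quantization step (`scale / 2`).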
Architecture
weight tying
Tied embeddings (implied by the canonical method list and the submission context).
parameters: null
sliding window eval
Sliding window evaluation is enabled.
parameters: null
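
As a rough sketch of sliding window evaluation, the helper below lays out overlapping context windows so that every token is scored exactly once, and each token after the first window gets most of a full window of left context. The window of 2048 mirrors the listed train_length. The stride of 512 and the function name are assumptions, since the PR summary gives no evaluation parameters.

```python
def sliding_windows(n_tokens, window=2048, stride=512):
    """Return (start, end, score_from) triples.

    The model would see tokens[start:end] as context, and only
    tokens[score_from:end] contribute to the reported loss.
    The stride value is a hypothetical choice.
    """
    spans = []
    if n_tokens <= 0:
        return spans
    end = min(window, n_tokens)
    spans.append((0, end, 0))  # first window: every token is scored
    while end < n_tokens:
        new_end = min(end + stride, n_tokens)
        start = max(0, new_end - window)
        spans.append((start, new_end, end))  # score only the new tokens
        end = new_end
    return spans


windows = sliding_windows(5000)
scored = sum(end - score_from for _, end, score_from in windows)
```

A smaller stride gives each scored token more context at the cost of more forward passes.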
Test-Time Training
full TTT
parameters: {"learning_rate":0.00045,"epochs":10,"freeze_blocks":1}
Weight Averaging
EMA
parameters: {"decay":0.9965}
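
The EMA entry can be read as the standard exponential moving average over weights. Below is a minimal sketch using the listed decay of 0.9965. The dictionary-of-floats parameter layout and the function name are assumptions for illustration, not the submission's code.

```python
def ema_update(shadow, params, decay=0.9965):
    """Return new averaged weights: decay * shadow + (1 - decay) * params."""
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


# Toy run: the live weight stays at 1.0 while the average starts at 0.0.
shadow = {"w": 0.0}
for _ in range(3):
    shadow = ema_update(shadow, {"w": 1.0})
```

With a decay of 0.9965, the average has an effective horizon of roughly 1 / (1 - 0.9965), or about 286 update steps.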
Sequence Length
sequence_length
train_length: 2048
eval_length: null
Compression
brotli
level: null
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: null

Novel Contributions

  • Pre-quantization TTT baked into the artifact as a fixed predictor
  • Use of pulled TensorPool artifacts as the authoritative source for results
  • Explicit legality separation between submission score and frontier-only SLOT numbers
  • SP1024 + looping architecture with TTT hyperparameter tuning
  • GPTQ int6 quantization with Brotli compression under the 16MB limit
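
The size claim in the last bullet can be checked with simple arithmetic. The sketch below compares the reported artifact sizes against both common readings of "16MB", since the summary does not say whether the limit is decimal or binary. The constant names are hypothetical.

```python
# Artifact sizes reported in the summary above, in bytes.
ARTIFACT_SIZES = (15_457_982, 15_504_058)

# Assumption: the summary does not specify which definition applies.
LIMITS = {
    "decimal (16,000,000 bytes)": 16_000_000,
    "binary (16 * 1024 * 1024 bytes)": 16 * 1024 * 1024,
}


def headroom(size_bytes, limit_bytes):
    """Bytes remaining under the limit (negative means over the limit)."""
    return limit_bytes - size_bytes


report = {
    label: min(headroom(size, limit) for size in ARTIFACT_SIZES)
    for label, limit in LIMITS.items()
}
```

Under either reading, the largest reported artifact fits, with the tighter decimal limit leaving 495,942 bytes to spare.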