val_bpb: 1.0585
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,457,982 to 15,504,058 bytes
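For reference, bits per byte (bpb) is typically computed as the summed cross-entropy over the byte stream, converted from nats to bits and divided by the number of bytes scored. A minimal sketch (the helper name and inputs are illustrative, not from the submission):

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over a byte stream
    into bits per byte: divide by ln(2) to get bits, then by the
    number of bytes scored."""
    return total_nll_nats / (n_bytes * math.log(2))
```

For example, a total loss of ln(2) nats per byte corresponds to exactly 1.0 bpb.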
Training Techniques

Quantization: GPTQ
- bits: 6
- scope: all
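GPTQ proper chooses roundings using second-order (Hessian-based) error compensation; as a simplified illustration of what a 6-bit weight grid means, here is plain per-channel round-to-nearest quantization (not GPTQ's actual algorithm, and the shapes are illustrative):

```python
import numpy as np

def quantize_6bit(w: np.ndarray):
    """Per-output-channel symmetric round-to-nearest onto a 6-bit grid.

    qmax = 2**5 - 1 = 31 gives the symmetric int6 range [-31, 31].
    GPTQ additionally compensates rounding error column by column using
    Hessian information; this sketch shows only the 6-bit grid itself.
    """
    qmax = 2**5 - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax  # one scale per row
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_6bit(w)
w_hat = dequantize(q, s)
```

Round-to-nearest keeps the per-element reconstruction error within half a quantization step (scale / 2).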
Architecture
- weight tying: tied input/output embeddings, as implied by the canonical method list and the submission context (parameters: null)
- sliding window eval: enabled (parameters: null)
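Sliding-window evaluation scores a long sequence with overlapping windows so that every token is predicted with as much left context as the window allows, while each token is scored exactly once. The span-generation logic can be sketched as follows (the window and stride values are illustrative; the submission's eval_length is listed as null):

```python
def sliding_window_spans(seq_len: int, window: int, stride: int):
    """Yield (begin, end, score_from) triples for sliding-window eval.

    Each window covers positions [begin, end); only positions
    [score_from, end) are scored, so tokens already scored by the
    previous window are used purely as context.
    """
    spans = []
    prev_end = 0  # first position not yet scored
    for begin in range(0, seq_len, stride):
        end = min(begin + window, seq_len)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == seq_len:
            break
    return spans
```

With stride smaller than the window, consecutive windows overlap by `window - stride` context tokens.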
Test-Time Training: full TTT
- parameters: {"learning_rate": 0.00045, "epochs": 10, "freeze_blocks": 1}
Weight Averaging: EMA
- parameters: {"decay": 0.9965}
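An exponential moving average keeps a shadow copy of the weights that trails the raw training trajectory; the evaluated model uses the averaged weights. One update step (the list-of-floats representation is illustrative):

```python
def ema_update(ema_w, w, decay=0.9965):
    """One EMA step: ema <- decay * ema + (1 - decay) * w.

    With decay = 0.9965 the effective averaging horizon is roughly
    1 / (1 - 0.9965), about 286 optimizer steps.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_w, w)]
```

Called once per optimizer step, the shadow weights converge toward the current weights when training stabilizes.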
Sequence Length
- train_length: 2048
- eval_length: null
Compression: brotli
- level: null
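Staying under the 16MB limit means checking the compressed, not raw, artifact size. Python has no Brotli binding in the standard library (the third-party `brotli` package provides `brotli.compress(data, quality=...)`), so this sketch substitutes stdlib zlib purely to keep it self-contained; the size-budget check is the point:

```python
import zlib

LIMIT = 16 * 1024 * 1024  # 16 MiB artifact budget

def packed_size(raw: bytes, level: int = 9) -> int:
    """Compressed artifact size in bytes. The submission uses Brotli;
    zlib stands in here so the sketch needs no third-party package."""
    return len(zlib.compress(raw, level))

payload = b"\x00" * 1000 + b"model weights" * 100
assert packed_size(payload) <= LIMIT
```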
Optimizer: AdamW
- weight_decay: null
- momentum: null
- other_params: null
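AdamW differs from Adam in that weight decay is applied directly to the parameter rather than folded into the gradient. A single scalar update step, using common default hyperparameters since the submission lists its values as null:

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter w with gradient g.

    m, v are the first/second moment accumulators; t is the 1-based
    step count used for bias correction. Weight decay is decoupled:
    it scales w directly instead of being added to g.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```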
Novel Contributions
- Pre-quantization TTT baked into the artifact as a fixed predictor
- Use of pulled TensorPool artifacts as the authoritative source for results
- Explicit legality separation between submission score and frontier-only SLOT numbers
- SP1024 + looping architecture with TTT hyperparameter tuning
- GPTQ int6 quantization with Brotli compression under the 16MB limit