val_bpb: 1.0585
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,457,982 to 15,504,058 bytes
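For reference, bits per byte (bpb) is typically computed as the summed cross-entropy over the byte stream, converted from nats to bits and divided by the number of bytes scored. A minimal sketch (the helper name and inputs are illustrative, not from the submission):

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over a byte stream
    into bits per byte: divide by ln(2) to get bits, then by the
    number of bytes scored."""
    return total_nll_nats / (n_bytes * math.log(2))
```

For example, a total loss of ln(2) nats per byte corresponds to exactly 1.0 bpb.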
Training Techniques

Quantization: GPTQ
- bits: 6
- scope: all
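GPTQ proper chooses roundings using second-order (Hessian-based) error compensation; as a simplified illustration of what a 6-bit weight grid means, here is plain per-channel round-to-nearest quantization (not GPTQ's actual algorithm, and the shapes are illustrative):

```python
import numpy as np

def quantize_6bit(w: np.ndarray):
    """Per-output-channel symmetric round-to-nearest onto a 6-bit grid.

    qmax = 2**5 - 1 = 31 gives the symmetric int6 range [-31, 31].
    GPTQ additionally compensates rounding error column by column using
    Hessian information; this sketch shows only the 6-bit grid itself.
    """
    qmax = 2**5 - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax  # one scale per row
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_6bit(w)
w_hat = dequantize(q, s)
```

Round-to-nearest keeps the per-element reconstruction error within half a quantization step (scale / 2).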
Architecture
- weight tying: tied input/output embeddings, as implied by the canonical method list and the submission context (parameters: null)
- sliding window eval: enabled (parameters: null)
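Sliding-window evaluation scores a long sequence with overlapping windows so that every token is predicted with as much left context as the window allows, while each token is scored exactly once. The span-generation logic can be sketched as follows (the window and stride values are illustrative; the submission's eval_length is listed as null):

```python
def sliding_window_spans(seq_len: int, window: int, stride: int):
    """Yield (begin, end, score_from) triples for sliding-window eval.

    Each window covers positions [begin, end); only positions
    [score_from, end) are scored, so tokens already scored by the
    previous window are used purely as context.
    """
    spans = []
    prev_end = 0  # first position not yet scored
    for begin in range(0, seq_len, stride):
        end = min(begin + window, seq_len)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == seq_len:
            break
    return spans
```

With stride smaller than the window, consecutive windows overlap by `window - stride` context tokens.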
Test-Time Training: full TTT
- parameters: {"learning_rate": 0.00045, "epochs": 10, "freeze_blocks": 1}
Weight Averaging: EMA
- parameters: {"decay": 0.9965}
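An exponential moving average keeps a shadow copy of the weights that trails the raw training trajectory; the evaluated model uses the averaged weights. One update step (the list-of-floats representation is illustrative):

```python
def ema_update(ema_w, w, decay=0.9965):
    """One EMA step: ema <- decay * ema + (1 - decay) * w.

    With decay = 0.9965 the effective averaging horizon is roughly
    1 / (1 - 0.9965), about 286 optimizer steps.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_w, w)]
```

Called once per optimizer step, the shadow weights converge toward the current weights when training stabilizes.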
Sequence Length
- train_length: 2048
- eval_length: null
Compression: brotli
- level: null
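Staying under the 16MB limit means checking the compressed, not raw, artifact size. Python has no Brotli binding in the standard library (the third-party `brotli` package provides `brotli.compress(data, quality=...)`), so this sketch substitutes stdlib zlib purely to keep it self-contained; the size-budget check is the point:

```python
import zlib

LIMIT = 16 * 1024 * 1024  # 16 MiB artifact budget

def packed_size(raw: bytes, level: int = 9) -> int:
    """Compressed artifact size in bytes. The submission uses Brotli;
    zlib stands in here so the sketch needs no third-party package."""
    return len(zlib.compress(raw, level))

payload = b"\x00" * 1000 + b"model weights" * 100
assert packed_size(payload) <= LIMIT
```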
Optimizer: AdamW
- weight_decay: null
- momentum: null
- other_params: null
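AdamW differs from Adam in that weight decay is applied directly to the parameter rather than folded into the gradient. A single scalar update step, using common default hyperparameters since the submission lists its values as null:

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter w with gradient g.

    m, v are the first/second moment accumulators; t is the 1-based
    step count used for bias correction. Weight decay is decoupled:
    it scales w directly instead of being added to g.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```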
Novel Contributions
- Pre-quantization TTT baked into the artifact as a fixed predictor
- Use of pulled TensorPool artifacts as the authoritative source for results
- Explicit legality separation between submission score and frontier-only SLOT numbers
- SP1024 + looping architecture with TTT hyperparameter tuning
- GPTQ int6 quantization with Brotli compression under the 16MB limit