val_bpb
1.0585
Architecture
Transformer
Optimizer
AdamW
Artifact Size
15,457,982 to 15,504,058 bytes
Training Techniques
Quantization
GPTQ
bits: 6
scope: all
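A minimal sketch of the 6-bit grid the GPTQ step quantizes onto, assuming symmetric per-channel round-to-nearest. GPTQ itself additionally corrects quantization error column-by-column using second-order information, which is omitted here; `quantize_6bit` and `dequantize` are hypothetical names.

```python
import numpy as np

def quantize_6bit(w, axis=0):
    """Symmetric round-to-nearest quantization to a signed 6-bit grid,
    one scale per output channel. GPTQ quantizes onto this same grid
    but also compensates for the induced error (not shown)."""
    qmax = 2 ** (6 - 1) - 1                      # 31 for signed 6-bit
    scale = np.abs(w).max(axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)     # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
q, s = quantize_6bit(w)
w_hat = dequantize(q, s)
```

Each weight lands within half a scale step of its original value, which is what bounds the bpb degradation from quantization.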
Architecture
weight tying
Input embedding and output projection share one weight matrix (tied embeddings), as implied by the canonical method mapping in the submission README.
parameters: null
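The weight-tying entry above amounts to sharing one matrix between the embedding lookup and the output projection, which removes a full vocab-by-width matrix from the artifact. A sketch with illustrative sizes (names and dimensions are not from the submission):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 100, 32
embedding = rng.normal(size=(vocab, d_model)).astype(np.float32)

def embed(token_ids):
    # Input side: row lookup into the shared matrix.
    return embedding[token_ids]

def logits(hidden):
    # Output side: project against the SAME matrix (tied weights),
    # so no separate unembedding matrix is stored.
    return hidden @ embedding.T
```

Any update to `embedding` affects both the input lookup and the output head, which is the point of tying.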
depth recurrence
Looping architecture with repeated passes over selected layers.
parameters: {"layers":[4,5],"loops":2}
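Given the parameters above, a plausible reading is that the block of layers 4-5 is traversed twice per forward pass. Whether the repeat runs the block as a unit (4, 5, 4, 5) or repeats each layer individually is an assumption; this sketch uses the block reading:

```python
def forward(x, layers, block=(4, 5), loops=2):
    """Apply a layer stack, making `loops` passes over the contiguous
    block of layers named in `block` (layers 4-5 per the submission).
    `layers` is a list of callables standing in for transformer blocks."""
    start, end = block[0], block[-1]
    for layer in layers[:start]:          # layers before the loop block
        x = layer(x)
    for _ in range(loops):                # repeated passes over the block
        for layer in layers[start:end + 1]:
            x = layer(x)
    for layer in layers[end + 1:]:        # layers after the loop block
        x = layer(x)
    return x
```

The appeal for a size-capped artifact is extra effective depth with no extra stored parameters.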
Gated Attention
Attention modified with a QK gain that sharpens the attention distribution.
parameters: {"qk_gain_init":5.25}
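One way to realize the QK gain above is as a multiplier on the pre-softmax scores, initialized to the submission's 5.25; the exact placement and parameterization of the gain are assumptions, and the function names are hypothetical.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # numerically stable softmax
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def gated_attention(q, k, v, qk_gain=5.25):
    """Scaled dot-product attention with an extra QK gain on the
    scores. A gain > 1 lowers the effective softmax temperature,
    producing sharper (more peaked) attention weights."""
    d = q.shape[-1]
    weights = softmax(qk_gain * (q @ k.T) / np.sqrt(d))
    return weights @ v, weights
```

Raising the gain concentrates each query's weight on fewer keys, which is the "sharper attention behavior" the entry describes.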
Weight Averaging
EMA
parameters: {"decay":0.9965}
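EMA weight averaging keeps a shadow copy of the parameters that decays toward each new training iterate; the shadow copy is what gets shipped. A sketch with the submission's decay of 0.9965, using NumPy arrays as stand-ins for parameters:

```python
import numpy as np

class EMA:
    """Exponential moving average of model parameters.

    shadow <- decay * shadow + (1 - decay) * current
    """
    def __init__(self, params, decay=0.9965):
        self.decay = decay
        self.shadow = [p.copy() for p in params]

    def update(self, params):
        d = self.decay
        for s, p in zip(self.shadow, params):
            s *= d                 # in-place decay of the running average
            s += (1 - d) * p       # blend in the latest parameters
```

With decay 0.9965 the average has an effective horizon of roughly 1 / (1 - 0.9965) ≈ 286 steps.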
Evaluation
sliding window eval
parameters: null
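Sliding-window evaluation typically scores only the trailing stride of each window, so every scored token is predicted with long left context instead of restarting cold at chunk boundaries. Since eval_length is null above, the window and stride values in this sketch are placeholders, and `nll_fn` is a hypothetical stand-in for the model:

```python
def sliding_window_eval(tokens, nll_fn, window=2048, stride=512):
    """Average per-token NLL under a sliding window.

    Each window of up to `window` tokens is scored, but only the final
    `stride` positions contribute to the average, so every counted
    token sees the longest available left context.
    `nll_fn(context) -> list of per-token NLLs` stands in for the model.
    """
    total, count = 0.0, 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + stride, len(tokens))
        ctx_start = max(0, end - window)
        nlls = nll_fn(tokens[ctx_start:end])
        take = end - begin              # only the new (non-overlap) tokens
        total += sum(nlls[-take:])
        count += take
    return total / count
```

The cost is roughly window/stride forward passes per token's worth of text, traded for a lower (fairer) bpb.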
Test-Time Training
full TTT
parameters: {"learning_rate":0.00045,"epochs":10,"freeze_blocks":1}
Sequence Length
sequence_length
train_length: 2048
eval_length: null
Compression
brotli
level: null
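The artifact is Brotli-compressed at an unspecified level and must fit the size budget. This sketch uses stdlib `zlib` as a stand-in so it stays dependency-free (the real pipeline would call `brotli.compress` instead), and whether the 16MB limit is decimal or binary is an assumption:

```python
import zlib

def pack_artifact(payload: bytes, limit: int = 16_000_000) -> bytes:
    """Compress the serialized model and enforce the size budget.

    The submission uses brotli; zlib stands in here. The 16,000,000-byte
    limit is an assumed decimal reading of "16MB".
    """
    blob = zlib.compress(payload, level=9)
    if len(blob) > limit:
        raise ValueError(f"artifact is {len(blob)} bytes, over the {limit}-byte limit")
    return blob
```

The reported artifact sizes (15,457,982 to 15,504,058 bytes) sit just under a decimal 16MB budget, which is consistent with this reading.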
Novel Contributions
- SAFE_SUBMISSION artifact staged from an authoritative TensorPool pull rather than from live-log heuristics
- Pre-quantization TTT baked into the artifact as a fixed predictor
- SP1024 tokenizer combined with a looping architecture over layers 4-5
- TTT hyperparameter tuning: 10 epochs, a lower learning rate, and fewer frozen blocks
- GPTQ int6 quantization with Brotli compression, keeping the artifact under the 16MB limit
- Explicit legality separation between the submission score and frontier-only SLOT numbers