PR #1744 (open)

[submission] SP8192 + QK5 + Freeze10 Loss-Gated Legal TTT (1.08885521)

by MuhammedErinArchitectureView on GitHub
val_bpb: 1.0889
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,994,383 bytes

Training Techniques

Quantization
GPTQ
parameters: {"bits":6,"scope":"matrices; int8 embeddings"}
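A minimal sketch of the bit-width/scope split described above. Plain round-to-nearest quantization stands in for GPTQ, which additionally applies Hessian-based error compensation when rounding; all shapes here are illustrative, not taken from the submission artifact.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor round-to-nearest quantization to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1            # 31 for 6-bit, 127 for int8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized weights

rng = np.random.default_rng(0)
attn_matrix = rng.normal(size=(512, 512))  # weight matrices -> 6-bit
embeddings = rng.normal(size=(8192, 512))  # embeddings -> int8

attn_q = quantize_symmetric(attn_matrix, bits=6)
emb_q = quantize_symmetric(embeddings, bits=8)
```

The lower bit width goes to the bulk of the parameters (the weight matrices), while the embeddings keep the cheaper-to-store int8 format.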
Architecture
reduced KV head count (grouped-query attention)
SP8192-family Transformer with 11 layers, hidden size 512, 8 attention heads, and 4 KV heads.
parameters: {"layers":11,"dimensions":512,"heads":8,"kv_heads":4}
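A shape walk-through of the grouped-query attention layout implied by the parameters above (8 query heads sharing 4 KV heads at hidden size 512); this is a sketch of the head bookkeeping, not the submission's implementation.

```python
import numpy as np

layers, dim, heads, kv_heads = 11, 512, 8, 4
head_dim = dim // heads                  # 64
group = heads // kv_heads                # 2 query heads per KV head

seq = 16
q = np.zeros((heads, seq, head_dim))     # 8 query heads
k = np.zeros((kv_heads, seq, head_dim))  # only 4 K heads are stored
v = np.zeros((kv_heads, seq, head_dim))

# Each stored KV head is repeated to serve its group of query heads.
k_full = np.repeat(k, group, axis=0)
v_full = np.repeat(v, group, axis=0)
assert k_full.shape == q.shape

# KV cache cost relative to full multi-head attention:
kv_cache_ratio = kv_heads / heads        # 0.5
```

Halving the KV heads halves KV-cache memory at inference while keeping the full set of query heads.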
Test-Time Training
score-first TTT
parameters: {"freeze_blocks":10,"param_mode":"all","loss_gate_mode":"running_mean","loss_gate_margin":0,"final_block_only":true}
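A hedged sketch of the gating logic the parameters above describe: the first 10 blocks stay frozen, only the final block is adapted, and an update window is skipped unless its loss exceeds a running mean of past window losses (margin 0). The `apply_update` callback is a hypothetical stand-in for the gradient step on the final block.

```python
FREEZE_BLOCKS = 10        # first 10 transformer blocks stay frozen
LOSS_GATE_MARGIN = 0.0    # "loss_gate_margin": 0

def ttt_step(losses_so_far, window_loss, apply_update):
    """Decide whether to spend a test-time-training update on this window.

    losses_so_far: list of previously observed window losses (mutated)
    window_loss:   loss of the current window
    apply_update:  callback adapting only the final (unfrozen) block
    Returns True if an update was applied, False if the gate skipped it.
    """
    if losses_so_far:
        running_mean = sum(losses_so_far) / len(losses_so_far)
        # Running-mean gate: only update on harder-than-average windows.
        if window_loss <= running_mean + LOSS_GATE_MARGIN:
            losses_so_far.append(window_loss)
            return False                  # skip low-value update
    losses_so_far.append(window_loss)
    apply_update()                        # one gradient step, final block only
    return True
```

The first window always triggers an update (there is no running mean yet); afterwards, easy windows are recorded but skipped, so the per-window compute budget is spent where the loss suggests adaptation will pay off.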
Regularization
weight decay
parameters: null

Novel Contributions

  • Legal 8xH100 / 10 minute / 16 MB submission in the SP8192 + QK5 + LegalTTT family
  • Freeze the first 10 transformer blocks during test-time training, adapting only the final block
  • Use a running-mean loss gate to skip low-value update windows
  • Demonstrate a competitive single-seed legal submission under the runtime and size constraints