val_bpb: 1.1957
Architecture: Transformer
Optimizer: —
Artifact Size: 15,880,385 bytes
Training Techniques
Architecture: KV head count
Uses a baseline-sized Transformer with 8 attention heads and 4 KV heads.
Parameters: {"layers": 9, "model_dim": 512, "num_heads": 8, "num_kv_heads": 4}
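With 8 query heads sharing 4 KV heads, this is a grouped-query attention (GQA) layout. A minimal sketch of the head grouping, assuming the standard GQA convention that consecutive query heads share one KV head (the function name is illustrative, not from the submission):

```python
# Hypothetical sketch of the grouped-query attention head mapping implied by
# num_heads=8, num_kv_heads=4: each KV head serves num_heads // num_kv_heads
# query heads. Only the parameter values come from the card.

params = {"layers": 9, "model_dim": 512, "num_heads": 8, "num_kv_heads": 4}

def kv_head_for_query_head(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Return the index of the KV head that query head `q_head` attends with."""
    group_size = num_heads // num_kv_heads  # 8 // 4 = 2 query heads per KV head
    return q_head // group_size

mapping = [
    kv_head_for_query_head(h, params["num_heads"], params["num_kv_heads"])
    for h in range(params["num_heads"])
]
print(mapping)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Halving the KV head count halves the KV-cache size relative to full multi-head attention while keeping all 8 query projections.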
LR Schedule: warmdown
Parameters: {"warmdown_steps": 3000}
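A warmdown schedule holds the base learning rate and then decays it over the final steps of training. A hedged sketch, assuming a linear decay to zero; only warmdown_steps=3000 comes from the card, while the total step count and base LR below are placeholders:

```python
# Sketch of a "warmdown" LR schedule: constant base LR, then linear decay to
# zero over the last `warmdown_steps` steps. The linear shape and the numbers
# other than warmdown_steps=3000 are assumptions.

def lr_at(step: int, total_steps: int, base_lr: float,
          warmdown_steps: int = 3000) -> float:
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return base_lr                      # constant phase
    remaining = total_steps - step
    return base_lr * remaining / warmdown_steps  # linear warmdown phase

total_steps, base_lr = 10_000, 3e-3        # illustrative values
assert lr_at(0, total_steps, base_lr) == base_lr
assert lr_at(total_steps, total_steps, base_lr) == 0.0
```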
Quantization: int8 (bits: 8, scope: all)
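Scope "all" reads as quantizing every weight tensor to int8. A minimal sketch of a symmetric per-tensor int8 roundtrip; the scaling scheme is an assumption, not necessarily the submission's exact recipe:

```python
# Hedged sketch of symmetric per-tensor int8 quantization: scale by the
# tensor's max absolute value so the range maps onto [-127, 127], then
# dequantize by multiplying back. The scheme is illustrative.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid 0 for all-zero tensors
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Roundtrip error is bounded by about half a quantization step (0.5 * scale).
```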
Compression: zlib (level: null)
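The card leaves the zlib level null, which suggests the library default. A sketch of the packaging step, with a synthetic stand-in for the quantized weight bytes:

```python
# Sketch of the artifact packaging step: int8 weights serialized as raw bytes
# and compressed losslessly with zlib at the default level (the level is
# unspecified in the card). The payload here is synthetic.
import zlib

int8_weights = bytes(range(256)) * 64   # stand-in for quantized weight bytes
blob = zlib.compress(int8_weights)      # default compression level
restored = zlib.decompress(blob)

assert restored == int8_weights         # lossless roundtrip
print(len(int8_weights), "->", len(blob))
```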
Test-Time Training: LoRA TTT (parameters: null)
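In LoRA-style test-time training, the frozen weight W is augmented with a low-rank product A @ B, and only the small factors are updated at evaluation time. A hedged sketch with tiny illustrative shapes; the card leaves the LoRA parameters null, so rank, init, and the update below are all assumptions:

```python
# Hedged sketch of LoRA test-time training: W stays frozen, the low-rank
# adapter A @ B is zero-initialized (a no-op at first) and only its factors
# would be trained at evaluation time. All shapes and values are illustrative.
import random

d, r = 4, 2                                   # model dim and LoRA rank (assumed)
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]   # frozen
A = [[0.01 * random.uniform(-1, 1) for _ in range(r)] for _ in range(d)]
B = [[0.0] * d for _ in range(r)]             # zero init: adapter starts as a no-op

def effective_weight(W, A, B):
    """W + A @ B, the weight actually used in the forward pass."""
    return [[W[i][j] + sum(A[i][k] * B[k][j] for k in range(r))
             for j in range(d)] for i in range(d)]

W_eff_before = effective_weight(W, A, B)      # equals W before any TTT step
B[0][0] = 0.1                                 # stand-in for one test-time update
W_eff_after = effective_weight(W, A, B)       # now differs from W
```

Because only A and B (2 * d * r values per layer) change at test time, the update is cheap relative to retraining the full d * d weight.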
Novel Contributions
- Applies LoRA-based test-time training to improve compression performance.
- Shows that the LoRA TTT evaluation path outperforms the plain int8 roundtrip.
- Fits the int8 + zlib artifact within the 16 MB submission limit.
- Trains a 512-dimensional baseline Transformer with 8 attention heads and 4 KV heads under a 10-minute training budget.