PR #1041
Closed (non-record): 11L Int5 QAT + Score-First TTT — val_bpb 1.1356 (15.60 MiB)
by JoeProAI
val_bpb
1.1356
Architecture
Transformer
Optimizer
—
Artifact Size
15.60 MiB
Training Techniques
Quantization
int5
bits: 5
scope: all
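The quantization entry above (`bits: 5`, `scope: all`) can be sketched as a fake-quantization forward pass, the usual core of quantization-aware training. This is an illustrative sketch, not the submission's code: the symmetric per-tensor scale and the straight-through-estimator framing are standard QAT assumptions, since the PR only reports the bit width and scope.

```python
import numpy as np

def fake_quant_int5(w: np.ndarray) -> np.ndarray:
    """Symmetric int5 fake-quantization: snap values to a 5-bit grid,
    return them as floats.

    During QAT the forward pass uses these quantized values while the
    backward pass lets gradients flow through unchanged (straight-through
    estimator). Per-tensor scaling here is an assumption.
    """
    qmax = 2 ** (5 - 1) - 1                       # 15 for signed int5
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # 32 integer levels
    return q * scale

w = np.array([0.3, -1.5, 0.07])
wq = fake_quant_int5(w)   # 0.07 snaps to the nearest grid point, 0.1
```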
Architecture
BigramHash
Uses bigram buckets and bigram embeddings in the model.
parameters: {"buckets":4096,"embed_dim":128}
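The bigram-bucket idea can be sketched with the reported parameters (`buckets: 4096`, `embed_dim: 128`): each adjacent token pair is hashed into a bucket, and the bucket's learned embedding is added alongside the ordinary token embedding. The hash mixing constants below are assumptions; the PR does not specify the hash function.

```python
import numpy as np

BUCKETS, EMBED_DIM = 4096, 128   # from the submission's reported parameters

def bigram_buckets(token_ids: list[int]) -> list[int]:
    """Hash each adjacent token pair (bigram) into a fixed bucket.

    The multiplicative mixing constants are illustrative placeholders.
    """
    return [
        (prev * 1_000_003 + cur * 8191) % BUCKETS
        for prev, cur in zip(token_ids, token_ids[1:])
    ]

# In the model this table would be learned; random init here for the sketch.
rng = np.random.default_rng(0)
bigram_table = rng.normal(0, 0.02, size=(BUCKETS, EMBED_DIM))

ids = [5, 17, 17, 42]
extra = bigram_table[bigram_buckets(ids)]   # one 128-dim vector per bigram
```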
weight tying
Embeddings are not tied.
parameters: {"tie_embeddings":false}
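Untied embeddings (`tie_embeddings: false`) simply mean the input embedding table and the output projection are independent parameter matrices rather than one shared array. A minimal sketch, with an illustrative vocabulary size (the PR reports `model_dim: 512` but not the vocabulary):

```python
import numpy as np

VOCAB, DIM = 1000, 512   # VOCAB is illustrative; DIM matches model_dim

rng = np.random.default_rng(0)
embed_in = rng.normal(0, 0.02, (VOCAB, DIM))  # input token embedding
unembed = rng.normal(0, 0.02, (VOCAB, DIM))   # separate output head (untied)
# With tying, unembed would literally be embed_in.

x = embed_in[123]        # embed one token
logits = unembed @ x     # project a hidden state back to vocabulary logits
```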
U-Net skip connections
The submission uses an 11-layer U-Net-style architecture.
parameters: {"layers":11,"model_dim":512,"heads":8,"mlp_hidden":1536}
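The U-Net-style wiring can be sketched as a stack-based forward pass: the first half of the blocks push their activations onto a stack, and the second half pop them as additive skip connections, with one unpaired middle block when the depth is odd (here 11 → 5 + 1 + 5). The exact split and additive combination are assumptions; the PR only states the layer count and the U-Net skips.

```python
def unet_forward(x, blocks):
    """U-Net-style pass over an odd-length list of transformer blocks.

    First half saves activations; second half consumes them in reverse
    order as additive skips. Layout is an assumption based on the
    submission's stated 11-layer U-Net structure.
    """
    n = len(blocks)
    half = n // 2
    skips = []
    for i, block in enumerate(blocks):
        if i >= n - half:          # second half: add the mirrored skip
            x = x + skips.pop()
        x = block(x)
        if i < half:               # first half: save for the mirror layer
            skips.append(x)
    return x
```

With 11 identity-plus-one toy blocks, layers 0–4 feed skips into layers 6–10 while layer 5 sits in the middle.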
Test-Time Training
score-first TTT
parameters: null
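The "score-first" qualifier is not elaborated in the PR; a plausible reading (an assumption on my part) is that each test chunk is scored with the current weights before the model is allowed to adapt on it, so no chunk ever influences its own score. A minimal sketch of that loop:

```python
def score_first_ttt(model_loss, model_update, chunks):
    """Score-first test-time training loop (interpretation is an assumption).

    Each test chunk is scored *before* the model updates on it, so the
    reported average loss never reflects training on the scored chunk.
    """
    losses = []
    for chunk in chunks:
        losses.append(model_loss(chunk))  # 1) score with current weights
        model_update(chunk)               # 2) only then adapt on the chunk
    return sum(losses) / len(losses)
```

The toy test below uses a one-parameter "model" that memorizes the last chunk it saw: only the first chunk is mispredicted, because each later chunk was already seen before scoring.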
Novel Contributions
- 11-layer U-Net style model
- Int5 quantization-aware training
- Score-first legal test-time training
- Independent seed validation run