PR #1040
Status: closed
Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1336 (15.59 MiB)
by JoeProAI
val_bpb: 1.1336
Architecture: Transformer
Optimizer: —
Artifact Size: 15.59 MiB
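For context, val_bpb is bits per byte: the model's summed cross-entropy over the validation split, converted from nats to bits and normalized by the byte length of the text. A minimal sketch of that conversion; the function name and example numbers below are illustrative, not taken from this PR:

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    # cross-entropy summed over the split, in nats -> bits, per input byte
    return total_nll_nats / (total_bytes * math.log(2))

# illustrative numbers only: 1000 bytes of text whose summed NLL is 1386.294 nats
bpb = bits_per_byte(1386.294, 1000)   # about 2.0 bits per byte
```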
Training Techniques
Quantization
QAT
bits: 5
scope: all
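The card records Int5 QAT over all weights but not the exact scheme. A minimal fake-quantization sketch, assuming symmetric per-tensor 5-bit quantization; the straight-through estimator that QAT uses in the backward pass is noted in comments only:

```python
import numpy as np

def fake_quant_int5(w: np.ndarray) -> np.ndarray:
    # 5 bits -> 32 signed integer levels on [-16, 15]
    qmax = 2 ** (5 - 1) - 1                      # 15
    scale = max(np.abs(w).max(), 1e-12) / qmax   # per-tensor scale (assumed)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    # QAT forward uses the dequantized q * scale; backward treats
    # round() as identity (straight-through estimator)
    return q * scale

w = np.array([-1.0, -0.5, 0.0, 0.3, 1.0])
wq = fake_quant_int5(w)   # values snapped to the int5 grid
```

Each value lands within half a quantization step of the original, which is what lets training adapt to the 5-bit grid.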
Architecture
BigramHash
Bigram token pairs are hashed into buckets and looked up in a bigram embedding table
parameters: {"buckets":4096,"embed_dim":128}
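A sketch of how a hashed-bigram embedding with these parameters could work. The actual hash function is not given in the PR, so the multiplicative mix below is an assumption:

```python
import numpy as np

BUCKETS, EMBED_DIM = 4096, 128   # parameters from the card

def bigram_bucket(prev_tok: int, cur_tok: int) -> int:
    # hash the ordered (prev, cur) token pair into one of BUCKETS buckets;
    # the multiplier is an arbitrary prime, not the PR's actual hash
    h = (prev_tok * 1000003 + cur_tok) & 0xFFFFFFFF
    return h % BUCKETS

rng = np.random.default_rng(0)
bigram_table = rng.standard_normal((BUCKETS, EMBED_DIM)).astype(np.float32)

tokens = [5, 17, 17, 42]
# one bigram embedding per position; the first position has no
# predecessor, so bucket 0 is used as a placeholder here
ids = [0] + [bigram_bucket(a, b) for a, b in zip(tokens, tokens[1:])]
bigram_emb = bigram_table[ids]   # shape (len(tokens), EMBED_DIM)
```

Hashing keeps the table at a fixed 4096 x 128 size regardless of vocabulary, at the cost of occasional bucket collisions between unrelated bigrams.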
weight tying
Embeddings are not tied
parameters: {"tied":false}
Test-Time Training
score-first TTT
parameters: null
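"Score-first" TTT is not spelled out here. One plausible reading, sketched on a toy linear model, is that each test example is scored with the current weights before any test-time gradient step on it, so an example's own update can never improve its own score. All of this is an assumed interpretation, not the PR's code:

```python
import numpy as np

w = np.zeros(2)                        # toy linear predictor
data = [(np.array([1.0, 0.0]), 1.0),
        (np.array([0.0, 1.0]), -1.0)]  # (features, target) test stream

losses = []
for x, y in data:
    pred = w @ x
    losses.append((pred - y) ** 2)     # score first, with pre-update weights
    w -= 0.1 * 2 * (pred - y) * x      # then adapt at test time (SGD step)
```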
Novel Contributions
- 11-layer U-Net style model with Int5 QAT
- Score-first legal test-time training
- Independent seed validation run
- Artifact kept under the 16 MiB limit