PR #1040

closed

Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1336 (15.59 MiB)

by JoeProAI
val_bpb: 1.1336
Architecture: Transformer
Optimizer: (not listed)
Artifact Size: 15.59 MiB

Training Techniques

  • Quantization: QAT (bits: 5, scope: all)
  • Architecture: BigramHash. Bigram buckets and a bigram embedding table are used in the model. parameters: {"buckets":4096,"embed_dim":128}
  • Weight tying: embeddings are not tied. parameters: {"tied":false}
  • Test-Time Training: score-first TTT. parameters: null
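The 5-bit QAT entry can be sketched as a fake-quantization forward pass; during QAT the backward pass would typically use a straight-through estimator. The symmetric [-16, 15] range and per-tensor scaling here are assumptions, since the PR does not state the exact scheme.

```python
import numpy as np

def fake_quant_int5(w):
    """Fake-quantize a weight array to signed 5-bit codes and back.

    Symmetric scheme with 32 levels in [-16, 15]; the actual scheme in the
    PR (per-tensor vs. per-channel, symmetric vs. asymmetric) is not stated,
    so this is one plausible choice.
    """
    qmin, qmax = -16, 15
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code, clamp into the 5-bit range.
    q = np.clip(np.round(w / scale), qmin, qmax).astype(np.int8)
    dq = q.astype(w.dtype) * scale  # dequantized values used in the forward pass
    return dq, q
```

Storing only the int5 codes plus one scale per tensor is what keeps the artifact small relative to a float checkpoint.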
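The BigramHash component (buckets=4096, embed_dim=128 per the parameters above) can be read as hashing each (previous token, current token) pair into a bucket and looking up a learned embedding for it. The hash function, the BOS handling, and how the feature is combined with the token embedding are all assumptions here, not details from the PR.

```python
import numpy as np

BUCKETS, EMBED_DIM = 4096, 128  # from the PR's parameters

def bigram_bucket(prev_tok, tok, buckets=BUCKETS):
    # Hypothetical mixing hash; the PR does not specify the hash function.
    return (prev_tok * 1000003 + tok) % buckets

rng = np.random.default_rng(0)
bigram_emb = rng.standard_normal((BUCKETS, EMBED_DIM)) * 0.02  # learned in practice

def bigram_features(tokens):
    # One bigram embedding per position; position 0 pairs with an assumed BOS id 0.
    prev = [0] + list(tokens[:-1])
    idx = [bigram_bucket(p, t) for p, t in zip(prev, tokens)]
    return bigram_emb[idx]  # (len(tokens), EMBED_DIM), e.g. added to token embeddings
```

Hashing bigrams into a fixed bucket table gives the model cheap access to local pair statistics without a full vocab-squared embedding table.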
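"Score-first" TTT presumably means each evaluation chunk is scored before the model adapts on it, so no token is ever scored by weights that have already seen it; that ordering is what keeps the test-time training legal. A toy sketch with an adaptive unigram byte model, purely to illustrate the ordering (the PR's actual TTT update is not described):

```python
import math
from collections import Counter

def score_first_ttt(chunks, vocab=256):
    """Toy score-first loop: for each chunk, FIRST score it with the current
    model state, THEN update on it. The add-one-smoothed unigram model is a
    stand-in for whatever update the PR actually performs."""
    counts = Counter()
    total_bits, total_bytes = 0.0, 0
    for chunk in chunks:
        n = sum(counts.values())
        for b in chunk:                        # score with frozen state
            p = (counts[b] + 1) / (n + vocab)  # add-one smoothing
            total_bits += -math.log2(p)
        total_bytes += len(chunk)
        counts.update(chunk)                   # adapt only after scoring
    return total_bits / total_bytes            # bits per byte
```

With this ordering, the second repetition of a chunk scores better than the first, but each byte's probability comes from a state that excludes that byte.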

Novel Contributions

  • 11-layer U-Net style model with Int5 QAT
  • Score-first legal test-time training
  • Independent seed validation run
  • Artifact kept under the 16 MiB limit
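A quick sanity check on the 15.59 MiB figure, assuming the artifact is essentially raw 5-bit weight codes with negligible overhead (an assumption; real checkpoints also carry scales and headers):

```python
MIB = 1024 ** 2
artifact_bytes = 15.59 * MIB            # reported artifact size
params = artifact_bytes * 8 / 5         # assumed 5 bits per weight, no overhead
print(round(params / 1e6, 1))           # prints 26.2 (million weights)
```

That implies a model on the order of 26M parameters fitting under the 16 MiB limit thanks to the Int5 packing.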