PR #1041

closed

Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1356 (15.60 MiB)

by JoeProAI on GitHub
val_bpb: 1.1356
Architecture: Transformer
Optimizer: —
Artifact Size: 15.60 MiB
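For context, val_bpb (validation bits per byte) is the model's mean negative log-likelihood converted to bits and normalized per byte of validation text. A minimal sketch of that conversion, assuming the loss is accumulated in nats:

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    # Convert a summed negative log-likelihood (in nats) into
    # bits per byte of the evaluated text: bits = nats / ln(2).
    return total_nll_nats / (math.log(2) * total_bytes)
```

At this PR's 1.1356 val_bpb, the model pays roughly 1.14 bits for each byte of validation text it predicts.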

Training Techniques

Quantization — int5 (bits: 5, scope: all)
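The int5 setting (bits: 5, scope: all) means every weight tensor is trained to survive 5-bit quantization. A minimal sketch of the quantize-dequantize step such a QAT pass applies, assuming symmetric per-tensor scaling (the PR does not spell out its scheme); a real implementation would wrap this in a straight-through estimator so gradients flow past the rounding:

```python
def fake_quant_int5(weights, eps=1e-8):
    # Symmetric 5-bit quantization: signed int5 covers [-16, 15];
    # a symmetric scheme uses the [-15, 15] portion of that range.
    qmax = 2 ** (5 - 1) - 1  # 15
    scale = max(max(abs(w) for w in weights), eps) / qmax
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Dequantize back to floats: values now sit on a 31-level grid.
    return [q * scale for q in quantized], scale
```

Training against this rounded view of the weights is what lets the final 15.60 MiB artifact store them at 5 bits without a large accuracy drop.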
Architecture — BigramHash
  Uses bigram buckets and bigram embeddings in the model.
  parameters: {"buckets":4096,"embed_dim":128}
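BigramHash presumably maps each adjacent token pair into one of the 4096 buckets and looks up a 128-dim embedding for it alongside the ordinary token embedding. A hypothetical sketch of the bucketing step (the hash constants are illustrative, not from the PR):

```python
def bigram_bucket(prev_token: int, token: int, buckets: int = 4096) -> int:
    # Mix the two token ids with a large odd multiplier, then reduce
    # into the bucket range; collisions are expected and harmless,
    # since colliding bigrams simply share one learned embedding row.
    mixed = (prev_token * 1_000_003 + token) & 0xFFFFFFFF
    return mixed % buckets

# The model would hold a learned table of shape (4096, 128) and mix
# table[bigram_bucket(prev, cur)] into the input representation.
```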
Weight tying
  Embeddings are not tied.
  parameters: {"tie_embeddings":false}
U-Net skip connections
  The submission uses an 11-layer U-Net-style architecture.
  parameters: {"layers":11,"model_dim":512,"heads":8,"mlp_hidden":1536}
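With 11 layers, a natural U-Net wiring is 5 "encoder" blocks, 1 bottleneck, and 5 "decoder" blocks, each decoder block receiving the activation of its mirror encoder block. A minimal sketch of that forward pass, assuming additive skips (the PR does not state how skips are combined):

```python
def unet_forward(x, blocks):
    # blocks: one callable per layer; an odd count gives a clean split
    # into n//2 encoder layers, one bottleneck, and n//2 decoder layers.
    n = len(blocks)
    skips = []
    for i, block in enumerate(blocks):
        if i < n // 2:
            x = block(x)
            skips.append(x)      # stash encoder output for the mirror layer
        elif i == n // 2:
            x = block(x)         # bottleneck, no skip
        else:
            x = x + skips.pop()  # additive skip from the mirrored encoder layer
            x = block(x)
    return x
```

With layers=11 this pairs layers (0, 10), (1, 9), (2, 8), (3, 7), and (4, 6) around bottleneck layer 5.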
Test-Time Training — score-first TTT
  parameters: null
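"Score-first" TTT presumably keeps the evaluation legal by scoring each validation chunk with the current weights before updating on it, so no chunk ever contributes to its own score. A hypothetical sketch of that loop (the PR gives no parameters, so `score` and `update` stand in for the real loss and optimizer step):

```python
def score_first_ttt(chunks, score, update):
    # score(chunk)  -> summed loss for the chunk under the CURRENT weights
    # update(chunk) -> one adaptation step on that chunk, mutating weights
    total_loss, total_units = 0.0, 0
    for chunk in chunks:
        total_loss += score(chunk)  # evaluate first: this chunk has not
        total_units += len(chunk)   # yet influenced the weights
        update(chunk)               # then adapt before the next chunk
    return total_loss / total_units
```

The ordering is the whole point: swapping the score and update calls would let each chunk lower its own loss, which is exactly the leakage this scheme avoids.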

Novel Contributions

  • 11-layer U-Net style model
  • Int5 quantization-aware training
  • Score-first legal test-time training
  • Independent seed validation run