PR #1476

open

[Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB

by aryan-cs
val_bpb: 1.0842
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 15.99MB
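
For readers unfamiliar with the headline metric: val_bpb is presumably validation bits per byte. A minimal sketch of that conversion, assuming the evaluation sums the negative log-likelihood in nats over the validation text and divides by its byte count. The function name and arguments are hypothetical and not taken from the submission.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) turns nats into bits; dividing by bytes normalizes
    # per byte of raw text, so the value does not depend on the tokenizer.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Because the denominator is bytes rather than tokens, the metric stays comparable across submissions that use different vocabularies.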

Training Techniques

Quantization
int6
bits: 6
scope: artifact
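
The submission does not describe its int6 scheme. As an illustration only, a symmetric per-row quantizer with 6-bit packing could look like the sketch below. The function names, the per-row scale, and the packing layout are assumptions, not the author's code.

```python
def quantize_int6(row):
    """Symmetrically quantize a list of floats to integers in [-31, 31]."""
    qmax = 31  # signed 6-bit range is [-32, 31]; -32 is unused to stay symmetric
    max_abs = max((abs(x) for x in row), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return q, scale


def dequantize_int6(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


def pack_int6(q):
    """Pack signed 6-bit integers into bytes, four values per three bytes."""
    bits, nbits, out = 0, 0, bytearray()
    for v in q:
        bits = (bits << 6) | (v & 0x3F)  # keep the low 6 bits (two's complement)
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # drop the bits already written
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)
```

Packing four weights into three bytes is what makes 6-bit storage 25% smaller than int8 before any further compression.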
Architecture
SP8192
Tokenizer configuration using SP8192.
parameters: null
Other
other
QK_GAIN_INIT set to 5.
parameters: {"QK_GAIN_INIT":5}
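
The card only states that QK_GAIN_INIT is 5. One plausible reading, labeled here as an assumption, is that it is the initial value of a learnable scale applied to RMS-normalized queries before the attention dot product. The sketch below shows that reading with plain lists; reading the value from an environment variable is also an assumption.

```python
import math
import os

# Assumption: the setting is supplied through the environment, defaulting to 5.
QK_GAIN_INIT = float(os.environ.get("QK_GAIN_INIT", "5"))


def rms_normalize(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    return [x / math.sqrt(mean_sq + eps) for x in vec]


def attention_logit(q, k, gain=QK_GAIN_INIT):
    """Scaled dot product of normalized q and k, with q multiplied by gain."""
    qn = [gain * x for x in rms_normalize(q)]
    kn = rms_normalize(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))
```

Under this reading, normalization bounds the raw logits, so a larger initial gain lets attention start out sharper than it would with a gain of 1.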
Test-Time Training
score-first TTT
parameters: null
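
"Score-first" suggests that each piece of validation text is scored before the model adapts on it, so no byte is evaluated by a model that has already trained on that byte. The toy sketch below shows only that ordering. It swaps in an add-one-smoothed byte-count model for the real network and gradient step, and every name in it is hypothetical.

```python
import math


def score_first_ttt(chunks, vocab_size=256):
    """Return total bits over byte chunks, scoring each chunk before adapting."""
    counts = [1] * vocab_size  # add-one smoothed unigram counts (toy model)
    total = vocab_size
    total_bits = 0.0
    for chunk in chunks:
        # 1) Score with the model as it was BEFORE seeing this chunk.
        for b in chunk:
            total_bits += -math.log2(counts[b] / total)
        # 2) Only then adapt on the chunk that was just scored.
        for b in chunk:
            counts[b] += 1
            total += 1
    return total_bits
```

For example, `score_first_ttt([b"aaaa", b"aaaa"])` charges the first chunk at the uniform prior and only the second chunk benefits from adaptation.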
Weight Averaging
EMA
parameters: null
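
No EMA parameters are listed. As a generic sketch of exponential moving average weight averaging, the update below keeps a running blend of the parameters. The decay value and the dict-of-floats layout are assumptions chosen for illustration.

```python
def ema_update(ema, params, decay=0.999):
    """Blend current params into the running average in place and return it."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value
    return ema
```

Calling this after each optimizer step and evaluating with `ema` instead of the raw parameters is the usual pattern.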

Novel Contributions

  • SP8192 tokenizer
  • QK_GAIN_INIT=5
  • legal score-first TTT
  • compact quantized artifact under the 16 MB submission limit
  • record submission package for track_10min_16mb
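
A size check against the limit can be sketched as follows, assuming "16 MB" means 16,000,000 bytes (decimal megabytes). That interpretation, the constant, and the function names are assumptions; the track rules define which files actually count toward the limit.

```python
import os

SIZE_LIMIT_BYTES = 16_000_000  # assumption: decimal megabytes


def total_artifact_bytes(paths):
    """Sum the on-disk sizes of the files that make up the artifact."""
    return sum(os.path.getsize(p) for p in paths)


def within_limit(n_bytes, limit=SIZE_LIMIT_BYTES):
    """Return True if the artifact size does not exceed the limit."""
    return n_bytes <= limit
```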