PR #1476
open
[Record Submission] SP8192 + QK5 + Legal TTT — val_bpb 1.0842 | 15.99MB
by aryan-cs
val_bpb
1.0842
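For readers unfamiliar with the metric, the sketch below shows one common way a bits-per-byte figure is derived from a summed cross-entropy loss. It assumes val_bpb means validation bits per byte and that the loss is measured in nats; the function names and the example numbers are hypothetical and not taken from the submission.

```python
import math


def bits_per_byte(total_nll_nats, num_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * num_bytes)


def bits_per_byte_from_token_loss(mean_loss_nats, num_tokens, num_bytes):
    """Same conversion, starting from a mean per-token loss."""
    return bits_per_byte(mean_loss_nats * num_tokens, num_bytes)


# Hypothetical numbers: 8 bytes of text costing exactly 8 bits in total.
example_bpb = bits_per_byte(total_nll_nats=8 * math.log(2), num_bytes=8)

# Hypothetical numbers: 2 tokens covering 8 bytes, each costing 4 bits.
example_token_bpb = bits_per_byte_from_token_loss(4 * math.log(2), 2, 8)
```

Normalizing by bytes rather than tokens is what lets runs with different tokenizers be compared on the same scale.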
Architecture
Transformer
Optimizer
—
Artifact Size
15.99MB
Training Techniques
Quantization
int6
bits: 6
scope: artifact
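As a rough illustration of what 6-bit storage involves, here is a minimal sketch of symmetric per-tensor quantization to signed 6-bit codes. The card states only "int6, bits: 6"; the per-tensor scale, the symmetric range, and the function names are assumptions, and the submission's actual scheme may differ.

```python
def quantize_int6(weights):
    """Symmetric per-tensor quantization to integer codes in [-31, 31]."""
    qmax = 2 ** (6 - 1) - 1  # 31 for 6 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int6(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Hypothetical weights, for illustration only.
sample_weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int6(sample_weights)
restored = dequantize_int6(codes, scale)
```

Each code fits in 6 bits, so four codes can be packed into three bytes; the rounding error per weight is bounded by half the scale.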
Architecture
SP8192
Tokenizer configuration using SP8192.
parameters: null
Other
other
QK_GAIN_INIT set to 5.
parameters: {"QK_GAIN_INIT":5}
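The card does not say what QK_GAIN_INIT controls. One plausible reading, assumed here, is that it is the initial value of a learnable scale applied to normalized queries before the attention softmax. The sketch below shows the effect of such a gain on a single query; all function names and the toy vectors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, qk_gain):
    """Attention over unit-norm keys, with the unit-norm query scaled by qk_gain."""
    q = [qk_gain * x for x in l2_normalize(query)]
    logits = [sum(a * b for a, b in zip(q, l2_normalize(k))) for k in keys]
    return softmax(logits)


QK_GAIN_INIT = 5.0  # value from the card; its role here is an assumption
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
soft = attention_weights(query, keys, 1.0)
sharp = attention_weights(query, keys, QK_GAIN_INIT)
```

With unit-norm queries and keys the logits are bounded by the gain, so a larger initial gain lets attention start out more peaked on the best-matching key.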
Test-Time Training
score-first TTT
parameters: null
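The ordering that "score-first" suggests can be sketched with a toy model: each chunk is scored with parameters that have not yet seen it, and only afterwards is the model adapted on that chunk. The count-based model below is a stand-in for a neural network and its gradient update; the class, the function names, the chunk size, and the data are all hypothetical.

```python
import math


class CountModel:
    """Toy adaptive model: Laplace-smoothed byte frequencies."""

    def __init__(self, vocab_size=256):
        self.counts = [1] * vocab_size
        self.total = vocab_size

    def nll_bits(self, chunk):
        """Cost of a chunk in bits under the current (frozen) counts."""
        return sum(-math.log2(self.counts[b] / self.total) for b in chunk)

    def adapt(self, chunk):
        """Update the model on a chunk that has already been scored."""
        for b in chunk:
            self.counts[b] += 1
            self.total += 1


def score_first_ttt(model, data, chunk_size):
    """Return bits per byte when every chunk is scored before it is trained on."""
    total_bits = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total_bits += model.nll_bits(chunk)  # 1) score first
        model.adapt(chunk)                   # 2) then adapt
    return total_bits / len(data)


data = b"abababababababab" * 8
static_bpb = CountModel().nll_bits(data) / len(data)
ttt_bpb = score_first_ttt(CountModel(), data, chunk_size=16)
```

Because no byte contributes to the score after the model has trained on it, the reported loss stays a genuine prediction loss even though the model keeps adapting.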
Weight Averaging
EMA
parameters: null
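An exponential moving average of weights can be sketched in a few lines. The card gives no parameters, so the decay value, the function name, and the stand-in training loop below are assumptions for illustration only.

```python
def ema_update(ema_weights, weights, decay):
    """One EMA step: move the averaged weights toward the current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


live = [0.0, 0.0]
ema = list(live)
for step in range(3):
    live = [w + 1.0 for w in live]  # stand-in for an optimizer step
    ema = ema_update(ema, live, decay=0.5)  # hypothetical decay
```

The averaged copy lags behind the live weights and smooths out step-to-step noise; in practice it is usually the copy that gets evaluated or exported.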
Novel Contributions
- SP8192 tokenizer
- QK_GAIN_INIT=5
- legal score-first TTT
- compact quantized artifact under the 16 MB submission limit
- record submission package for track_10min_16mb