val_bpb: 1.0877
Architecture: Transformer
Optimizer: —
Artifact Size: 15,875,599 bytes
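
Reading note: val_bpb is taken to mean validation bits per byte. Assuming the evaluation produces a mean cross-entropy in nats per byte (the actual evaluation code is not part of this record), the conversion is a one-liner:

```python
import math

def bits_per_byte(mean_ce_loss_nats: float) -> float:
    """Convert mean cross-entropy in nats per byte to bits per byte."""
    return mean_ce_loss_nats / math.log(2)

# A mean loss of ~0.75395 nats/byte corresponds to the reported figure:
print(round(bits_per_byte(0.75395), 4))  # 1.0877
```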
Training Techniques
Quantization: GPTQ (bits: not specified, scope: model weights)
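
The record names GPTQ but leaves the bit width unspecified. As a rough point of reference, the sketch below implements only the per-row round-to-nearest (RTN) baseline that GPTQ improves on with Hessian-based, column-by-column error compensation; the 4-bit default and the symmetric scheme are illustrative assumptions, not the run's settings.

```python
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 4):
    """Per-row symmetric round-to-nearest (RTN) weight quantization.

    GPTQ refines this baseline with second-order error compensation;
    only the RTN step is shown. bits=4 is illustrative (unrecorded).
    """
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover approximate float weights for inference."""
    return q.float() * scale
```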
Compression: custom (level: not specified; the ANS-coded path is described under Novel Contributions)
Evaluation: sliding window eval (parameters: not specified)
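
No window or stride is recorded for the sliding-window evaluation. A minimal sketch, assuming a model that maps (1, T) byte ids to (1, T, vocab) logits and an 8192/4096 window/stride chosen purely for illustration:

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, data: torch.Tensor,
                       window: int = 8192, stride: int = 4096) -> float:
    """Bits-per-byte over a long byte sequence via a sliding window.

    Every target position is scored exactly once; overlap between
    windows gives later positions up to `window - stride` extra context.
    """
    n = data.numel() - 1              # number of scoreable positions
    total_nats, scored_upto = 0.0, 0
    for start in range(0, n, stride):
        end = min(start + window, n)
        inputs = data[start:end].unsqueeze(0)
        targets = data[start + 1:end + 1].unsqueeze(0)
        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1), reduction="none")
        fresh = loss[scored_upto - start:]   # skip already-scored positions
        total_nats += fresh.sum().item()
        scored_upto = end
        if end == n:
            break
    return total_nats / n / math.log(2)
```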
Test-Time Training: full TTT (learning_rate: 0.005, epochs: 3)
Sequence Length: train_length 8192, eval_length not specified
Architecture: depth recurrence, a 3-layer recurrence implemented as a recurrent middle block with parallel residuals (layers: 3)
Weight tying: tied embeddings (a tied-embedding baseline implied by the canonical method list; parameters: not specified)
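
For reference, tied embeddings in their standard form (a minimal sketch, not the run's code):

```python
import torch
import torch.nn as nn

class TiedEmbedding(nn.Module):
    """The output projection reuses the input embedding matrix,
    saving vocab_size * dim parameters."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def encode(self, ids: torch.Tensor) -> torch.Tensor:
        return self.embed(ids)               # (B, T) -> (B, T, dim)

    def logits(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden @ self.embed.weight.T  # same matrix, transposed
```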
Novel Contributions
- Replaced the quantized artifact compression path with a custom ANS-coded compressor (see the sketch after this list)
- Reported a smaller quantized artifact than Brotli/LZMA baselines applied after quantization
- Explored a Hyperloop-lite loop mechanism, but found it slower and of no benefit under the time budget
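
The custom compressor itself is not part of this record. As a rough illustration of the ANS idea, here is a self-contained byte-wise rANS coder; the scale bits, state bounds, and frequency fix-up are illustrative choices, not the author's implementation.

```python
from collections import Counter

SCALE_BITS = 12          # quantized frequencies sum to 1 << SCALE_BITS
L = 1 << 23              # lower bound of the rANS state interval

def build_tables(data: bytes):
    """Quantize byte frequencies so they sum to 1 << SCALE_BITS."""
    counts, total = Counter(data), len(data)
    freq = {s: max(1, (c << SCALE_BITS) // total) for s, c in counts.items()}
    # Crude fix-up so the sum is exact; fine for typical inputs.
    freq[max(freq, key=freq.get)] += (1 << SCALE_BITS) - sum(freq.values())
    cum, acc = {}, 0
    for s in sorted(freq):
        cum[s] = acc
        acc += freq[s]
    return freq, cum

def rans_encode(data: bytes, freq, cum) -> bytes:
    x, out = L, bytearray()
    for s in reversed(data):                  # rANS encodes in reverse
        f = freq[s]
        x_max = ((L >> SCALE_BITS) << 8) * f
        while x >= x_max:                     # renormalize: spill low bytes
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << SCALE_BITS) + (x % f) + cum[s]
    return x.to_bytes(4, "big") + bytes(out)  # final state + spilled bytes

def rans_decode(blob: bytes, freq, cum, n: int) -> bytes:
    x = int.from_bytes(blob[:4], "big")
    tail = bytearray(blob[4:])                # spilled bytes, consumed LIFO
    slots = sorted((c, s) for s, c in cum.items())
    out = bytearray()
    for _ in range(n):
        slot = x & ((1 << SCALE_BITS) - 1)
        s = next(sym for c, sym in reversed(slots) if c <= slot)
        x = freq[s] * (x >> SCALE_BITS) + slot - cum[s]
        while x < L:                          # refill from the byte stream
            x = (x << 8) | tail.pop()
        out.append(s)
    return bytes(out)

# Round trip on illustrative data:
data = b"an illustrative payload " * 64
freq, cum = build_tables(data)
blob = rans_encode(data, freq, cum)
assert rans_decode(blob, freq, cum, len(data)) == data
```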