val_bpb: 1.4444
Architecture: Transformer
Optimizer: —
Artifact Size: 14.68 MB
Training Techniques
Architecture: 10-layer 4xMLP
Expanded the standard 9-layer architecture to 10 layers and increased the MLP multiplier from 2x to 4x.
parameters: {"layers":10,"mlp_multiplier":4}
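The depth and MLP expansion can be sketched as follows. This is a minimal numpy illustration of a 10-layer residual stack whose feed-forward hidden width is 4x the model width; the model width (64), the ReLU activation, and the initialization scale are illustrative assumptions, not details from the card.

```python
import numpy as np

def mlp_block(x, w_in, w_out):
    """Feed-forward block with an expanded hidden layer (ReLU stands in for
    whatever activation the actual model uses)."""
    return np.maximum(x @ w_in, 0.0) @ w_out

d_model = 64   # illustrative width, not from the card
mult = 4       # the card's MLP multiplier
layers = 10    # the card's depth

rng = np.random.default_rng(0)
weights = [(rng.normal(0.0, 0.02, (d_model, mult * d_model)),
            rng.normal(0.0, 0.02, (mult * d_model, d_model)))
           for _ in range(layers)]

x = rng.normal(size=(1, d_model))
for w_in, w_out in weights:
    x = x + mlp_block(x, w_in, w_out)   # residual connection around each block

# MLP parameter count: layers * (d*4d + 4d*d) = 2 * mult * d_model^2 * layers
mlp_params = sum(w_in.size + w_out.size for w_in, w_out in weights)
print(mlp_params)  # 327680
```

Going from a 2x to a 4x multiplier doubles the per-layer MLP parameter count, so the quantization and compression steps below are what keep the artifact under the size limit.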
Quantization: int8
parameters: {"bits":8,"scope":"all weights"}
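Per-row symmetric int8 post-training quantization can be sketched as below; the round-to-nearest mode and the zero-row guard are assumptions for the sketch, not details from the card.

```python
import numpy as np

def quantize_per_row(w):
    """Symmetric per-row int8 PTQ: one float scale per row of the weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = quantize_per_row(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step per row.
assert np.all(np.abs(w - w_hat) <= scale / 2 + 1e-6)
```

Storing one scale per row (rather than per tensor) keeps outlier rows from inflating the quantization error of every other row.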
Compression: zlib
parameters: {"level":null}
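The compression step is a straightforward lossless pass over the serialized quantized weights. A minimal sketch with Python's stdlib `zlib`, assuming `level: null` means the library default (in Python, `zlib.compress` defaults to level -1, i.e. zlib's built-in default of 6); the payload here is toy bytes, not the real artifact.

```python
import zlib

# Toy stand-in for the serialized int8 weight bytes (16 KiB, highly repetitive).
payload = bytes(range(256)) * 64

packed = zlib.compress(payload)            # default compression level
assert zlib.decompress(packed) == payload  # lossless round-trip
print(len(payload), len(packed))
```

Because int8 weights have far lower entropy per byte than fp32, this pass is what squeezes the final artifact to 14.68 MB, under the 16 MB limit.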
Evaluation: sliding window eval
parameters: {"overlapping":true}
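The overlapping sliding-window evaluation can be sketched as an index generator: consecutive windows overlap, but loss is scored only on the tokens past the overlap, so every token is evaluated exactly once while all but the first window get extra left context. The window and stride values below are illustrative, not the submission's settings.

```python
def sliding_windows(n_tokens, window, stride):
    """Yield (start, eval_start, end) triples. Windows advance by `stride`
    and overlap by (window - stride) tokens; only [eval_start, end) is
    scored, so the union of scored spans covers each token exactly once."""
    start = 0
    while True:
        end = min(start + window, n_tokens)
        eval_start = 0 if start == 0 else start + (window - stride)
        yield start, eval_start, end
        if end >= n_tokens:
            break
        start += stride

# Toy example: 10 tokens, window 8, stride 4 (values illustrative).
spans = list(sliding_windows(10, 8, 4))
print(spans)  # [(0, 0, 8), (4, 8, 10)]
```

Compared with non-overlapping chunks, this costs extra forward passes but avoids scoring tokens with a near-empty context at every chunk boundary, which lowers measured bpb.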
Test-Time Training: LoRA TTT
parameters: {"batched":true}
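LoRA test-time training keeps the base weights frozen and adapts only a low-rank delta on the test data itself; "batched" here is read as taking gradient steps on a whole batch of test inputs at once. A minimal numpy sketch on a single linear layer; the rank, learning rate, step count, and the choice to update only the A factor are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2
W = rng.normal(0.0, 0.1, (d, d))   # frozen base weight
A = np.zeros((d, r))               # LoRA factor A starts at zero...
B = rng.normal(0.0, 0.1, (r, d))   # ...so W + A @ B == W initially

def forward(x):
    return x @ (W + A @ B)         # effective weight = base + low-rank delta

x = rng.normal(size=(8, d))        # one batch of test-time inputs
y = x @ W + 0.05                   # toy target: base output plus a shift

loss0 = float(np.mean((forward(x) - y) ** 2))

lr = 0.1
for _ in range(50):                # a few gradient steps on the test batch
    err = forward(x) - y           # (batch, d) residual
    grad_A = x.T @ err @ B.T / len(x)   # gradient w.r.t. A; B kept frozen here
    A -= lr * grad_A

loss = float(np.mean((forward(x) - y) ** 2))
```

Only the tiny A (and optionally B) factors are updated, so test-time adaptation touches d*r + r*d parameters per layer instead of d*d, keeping the per-batch training step cheap.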
Novel Contributions
- Expanded the baseline architecture from 9 layers to 10 layers
- Increased the MLP multiplier from 2x to 4x
- Used standard INT8 per-row post-training quantization
- Applied zlib compression to fit within the 16MB limit
- Evaluated with an overlapping sliding window
- Used batched LoRA test-time training