val_bpb
1.0624
Architecture
Transformer
Optimizer
—
Artifact Size
15.82 MB
Training Techniques
Quantization
mixed int6/int7
bits: 6
scope: weights and embeddings
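As a rough illustration of what int6 and int7 storage implies, the sketch below applies plain symmetric uniform quantization to a list of values. It is an assumption for illustration only: the submission does not say which tensors use 6 versus 7 bits or how scales are chosen, and the names `quantize_symmetric` and `dequantize` are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit codes.

    Hypothetical helper: the actual scheme used by the submission is not stated.
    """
    qmax = 2 ** (bits - 1) - 1               # 31 for int6, 63 for int7
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.3, -0.7, 0.11, 0.9]
    for bits in (6, 7):
        codes, scale = quantize_symmetric(weights, bits)
        restored = dequantize(codes, scale)
        worst = max(abs(a - b) for a, b in zip(weights, restored))
        print(bits, codes, worst)
```

The extra bit roughly halves the step size, which is one plausible reason to spend 7 bits on more sensitive tensors and 6 bits elsewhere.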
Architecture
weight tying
Not stated in the submission; weight tying is not mentioned.
parameters: null
depth recurrence
Looping architecture with recurrent block execution.
parameters: {"loop_start":3,"loop_end":5,"enable_looping_at":0.45}
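One way to read these parameters is sketched below. It assumes that `loop_start` and `loop_end` are inclusive layer indices, that `enable_looping_at` is a fraction of training progress after which the loop is switched on, and that the looped block runs a fixed number of times. The layer count, the repeat count `num_loops`, and the function name `layer_schedule` are hypothetical; none of them appear in the submission.

```python
def layer_schedule(num_layers, loop_start, loop_end, num_loops,
                   progress, enable_looping_at):
    """Return the order in which layer indices run in one forward pass.

    Assumes inclusive loop bounds and a progress value in [0, 1].
    """
    looping = progress >= enable_looping_at
    repeats = num_loops if looping else 1
    head = list(range(0, loop_start))             # layers before the loop
    loop = list(range(loop_start, loop_end + 1))  # layers that get re-executed
    tail = list(range(loop_end + 1, num_layers))  # layers after the loop
    return head + loop * repeats + tail


if __name__ == "__main__":
    # Early in training (20% progress): plain sequential execution.
    print(layer_schedule(8, 3, 5, 2, 0.20, 0.45))
    # After 45% progress: layers 3..5 run twice.
    print(layer_schedule(8, 3, 5, 2, 0.50, 0.45))
    # -> [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7]
```

Under this reading, looping adds effective depth without adding parameters, which matters when the artifact size is capped.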
Test-Time Training
LoRA TTT
parameters: {"prefix_docs":2500,"phases":3,"chunk":48}
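The control flow of chunked test-time training can be sketched as a score-then-adapt loop. This is a minimal sketch under stated assumptions: it treats `chunk` as a chunk length in tokens, leaves out the `prefix_docs` and `phases` settings because their exact meaning is not given, and uses hypothetical callbacks `score_fn` and `update_fn` in place of a real model and LoRA update.

```python
def chunked(tokens, chunk):
    """Split a token list into consecutive pieces of at most `chunk` items."""
    return [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]


def score_then_adapt(tokens, chunk, score_fn, update_fn, state):
    """Score each chunk with the current adapter state, then adapt on it.

    Scoring happens before the update, so no chunk is evaluated by an
    adapter that has already trained on that same chunk.
    """
    total = 0.0
    for piece in chunked(tokens, chunk):
        total += score_fn(state, piece)   # evaluate with the adapter as it stands
        state = update_fn(state, piece)   # only then train on the scored chunk
    return total, state


if __name__ == "__main__":
    # Toy stand-ins: the "state" counts how many updates have been applied.
    toy_score = lambda state, piece: float(len(piece)) / (1 + state)
    toy_update = lambda state, piece: state + 1
    loss, final_state = score_then_adapt(list(range(100)), 48,
                                         toy_score, toy_update, 0)
    print(loss, final_state)
```

In a real setup the state would hold the low-rank adapter weights and the update would be one or more gradient steps on the chunk just scored.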
Compression
pergroup
level: null
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- SP10240 CaseOps 10k-vocab variant of PR #1855
- Same phased-TTT / compression machinery as the SP8192 lane
- Shrank the model body to MLP3.75 to stay under the 16 MB cap after increasing the vocabulary size
- Non-record comparison submission for the Mockingbird lane
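The size trade-off behind the vocabulary and MLP changes can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes the comparison lane uses an 8192-entry vocabulary, that "MLP3.75" denotes an MLP expansion ratio of 3.75 in place of 4, that each MLP has two bias-free projection matrices, and a hypothetical model width of 512. None of these values are confirmed by the submission, and the estimate ignores compression.

```python
def embedding_bytes(vocab_size, d_model, bits):
    """Raw storage for an embedding table at `bits` bits per weight."""
    return vocab_size * d_model * bits / 8


def mlp_params(d_model, ratio):
    """Parameter count of one MLP with an up and a down projection, no biases."""
    hidden = int(d_model * ratio)
    return 2 * d_model * hidden


if __name__ == "__main__":
    d_model = 512  # hypothetical width, not taken from the submission
    extra_embed = (embedding_bytes(10240, d_model, 6)
                   - embedding_bytes(8192, d_model, 6))
    saved_per_layer = (mlp_params(d_model, 4.0)
                       - mlp_params(d_model, 3.75)) * 6 / 8
    print("extra embedding bytes:", extra_embed)
    print("bytes saved per layer:", saved_per_layer)
    print("layers needed to offset:", extra_embed / saved_per_layer)
```

Under these assumptions the larger vocabulary costs a few hundred kilobytes before compression, and trimming the MLP width recovers a share of that in every layer.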