val_bpb: 1.1092
Architecture: Transformer
Optimizer: —
Artifact Size: 16,065,507 bytes
Training Techniques
Architecture: depth recurrence
Late step-based loop onset with pass-gated recurrence in a looped band over layers 3..5.
Parameters: {"loop_onset_step": 2600, "loop_layers": [3, 5], "gate": true}
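A minimal sketch of what the depth-recurrence parameters above could mean: after `loop_onset_step`, the band of layers 3..5 is run an extra pass, and a gate blends that pass back into the residual stream. The layer callables, pass count, and the fixed scalar gate are illustrative assumptions (in practice the gate would be learned); only the onset step and layer band come from the card.

```python
LOOP_ONSET_STEP = 2600   # from Parameters: loop_onset_step
LOOP_LAYERS = (3, 5)     # inclusive band of recurred layers (loop_layers)

def forward(x, layers, step, n_passes=2, gate=0.5):
    """Run a layer stack, recurring over a band of layers after onset.

    `layers` is a list of callables standing in for transformer blocks.
    `gate` blends the extra pass into the current activations
    (pass-gated recurrence; a fixed scalar here for illustration).
    """
    lo, hi = LOOP_LAYERS
    for i, layer in enumerate(layers):
        x = layer(x)
        # After the band's last layer, loop back over the band once onset is reached.
        if i == hi and step >= LOOP_ONSET_STEP:
            for _ in range(n_passes - 1):
                y = x
                for band_layer in layers[lo:hi + 1]:
                    y = band_layer(y)
                x = gate * y + (1.0 - gate) * x  # gated merge of the extra pass
    return x
```

Before step 2600 this reduces to a plain feed-forward stack; afterwards only layers 3..5 pay the extra compute.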
Test-Time Training: full TTT
Parameters: {"easy_chunk_ratio": 0.998, "easy_chunk_epochs": 1, "outlier_drop_fraction": 0.03, "score_weight_power": 0.5}
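One plausible reading of the easy-chunk TTT parameters, as a sketch: score each chunk by difficulty, drop the hardest `outlier_drop_fraction`, keep the easiest `easy_chunk_ratio` of what remains, and weight the kept chunks by `score ** score_weight_power`. The scoring convention (lower = easier) and the selection order are assumptions; only the parameter names and values come from the card.

```python
def select_easy_chunks(chunks, scores,
                       easy_chunk_ratio=0.998,
                       outlier_drop_fraction=0.03,
                       score_weight_power=0.5):
    """Hypothetical easy-chunk selection for test-time training.

    `scores` are per-chunk difficulty scores (lower = easier, by assumption).
    Returns (chunk, weight) pairs for the retained chunks.
    """
    order = sorted(range(len(chunks)), key=lambda i: scores[i])
    n = len(order)
    kept = order[: n - int(n * outlier_drop_fraction)]   # drop hardest outliers
    kept = kept[: int(len(kept) * easy_chunk_ratio)]     # keep easiest ratio
    return [(chunks[i], scores[i] ** score_weight_power) for i in kept]
```

The card's `easy_chunk_epochs: 1` would then mean a single TTT epoch over the selected chunks.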
Quantization: int8
Bits: 8
Scope: control packing
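The card does not specify what "control packing" covers, so as a generic illustration only, here is standard symmetric int8 quantization: one scale per tensor, codes in [-127, 127].

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (generic sketch; the
    card's 'control packing' scope is not specified further)."""
    amax = max(abs(v) for v in values) or 1.0  # avoid divide-by-zero
    scale = amax / 127.0
    codes = [round(v / scale) for v in values]  # int8 codes in [-127, 127]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float values from codes and the shared scale."""
    return [c * scale for c in codes]
```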
Regularization: label smoothing
Parameters: {"bpb_weighted_loss": true, "weight_power": 0.5, "weight_clip": 2}
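The tempered BPB-weighted loss listed under Novel Contributions can be sketched from these parameters: each token's loss is scaled by its tokenizer byte count raised to `weight_power`, clipped at `weight_clip`. Normalizing the weights to mean 1 (so the overall loss scale is unchanged) is an assumption not stated in the card.

```python
def bpb_loss_weights(token_byte_counts, weight_power=0.5, weight_clip=2.0):
    """Tempered per-token weights from tokenizer byte counts (sketch).

    weight = min(bytes ** weight_power, weight_clip), then normalized
    to mean 1 (the normalization is an assumption).
    """
    raw = [min(b ** weight_power, weight_clip) for b in token_byte_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def weighted_loss(token_losses, token_byte_counts):
    """Mean of per-token losses scaled by their BPB weights."""
    ws = bpb_loss_weights(token_byte_counts)
    return sum(w * l for w, l in zip(ws, token_losses)) / len(token_losses)
```

With `weight_power: 0.5` a 4-byte token weighs twice a 1-byte token, and the clip at 2 caps the influence of very long tokens.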
Sequence Length: train_length 8192, eval_length 8192
Novel Contributions
- Tempered BPB-weighted training loss using tokenizer byte counts
- Late loop onset at step 2600 with pass-gated recurrence
- Easy-chunk legal TTT
- Control-int8 packing
- SP8192 tokenizer/dataset setup