PR #1702 (open)

Add non-record SP8192 tempered BPB-weighted loss submission

val_bpb: 1.1092
Architecture: Transformer
Optimizer:
Artifact Size: 16,065,507 bytes

Training Techniques

Architecture: depth recurrence
  Late, step-based loop onset with pass-gated recurrence over a looped band of layers 3..5.
  parameters: {"loop_onset_step": 2600, "loop_layers": [3, 5], "gate": true}
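A minimal sketch of how the late-onset looped band could work, using the submitted parameters. The function name, the fixed scalar gate, and the single extra pass are illustrative assumptions, not details from the submission; in practice the gate would be learned.

```python
def looped_forward(x, layers, step, loop_onset_step=2600,
                   loop_layers=(3, 5), gate=True, extra_passes=1):
    """Depth-recurrent forward pass (sketch).

    Before loop_onset_step every layer runs once. From that step on,
    the band layers[lo..hi] is applied extra_passes additional times;
    with gate=True each extra pass is blended into the residual stream
    through a gate (a fixed 0.5 here for illustration).
    """
    lo, hi = loop_layers
    for layer in layers:          # standard single pass over all layers
        x = layer(x)
    if step >= loop_onset_step:   # recurrence only after late onset
        for _ in range(extra_passes):
            y = x
            for layer in layers[lo:hi + 1]:  # re-run the looped band
                y = layer(y)
            if gate:
                g = 0.5           # placeholder gate; learned in practice
                x = g * y + (1 - g) * x
            else:
                x = y
    return x
```

With toy "layers" that each add 1, the output is unchanged before step 2600 and pass-gated recurrence kicks in afterwards.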
Test-Time Training: full TTT
  parameters: {"easy_chunk_ratio": 0.998, "easy_chunk_epochs": 1, "outlier_drop_fraction": 0.03, "score_weight_power": 0.5}
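One plausible reading of the easy-chunk parameters, sketched below: drop the hardest outliers, keep the easiest fraction of what remains, and temper per-chunk weights with score_weight_power. The scoring convention (higher score = harder) and the direction of the weighting are assumptions; the kept chunks would then be trained on for easy_chunk_epochs epochs.

```python
import numpy as np

def select_easy_chunks(scores, easy_chunk_ratio=0.998,
                       outlier_drop_fraction=0.03,
                       score_weight_power=0.5):
    """Easy-chunk selection for TTT (sketch; higher score = harder).

    1. Drop the hardest outlier_drop_fraction of chunks.
    2. Of the remainder, keep the easiest easy_chunk_ratio.
    3. Weight kept chunks by score ** score_weight_power (tempered),
       renormalized to mean 1.
    Returns (kept_indices, weights).
    """
    scores = np.asarray(scores, dtype=np.float64)
    order = np.argsort(scores)                     # easiest first
    n = len(scores)
    n_kept_after_outliers = n - int(np.floor(outlier_drop_fraction * n))
    n_easy = int(np.floor(easy_chunk_ratio * n_kept_after_outliers))
    kept = order[:n_easy]
    w = scores[kept] ** score_weight_power
    w = w / w.mean()                               # mean-1 sample weights
    return kept, w
```

On 100 chunks, outlier_drop_fraction=0.03 removes the 3 hardest and easy_chunk_ratio=0.998 then keeps 96 of the remaining 97.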
Quantization: int8
  bits: 8
  scope: control packing
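For the packing side, a standard symmetric per-tensor int8 scheme is sketched below: one byte per weight plus a single float scale, which is what makes int8 packing attractive under an artifact-size budget. This is a generic illustration, not the submission's exact packing code, and "control packing" scope is taken from the entry above without further detail.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (sketch).

    Stores each weight as one int8 plus a shared float scale, so the
    packed tensor costs ~1 byte/weight instead of 4.
    """
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from packed int8 values."""
    return q.astype(np.float32) * scale
```

Round-trip error is bounded by half a quantization step (scale / 2) per weight.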
Regularization: label smoothing
  parameters: {"bpb_weighted_loss": true, "weight_power": 0.5, "weight_clip": 2}
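A minimal sketch of the tempered BPB-weighted loss implied by the bpb_weighted_loss parameters: each token's loss is scaled by its tokenizer byte count raised to weight_power, clipped at weight_clip, with weights renormalized to mean 1 so the loss scale stays comparable to plain cross-entropy. The function name and the mean-1 renormalization are assumptions.

```python
import numpy as np

def bpb_weighted_loss(token_losses, byte_counts,
                      weight_power=0.5, weight_clip=2.0):
    """Tempered BPB weighting of per-token losses (sketch).

    weight_i = min(byte_count_i ** weight_power, weight_clip),
    renormalized to mean 1, then applied to the per-token losses.
    """
    w = np.asarray(byte_counts, dtype=np.float64) ** weight_power
    w = np.minimum(w, weight_clip)   # clip heavy multi-byte tokens
    w = w / w.mean()                 # keep overall loss scale unchanged
    losses = np.asarray(token_losses, dtype=np.float64)
    return float(np.mean(w * losses))
```

When all per-token losses are equal the weighting is a no-op, while higher losses on byte-heavy tokens pull the weighted loss above the plain mean.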
Sequence Length
  train_length: 8192
  eval_length: 8192

Novel Contributions

  • Tempered BPB-weighted training loss using tokenizer byte counts
  • Late loop onset at step 2600 with pass-gated recurrence
  • Easy-chunk legal TTT
  • Control-int8 packing
  • SP8192 tokenizer/dataset setup