PR #1519

open

BPB-weighted training loss: align training objective with eval metric

by elliottdehn
val_bpb: 1.1146
Architecture: Transformer
Optimizer
Artifact Size

Training Techniques

Regularization
loss weighting
parameters: {"weight_by":"UTF-8 bytes per token","clamp_min":1}

Novel Contributions

  • Weights each token's cross-entropy loss by its UTF-8 byte length (clamped to a minimum of 1) so the training objective better matches the BPB evaluation metric
  • Reuses existing BPB byte lookup table during training with no extra parameters
  • Reports preliminary improvement over baseline on 2x RTX 5090 runs
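
The weighting described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the PR's code: the helper names (utf8_byte_lengths, byte_weighted_loss), the example tokens, and the per-token loss values are all hypothetical, and normalizing by the sum of weights is an assumption, since the summary does not say how the weighted loss is reduced. In the PR the weights come from the existing BPB byte lookup table rather than being recomputed from strings.

```python
def utf8_byte_lengths(token_strings, clamp_min=1):
    """Per-token weights: UTF-8 byte count of each token, clamped from below.

    clamp_min=1 mirrors the "clamp_min":1 parameter listed above, so tokens
    that decode to zero bytes still contribute to the loss.
    """
    return [max(len(s.encode("utf-8")), clamp_min) for s in token_strings]


def byte_weighted_loss(token_nll, weights):
    """Weighted mean of per-token negative log-likelihoods.

    Assumption: the weighted sum is normalized by the total weight.
    """
    if len(token_nll) != len(weights):
        raise ValueError("token_nll and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, token_nll)) / total_weight


# Hypothetical tokens and made-up per-token losses, for illustration only.
tokens = ["the", " naïve", ""]  # "" stands in for a token with no bytes
nll = [0.5, 2.0, 1.0]

weights = utf8_byte_lengths(tokens)      # "ï" takes two bytes in UTF-8
loss = byte_weighted_loss(nll, weights)  # byte-weighted objective
unweighted = sum(nll) / len(nll)         # plain per-token mean, for comparison

print(weights, loss, unweighted)
```

In this toy case the longer token dominates the weighted loss, which is the intended effect: tokens that cover more bytes of text count for more, as they do when bits are averaged per byte at evaluation time.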