PR #1519
open
BPB-weighted training loss: align training objective with eval metric
by elliottdehn
val_bpb
1.1146
Architecture
Transformer
Optimizer
—
Artifact Size
—
Training Techniques
Regularization
loss weighting
parameters: {"weight_by":"UTF-8 bytes per token","clamp_min":1}
Novel Contributions
- Weights token cross-entropy loss by UTF-8 byte length to better match the BPB evaluation metric
- Reuses existing BPB byte lookup table during training with no extra parameters
- Reports preliminary improvement over baseline on 2x RTX 5090 runs
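
The weighting described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the PR's actual code: the function names (`byte_length_table`, `byte_weighted_ce`), the toy vocabulary, and the choice to normalize by the sum of weights are all hypothetical. Only the weighting by UTF-8 bytes per token and `clamp_min = 1` come from the summary.

```python
import math

def byte_length_table(vocab, clamp_min=1):
    """Per-token weight: UTF-8 byte length of the token string, clamped from below.

    Hypothetical stand-in for the existing BPB byte lookup table.
    """
    return [max(len(tok.encode("utf-8")), clamp_min) for tok in vocab]

def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def byte_weighted_ce(logits_per_pos, targets, byte_table):
    """Weighted mean of per-token cross-entropy (in nats).

    Each target token's loss is weighted by its byte length. Dividing by the
    sum of weights is an assumption; the PR may normalize differently.
    """
    num = 0.0
    den = 0.0
    for logits, t in zip(logits_per_pos, targets):
        ce = -log_softmax(logits)[t]   # cross-entropy for this position
        w = byte_table[t]              # bytes in the target token (clamped)
        num += w * ce
        den += w
    return num / den

# Toy vocabulary; "" stands in for a special token that decodes to zero bytes.
vocab = ["a", " the", "é", ""]
table = byte_length_table(vocab)       # [1, 4, 2, 1]

# Two positions: a confident prediction of "a", a uniform guess for " the".
logits = [[4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
targets = [0, 1]
loss = byte_weighted_ce(logits, targets, table)
```

The table is computed once from the vocabulary, so the weighting adds no trainable parameters. The clamp keeps zero-byte tokens from dropping out of the loss entirely.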