PR #1519

open

BPB-weighted training loss: align training objective with eval metric

by elliottdehn

val_bpb

1.1146

Architecture

Transformer

Optimizer

—

Artifact Size

—

Training Techniques

Regularization

loss weighting

parameters: {"weight_by":"UTF-8 bytes per token","clamp_min":1}
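The parameters above suggest a per-token weight equal to the token's UTF-8 byte length, floored at 1. A minimal sketch of how such a weight table might be built, assuming a toy vocabulary (`byte_weights` and `toy_vocab` are hypothetical names for illustration, not taken from the PR):

```python
def byte_weights(vocab, clamp_min=1):
    """Return one weight per token id: UTF-8 byte length, floored at clamp_min."""
    return [max(len(tok.encode("utf-8")), clamp_min) for tok in vocab]


# Hypothetical toy vocabulary. The empty string stands in for a special token
# that decodes to zero bytes and would otherwise receive zero weight.
toy_vocab = ["", "a", " the", "é", "日本"]
weights = byte_weights(toy_vocab)
```

The floor matters for tokens that decode to zero bytes: without it they would drop out of the loss entirely.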

Novel Contributions

Weights each token's cross-entropy loss by that token's UTF-8 byte length (clamped to a minimum of 1) so the training objective better matches the bits-per-byte (BPB) evaluation metric
Reuses the existing BPB byte lookup table during training, adding no extra parameters
Reports preliminary improvement over baseline on 2x RTX 5090 runs
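
The byte-weighted objective described above could look roughly like the following pure-Python sketch. It is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, `byte_len` stands in for the existing byte lookup table, and normalising by the sum of weights is an assumption, since the summary does not say how the weighted loss is reduced. `bits_per_byte` uses the standard definition (total bits divided by total bytes); the repository's eval code may differ in detail.

```python
import math


def token_nll(logits, target):
    """Cross-entropy in nats for one position, using a stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


def byte_weighted_loss(logits_seq, targets, byte_len, clamp_min=1):
    """Weighted mean of per-token NLL; weight = clamped byte length of the target.

    Assumption: the loss is normalised by the sum of the weights.
    """
    num = 0.0
    den = 0.0
    for logits, t in zip(logits_seq, targets):
        w = max(byte_len[t], clamp_min)
        num += w * token_nll(logits, t)
        den += w
    return num / den


def bits_per_byte(logits_seq, targets, byte_len):
    """Eval-style BPB: total NLL converted to bits, divided by total bytes."""
    total_nats = sum(token_nll(l, t) for l, t in zip(logits_seq, targets))
    total_bytes = sum(byte_len[t] for t in targets)
    return total_nats / (math.log(2) * total_bytes)
```

With weights all equal to 1 this reduces to the ordinary mean cross-entropy, so the weighting only changes how much each token contributes relative to the others.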