PR #2039

open

Record: SP8192 + Sliding-Window Eval + Conditional-PPM Byte Mixer Full-Val - val_bpb 1.027004

by anmarhindi
val_bpb
1.0270
Architecture
Transformer
Optimizer
Artifact Size
15,592,863 bytes

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
Evaluation
sliding window eval
parameters: {"stride":64}
conditional PPM byte mixer
parameters: {"alpha":15,"beta":0.8,"order":5}
Sequence Length
sequence_length
train_length: null
eval_length: null
Weight Averaging
EMA
parameters: null
Regularization
dropout
parameters: {"stochastic_depth_max":0.02}
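
As a rough illustration of the sliding window eval entry above, the sketch below lays out overlapping windows so that every token is scored exactly once while seeing as much left context as the window allows. Only the stride of 64 comes from this PR; the function name `sliding_windows`, the `(start, end, score_from)` tuple layout, and the window length are assumptions made for the example (the eval length is listed as null here).

```python
def sliding_windows(n_tokens, seq_len, stride):
    """Return (start, end, score_from) spans for sliding-window evaluation.

    Each window covers tokens [start, end). Only tokens in [score_from, end)
    contribute to the loss; the tokens before score_from are context that an
    earlier window has already scored.
    Hypothetical helper: the names and tuple layout are not from the PR.
    """
    windows = []
    scored_upto = 0  # first token index that has not been scored yet
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        if end == n_tokens:
            break
        start += stride  # after the first window, each step scores `stride` new tokens
    return windows


if __name__ == "__main__":
    # Toy sizes for readability; the PR reports stride 64.
    for span in sliding_windows(n_tokens=10, seq_len=4, stride=2):
        print(span)
```

A smaller stride gives each scored token more context at the cost of more forward passes, since the number of windows grows roughly as the token count divided by the stride.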

Novel Contributions

  • Canonical first-byte marginalization from SP8192 softmax to obtain a proper byte-level distribution
  • Conditional PPM byte mixer that blends NN byte probabilities with PPM-D conditionals via a sigmoid gate
  • Sliding-window stride-64 evaluation over the full validation set
  • Full-validation distributed gather fix so PPM state advances over all ranks in correct sequential order
  • Eval-time post-processor with no new trainable parameters or artifact bytes
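
The first two contributions can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `first_byte_marginal` sums token probabilities by the first byte of each token's byte string, and `mix_byte_probs` blends the result with a PPM distribution through a sigmoid gate. The gate formula `sigmoid(alpha * (confidence - beta))`, with confidence taken as the PPM model's largest probability, is an assumed reading of the listed `alpha` and `beta` parameters; the PR summary does not state the actual formula. All function names and the toy vocabulary are hypothetical.

```python
import math
from collections import defaultdict


def first_byte_marginal(token_probs, token_bytes):
    """Collapse a token-level distribution into a distribution over first bytes.

    token_probs: {token_id: probability}, summing to 1.
    token_bytes: {token_id: bytes}, the non-empty byte string of each token.
    """
    byte_probs = defaultdict(float)
    for tok, p in token_probs.items():
        byte_probs[token_bytes[tok][0]] += p  # indexing bytes yields an int 0..255
    return dict(byte_probs)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mix_byte_probs(nn_probs, ppm_probs, alpha=15.0, beta=0.8):
    """Blend two byte distributions with a sigmoid gate.

    ASSUMPTION: the gate depends on the PPM model's confidence (its largest
    probability). The PR lists alpha=15 and beta=0.8 but not the gate formula.
    ppm_probs must be non-empty.
    """
    confidence = max(ppm_probs.values())
    gate = sigmoid(alpha * (confidence - beta))  # near 1 when PPM is confident
    support = set(nn_probs) | set(ppm_probs)
    # A convex combination of two normalized distributions stays normalized.
    return {
        b: (1.0 - gate) * nn_probs.get(b, 0.0) + gate * ppm_probs.get(b, 0.0)
        for b in support
    }


if __name__ == "__main__":
    # Toy three-token vocabulary (hypothetical, not SP8192).
    token_probs = {0: 0.5, 1: 0.3, 2: 0.2}
    token_bytes = {0: b"the", 1: b"to", 2: b"a"}
    nn_bytes = first_byte_marginal(token_probs, token_bytes)
    ppm_bytes = {ord("t"): 0.8, ord("e"): 0.2}
    print(mix_byte_probs(nn_bytes, ppm_bytes))
```

Because the mixer only reweights probabilities at evaluation time, a scheme of this shape adds no trainable parameters, which is consistent with the last bullet above.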