PR #2039
open
Record: SP8192 + Sliding-Window Eval + Conditional-PPM Byte Mixer Full-Val - val_bpb 1.027004
by anmarhindi
val_bpb
1.0270
Architecture
Transformer
Optimizer
—
Artifact Size
15,592,863 bytes
Training Techniques
Quantization
GPTQ
bits: 6
scope: all
Evaluation
sliding window eval
parameters: {"stride":64}
conditional PPM byte mixer
parameters: {"alpha":15,"beta":0.8,"order":5}
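A minimal sketch of how a sigmoid-gated byte mixer could combine the two distributions, using the listed alpha and beta values. The card does not state how alpha and beta enter the gate, so the form used here, sigmoid(alpha * (conf - beta)) with conf taken as the PPM model's largest conditional probability, is an assumption, as are the function names.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def mix_byte_probs(p_nn, p_ppm, alpha=15.0, beta=0.8):
    """Blend two 256-way byte distributions with a confidence gate.

    ASSUMPTION: the gate is sigmoid(alpha * (conf - beta)), where conf is
    the largest PPM conditional probability. The PR card only lists
    alpha=15, beta=0.8, order=5; the real gate may be defined differently.
    """
    conf = max(p_ppm)
    gate = sigmoid(alpha * (conf - beta))
    # Convex combination: gate near 1 trusts PPM, gate near 0 trusts the NN.
    mixed = [(1.0 - gate) * a + gate * b for a, b in zip(p_nn, p_ppm)]
    total = sum(mixed)  # renormalize to guard against float drift
    return [m / total for m in mixed], gate


def bits_for_byte(probs, byte_value: int) -> float:
    """Code length in bits for the observed byte under `probs`."""
    return -math.log2(probs[byte_value])


# Toy inputs: a uniform NN distribution and a sharply peaked PPM prediction.
uniform = [1.0 / 256] * 256
peaked = [0.01 / 255] * 256
peaked[65] = 0.99

mixed_peaked, gate_peaked = mix_byte_probs(uniform, peaked)
mixed_flat, gate_flat = mix_byte_probs(peaked, uniform)
```

Under this assumed gate, a confident PPM prediction (conf well above beta) dominates the mix, while an unconfident one leaves the NN distribution almost unchanged.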
Sequence Length
sequence_length
train_length: null
eval_length: null
Weight Averaging
EMA
parameters: null
Regularization
dropout
parameters: {"stochastic_depth_max":0.02}
Novel Contributions
- Canonical first-byte marginalization from SP8192 softmax to obtain a proper byte-level distribution
- Conditional PPM byte mixer that blends NN byte probabilities with PPM-D conditionals via a sigmoid gate
- Sliding-window stride-64 evaluation over the full validation set
- Full-validation distributed gather fix so PPM state advances over all ranks in correct sequential order
- Eval-time post-processor with no new trainable parameters or artifact bytes
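The first-byte marginalization in the first bullet can be illustrated with a small sketch: each token's softmax probability is added to the bucket of the first byte of that token's byte string, which yields a distribution over 256 byte values. The toy vocabulary, the function name, and the handling of tokens with no byte content (skipped, then renormalized) are assumptions for illustration, not details taken from the PR.

```python
def first_byte_marginal(token_probs, token_bytes):
    """Collapse a token-level distribution into a first-byte distribution.

    token_probs: softmax probabilities, one per vocabulary entry.
    token_bytes: the byte string each vocabulary entry decodes to.

    ASSUMPTION: entries with an empty byte string (e.g. special tokens)
    are skipped and the remaining mass is renormalized.
    """
    byte_probs = [0.0] * 256
    for p, bs in zip(token_probs, token_bytes):
        if not bs:
            continue
        byte_probs[bs[0]] += p
    total = sum(byte_probs)
    return [q / total for q in byte_probs]


# Hypothetical four-token vocabulary standing in for the SP8192 vocabulary.
toy_vocab = [b"the", b"to", b"a", b" "]
toy_probs = [0.5, 0.2, 0.2, 0.1]

byte_dist = first_byte_marginal(toy_probs, toy_vocab)
```

Here the tokens "the" and "to" both start with the byte for "t", so their probabilities are pooled into one byte-level entry. The resulting distribution is in the same 256-way form as a PPM conditional, which is what makes a byte-level blend possible.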