PR #1835
openRecord: SP8192 + PPM-D byte mixture — 1.00136 BPB (3-seed mean)
by anmarhindi
val_bpb: 1.0014
Architecture: Transformer
Optimizer: —
Artifact Size: 15,993,020 bytes
Training Techniques
Architecture
depth recurrence
Uses a 3-layer recurrent stack (depth recurrence) as part of the model setup.
parameters: {"layers":3}
Evaluation
sliding window eval
parameters: {"stride":64}
Test-Time Training
score-first TTT
parameters: null
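The record gives no parameters for the score-first TTT, so the following is only a plausible reading: each chunk is scored with the current weights before the model updates on it, so adaptation never sees bytes that have not already been charged to the loss. The chunking, loss, and optimizer are all assumptions:

```python
import torch
import torch.nn.functional as F

def score_first_ttt(model, optimizer, chunks):
    """Sketch of score-first test-time training. `chunks` is assumed
    to yield (ids, targets) byte tensors in stream order."""
    total_nll, total_bytes = 0.0, 0
    for ids, targets in chunks:
        model.eval()
        with torch.no_grad():  # 1) score the chunk before any update
            logits = model(ids).view(-1, 256)
            total_nll += F.cross_entropy(logits, targets.view(-1),
                                         reduction="sum").item()
            total_bytes += targets.numel()
        model.train()          # 2) adapt only on already-scored bytes
        loss = F.cross_entropy(model(ids).view(-1, 256), targets.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return total_nll / total_bytes  # mean NLL in nats per byte
```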
Quantization
int6
bits: 6
scope: model weights
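A minimal sketch of 6-bit weight quantization consistent with the `bits: 6`, `scope: model weights` entry; per-tensor scaling and symmetric rounding are assumptions:

```python
import torch

def quantize_int6(w: torch.Tensor):
    """Round weights to 6-bit signed integers (stored in int8) with a
    single per-tensor scale. Per-tensor granularity is an assumption."""
    qmax = 2 ** (6 - 1) - 1                       # 31 for signed int6
    scale = w.abs().max().clamp(min=1e-12) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_int6(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale
```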
Compression
lzma
level: null
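The Novel Contributions list below mentions a launcher wrapped with lzma + base85 to fit the submission size limit. A hypothetical packing script for a Python payload (the function name and file paths are illustrative, not from the submission):

```python
import base64
import lzma
from pathlib import Path

def pack_launcher(src_path: str, out_path: str) -> None:
    """Compress a Python source file with lzma, base85-encode it, and
    emit a tiny self-extracting launcher script."""
    payload = lzma.compress(Path(src_path).read_bytes(), preset=9)
    blob = base64.b85encode(payload).decode("ascii")
    launcher = (
        "import base64, lzma\n"
        f"_B = {blob!r}\n"
        "exec(lzma.decompress(base64.b85decode(_B)).decode())\n"
    )
    Path(out_path).write_text(launcher)
```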
Other
other
Binary-λ-gated PPM-D byte-level mixture applied at eval time, mixing PPM and NN per-byte probabilities in probability space.
parameters: {"ppm_order":5,"ppm_lambda_hi":0.9,"ppm_lambda_lo":0.05,"ppm_conf_threshold":0.9,"ppm_subset_tokens":3000000}
Novel Contributions
- Binary-λ-gated PPM-D byte-level mixture at evaluation time
- Probability-space mixing of PPM and NN per-byte probabilities
- Score-first online PPM updates using already-scored tokens only
- Order-5 byte-level PPM with confidence-based gating
- Wrapped launcher using lzma + base85 to fit submission size