PR #1795
openRecord: SP4096 + byte-level PPM adaptive-λ mixture — val_bpb 0.95165 (full-val, 3-seed)
by OE-GODView on GitHub
val_bpb: 0.95165
Architecture: Transformer
Optimizer: —
Artifact Size: 15.93-15.96 MB
Training Techniques
Quantization
GPTQ
parameters: {"bits":6,"scope":"all"}
Architecture
depth recurrence
Depth-recurrent Transformer stack inherited from the SP4096 record.
parameters: {"layers":11}
LeakyReLU
Uses LeakyReLU squared MLP activation in the inherited stack.
parameters: null
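The record fixes layers=11 and names a LeakyReLU-squared MLP activation, but gives no recurrence count or block wiring (and attention is omitted below). As a minimal sketch under those assumptions, depth recurrence means reusing the same stack of weights across repeated forward passes:

```python
import numpy as np

def leaky_relu_sq(x, slope=0.01):
    """'LeakyReLU squared' activation per the record's description:
    leaky_relu(x) ** 2 (the exact squared form is an assumption)."""
    return np.where(x > 0, x, slope * x) ** 2

def mlp_block(x, w_up, w_down):
    # Two-layer MLP with the squared activation in between.
    return leaky_relu_sq(x @ w_up) @ w_down

def depth_recurrent_forward(x, weights, recurrences=2):
    """Depth recurrence sketch: the same list of per-layer weights is
    applied on every pass, so depth grows without adding parameters.
    The recurrence count of 2 is illustrative, not from the record.
    weights: list of (w_up, w_down) pairs, one per layer (11 here)."""
    for _ in range(recurrences):            # reuse weights across passes
        for w_up, w_down in weights:
            x = x + mlp_block(x, w_up, w_down)  # residual connection
    return x
```

The parameter-sharing across passes is what keeps the artifact small while the effective depth is layers × recurrences.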
Weight Averaging
EMA
parameters: null
Evaluation
sliding window eval
parameters: {"full_val":true}
Test-Time Training
score-first TTT
parameters: {"granularity":"byte","model":"PPM-D order 5","adaptive_mixture":true}
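The record does not include the PPM-D implementation; as a hedged sketch of the score-first discipline (each byte is scored under the model state from before that byte, and only then used to update the model), here is a simplified fixed-order byte model with add-one smoothing standing in for the full order-5 PPM-D escape mechanism. All names are illustrative:

```python
import math
from collections import defaultdict

class OnlineByteModel:
    """Simplified online byte predictor. A real PPM-D blends orders 0..5
    with escape probabilities; a single fixed-order context with add-one
    smoothing stands in for it here (assumption for illustration)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def prob(self, ctx, byte):
        # Probability under the state from *before* seeing this byte.
        return (self.counts[ctx][byte] + 1) / (self.totals[ctx] + 256)

    def update(self, ctx, byte):
        self.counts[ctx][byte] += 1
        self.totals[ctx] += 1

def score_first_bpb(data, order=5):
    """Bits per byte under the score-first protocol: score, then update,
    so no byte's own content ever informs its own prediction."""
    model = OnlineByteModel()
    total_bits = 0.0
    for i, b in enumerate(data):
        ctx = data[max(0, i - order):i]
        total_bits += -math.log2(model.prob(ctx, b))  # score first
        model.update(ctx, b)                           # then update
    return total_bits / len(data)
```

The score-before-update ordering is what makes the online adaptation "legal" as test-time training: the predictor never conditions on the byte it is currently being scored on.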
Other
other
Byte-level adaptive-λ mixture of NN byte probabilities and online PPM-D byte probabilities.
parameters: {"lambda_high":0.9,"lambda_low":0.05,"threshold":0.9}
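The parameters list lambda_high=0.9, lambda_low=0.05, and threshold=0.9 but not the gating rule. One plausible reading, assumed here, is a per-byte λ gated on NN confidence: weight the NN at lambda_high when its top byte probability clears the threshold, otherwise fall back to lambda_low and lean on PPM. A sketch:

```python
import numpy as np

def adaptive_lambda_mix(p_nn, p_ppm,
                        lambda_high=0.9, lambda_low=0.05, threshold=0.9):
    """Per-byte adaptive-λ mixture of NN and PPM byte distributions.
    p_nn, p_ppm: arrays of shape (T, 256), each row summing to 1.
    Gating on the NN's max probability is an assumption; the record
    only gives the three constants."""
    confident = p_nn.max(axis=-1, keepdims=True) >= threshold  # (T, 1)
    lam = np.where(confident, lambda_high, lambda_low)         # per-byte λ
    return lam * p_nn + (1.0 - lam) * p_ppm

def mixture_bpb(p_nn, p_ppm, targets, **kw):
    """val_bpb of the mixed distribution over the target byte stream."""
    p_mix = adaptive_lambda_mix(p_nn, p_ppm, **kw)
    p_true = p_mix[np.arange(len(targets)), targets]
    return float(-np.log2(p_true).mean())
```

Since each row of the mixture is a convex combination of two normalized distributions, it remains a valid distribution, so the reported bpb is well defined.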
Novel Contributions
- Byte-level PPM adaptive-λ mixture applied at evaluation time on full validation data
- Full-val 3-seed record with val_bpb 0.95165
- Score-before-update online byte-level PPM-D mixture framed as legal test-time training
- NN stack left unchanged from the prior SP4096 record; the reported score improves solely through mixture evaluation