PR #1795 (open)

Record: SP4096 + byte-level PPM adaptive-λ mixture — val_bpb 0.95165 (full-val, 3-seed)

val_bpb: 0.9516
Architecture: Transformer
Optimizer: not listed
Artifact Size: 15.93-15.96 MB

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
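The sketch below shows what "bits: 6" means at the storage level. It uses plain per-row symmetric round-to-nearest quantization to 6-bit integers. This is not GPTQ. GPTQ targets the same kind of integer grid, but it picks the rounding column by column and uses calibration activations to compensate for each column's error. The per-row scale and the symmetric [-31, 31] range are assumptions; the record does not specify them.

```python
from typing import List, Tuple

BITS = 6
QMAX = 2 ** (BITS - 1) - 1  # 31: symmetric signed range [-31, 31] (assumed)


def quantize_rows(weights: List[List[float]]) -> List[Tuple[List[int], float]]:
    """Round-to-nearest per-row quantization to 6-bit integers (baseline, not GPTQ)."""
    out = []
    for row in weights:
        amax = max(abs(w) for w in row)
        scale = amax / QMAX if amax > 0 else 1.0  # one float scale per row
        q = [max(-QMAX, min(QMAX, round(w / scale))) for w in row]
        out.append((q, scale))
    return out


def dequantize_rows(qrows: List[Tuple[List[int], float]]) -> List[List[float]]:
    """Map integer codes back to floats using each row's scale."""
    return [[v * scale for v in q] for q, scale in qrows]


W = [[0.5, -1.0, 0.25], [0.0, 0.0, 0.0]]
Wq = quantize_rows(W)
W_hat = dequantize_rows(Wq)
```

With round-to-nearest, each reconstructed weight lands within half a quantization step of the original. GPTQ accepts somewhat larger per-weight errors when doing so lowers the error of the layer's output.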
Architecture
depth recurrence
Depth-recurrent Transformer stack inherited from the SP4096 record.
parameters: {"layers":11}
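Depth recurrence applies some Transformer blocks more than once. As a result, the effective depth is larger than the number of distinct weight sets. The sketch below covers only that control flow. The scalar toy "blocks" and the repeat schedule are hypothetical, and the record does not say how its 11 layers map onto repeated blocks.

```python
from typing import Callable, List, Sequence

Block = Callable[[float], float]


def run_depth_recurrent(x: float, blocks: Sequence[Block], schedule: Sequence[int]) -> float:
    """Apply blocks in the order given by schedule; repeated indices reuse the same weights."""
    for idx in schedule:
        x = blocks[idx](x)
    return x


def make_block(w: float) -> Block:
    # Toy residual "block": x + w * x stands in for attention + MLP.
    return lambda x: x + w * x


blocks: List[Block] = [make_block(0.1), make_block(0.2), make_block(0.3)]
# Hypothetical schedule: blocks 1 and 2 are looped twice, giving 5 applications from 3 weight sets.
schedule = [0, 1, 2, 1, 2]
y = run_depth_recurrent(1.0, blocks, schedule)
```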
LeakyReLU
The inherited stack uses a LeakyReLU-squared MLP activation, i.e. LeakyReLU(x)².
parameters: null
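The activation is LeakyReLU followed by a square. Below is a minimal elementwise sketch. The record gives no parameters for this activation, so the negative slope of 0.5 is an assumption.

```python
from typing import List

NEGATIVE_SLOPE = 0.5  # assumed; not given in the record


def leaky_relu_sq(x: float, negative_slope: float = NEGATIVE_SLOPE) -> float:
    """(LeakyReLU(x))**2: negative inputs are scaled by the slope, then the result is squared."""
    y = x if x >= 0 else negative_slope * x
    return y * y


def mlp_hidden(pre_activations: List[float]) -> List[float]:
    # Applied elementwise to the MLP's up-projection output, before the down-projection.
    return [leaky_relu_sq(v) for v in pre_activations]
```

Because of the square, negative inputs still produce a non-negative output. With a slope of 0.5, that output is a quarter of what a positive input of the same magnitude would give.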
Weight Averaging
EMA
parameters: null
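EMA keeps a shadow copy of the weights that trails the live weights. This shadow copy is typically the one that gets evaluated and exported. Below is a minimal sketch over a dict of scalar parameters. The record lists no decay value, so the default shown is an assumption.

```python
from typing import Dict

DECAY = 0.997  # assumed; not specified in the record


def ema_update(ema: Dict[str, float], params: Dict[str, float], decay: float = DECAY) -> None:
    """In place: ema <- decay * ema + (1 - decay) * params, applied after each optimizer step."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


params = {"w": 1.0}
ema = {"w": 0.0}
for _ in range(3):
    ema_update(ema, params, decay=0.5)
```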
Evaluation
sliding window eval
parameters: {"full_val":true}
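Sliding-window evaluation scores each validation token exactly once while giving it as much left context as the window allows. Windows advance by a fixed stride, and each window scores only the tokens that earlier windows have not covered. Below is a minimal sketch of the span bookkeeping. The window and stride values in the example are placeholders, not the record's settings.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (context_start, end, score_start) triples; tokens [score_start, end) are scored."""
    assert 0 < stride <= window, "a stride larger than the window would skip tokens"
    spans = []
    start, scored_up_to = 0, 0
    while scored_up_to < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end, scored_up_to))
        scored_up_to = end
        start += stride
    return spans


spans = sliding_windows(10, window=4, stride=2)
```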
Test-Time Training
score-first TTT
parameters: {"granularity":"byte","model":"PPM-D order 5","adaptive_mixture":true}
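Score-first TTT works as follows: the PPM model predicts each byte from a state that has not yet seen it, the byte's bits are charged, and only then are the counts updated. Below is a minimal sketch of an order-5 byte-level PPM with escape method D. In a context with n total counts and d distinct bytes, a byte seen c times gets (2c - 1)/(2n), and the escape to the next lower order gets d/(2n). Several choices here are assumptions for illustration, because the record does not give its implementation details: symbol exclusion, updating every order rather than using update exclusion, and the names PPMD and score_then_update.

```python
import math
from collections import defaultdict
from typing import List


class PPMD:
    """Byte-level PPM, escape method D, with symbol exclusion (illustrative sketch)."""

    def __init__(self, max_order: int = 5):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))  # context bytes -> byte -> count
        self.history = bytearray()  # last max_order bytes only

    def distribution(self) -> List[float]:
        """Predictive distribution over all 256 byte values, given the current history."""
        probs = [0.0] * 256
        excluded = set()
        mass = 1.0  # probability of having escaped down to the current order
        for order in range(min(self.max_order, len(self.history)), -1, -1):
            table = self.counts.get(bytes(self.history[len(self.history) - order:]))
            if not table:
                continue  # unseen context: escape costs nothing
            live = {s: c for s, c in table.items() if s not in excluded}
            n, d = sum(live.values()), len(live)
            if n == 0:
                continue
            for s, c in live.items():
                probs[s] += mass * (2 * c - 1) / (2 * n)
            mass *= d / (2 * n)  # method D escape probability
            excluded.update(live)  # lower orders cannot re-predict these bytes
        remaining = 256 - len(excluded)
        if remaining:
            for s in range(256):  # order -1: uniform over bytes not yet predicted
                if s not in excluded:
                    probs[s] += mass / remaining
        else:
            total = sum(probs)  # all bytes already predicted: renormalize leftover escape mass
            probs = [p / total for p in probs]
        return probs

    def update(self, byte: int) -> None:
        """Add the byte to every order's context, then extend the history."""
        for order in range(min(self.max_order, len(self.history)) + 1):
            self.counts[bytes(self.history[len(self.history) - order:])][byte] += 1
        self.history.append(byte)
        if len(self.history) > self.max_order:
            del self.history[0]


def score_then_update(data: bytes, max_order: int = 5) -> float:
    """Bits per byte when every byte is scored before the model learns from it."""
    model = PPMD(max_order)
    bits = 0.0
    for b in data:
        bits -= math.log2(model.distribution()[b])  # score first
        model.update(b)  # then update
    return bits / max(len(data), 1)
```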
Other
other
Byte-level adaptive-λ mixture of NN byte probabilities and online PPM-D byte probabilities.
parameters: {"lambda_high":0.9,"lambda_low":0.05,"threshold":0.9}
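The mixture combines two 256-way next-byte distributions. The record gives only lambda_high, lambda_low and threshold, so the gating rule below is a guess. In this sketch, λ is the weight on PPM. It switches to lambda_high when PPM's own most likely byte has probability of at least threshold, and uses lambda_low otherwise. The gate reads only the pre-update distributions, so it stays compatible with score-first evaluation. The record does not show how the NN's SP4096 token probabilities become byte probabilities, so the sketch takes both distributions as given.

```python
import math
from typing import List, Sequence

LAMBDA_HIGH = 0.9  # values from the record's parameters
LAMBDA_LOW = 0.05
THRESHOLD = 0.9


def adaptive_lambda(p_ppm: Sequence[float]) -> float:
    """Hypothetical gate: trust PPM heavily only when its top prediction is confident."""
    return LAMBDA_HIGH if max(p_ppm) >= THRESHOLD else LAMBDA_LOW


def mix(p_nn: Sequence[float], p_ppm: Sequence[float]) -> List[float]:
    """p_mix = lam * p_ppm + (1 - lam) * p_nn; a convex mix of two distributions is a distribution."""
    lam = adaptive_lambda(p_ppm)
    return [lam * q + (1.0 - lam) * p for p, q in zip(p_nn, p_ppm)]


def mixed_bits(p_nn: Sequence[float], p_ppm: Sequence[float], byte: int) -> float:
    """Bits charged for the observed byte; val_bpb is the sum of these divided by the byte count."""
    return -math.log2(mix(p_nn, p_ppm)[byte])


uniform = [1.0 / 256] * 256
peaked = [0.05 / 255] * 256
peaked[ord("e")] = 0.95
```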

Novel Contributions

  • Byte-level PPM adaptive-λ mixture applied at evaluation time on the full validation set
  • Full-val 3-seed record with val_bpb 0.95165
  • Online byte-level PPM-D mixture that scores each byte before updating on it (score-before-update), framed as legal test-time training
  • NN stack unchanged from the prior SP4096 record; the reported score improves through the mixture at evaluation time