PR #1795

open

Record: SP4096 + byte-level PPM adaptive-λ mixture — val_bpb 0.95165 (full-val, 3-seed)

val_bpb

0.9516

Architecture

Transformer

Optimizer

—

Artifact Size

15.93-15.96 MB

Training Techniques

Quantization

GPTQ

bits: 6

scope: all

Architecture

depth recurrence

Depth-recurrent Transformer stack inherited from the SP4096 record.

parameters: {"layers":11}

LeakyReLU

Uses LeakyReLU squared MLP activation in the inherited stack.

parameters: null
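A minimal sketch of what a "LeakyReLU squared" activation could look like on a single scalar. The PR card does not state the negative slope or whether the sign is preserved after squaring, so `negative_slope` and the plain square below are assumptions for illustration, not the record's actual code.

```python
def leaky_relu_squared(x, negative_slope=0.01):
    """Square of LeakyReLU(x).

    Assumptions (not from the PR card): the slope value, and that the
    output is a plain square, which makes negative inputs map to small
    positive values.
    """
    y = x if x >= 0.0 else negative_slope * x
    return y * y
```

For positive inputs this is the same as a squared ReLU; the leaky branch only changes what happens for negative inputs.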

Weight Averaging

EMA

parameters: null
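A minimal sketch of EMA weight averaging, assuming weights are held in a plain dict of floats. The `decay` value is hypothetical, since the card lists no parameters for this technique.

```python
def ema_update(ema, weights, decay=0.999):
    """Blend the current weights into the running average in place.

    `decay` is a hypothetical value; the PR card does not give one.
    """
    for name, w in weights.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * w
    return ema
```

In a typical setup the update runs after each optimizer step and the averaged weights are the ones that get evaluated.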

Evaluation

sliding window eval

parameters: {"full_val":true}
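A sketch of how a sliding-window evaluation can cover a full validation set so that every token is scored exactly once, together with the usual conversion from summed negative log-likelihood to bits per byte. The function names and the `window` and `stride` values are assumptions for illustration; the card only states that the full validation set is used.

```python
import math

def sliding_windows(n_tokens, window, stride):
    """Yield (start, end, score_from) spans.

    The model sees tokens [start, end) but only tokens [score_from, end)
    count toward the loss, so each token is scored exactly once.
    """
    pos = 0  # first token that has not been scored yet
    while pos < n_tokens:
        if pos == 0:
            end = min(window, n_tokens)      # first window scores everything
        else:
            end = min(pos + stride, n_tokens)
        start = max(0, end - window)         # keep as much left context as fits
        yield start, end, pos
        pos = end

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)
```

A smaller stride gives each scored token more left context at the cost of more forward passes.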

Test-Time Training

score-first TTT

parameters: {"granularity":"byte","model":"PPM-D order 5","adaptive_mixture":true}
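The score-first ordering can be sketched as a loop that charges each byte under the model's current state and only then updates the model with that byte. The class below is a deliberately simple stand-in (fixed-order context counts with add-one smoothing), not PPM-D; all names and the default `order` are hypothetical.

```python
import math
from collections import defaultdict

class OnlineByteModel:
    """Stand-in for PPM-D: order-k byte counts with add-one smoothing."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: [0] * 256)
        self.totals = defaultdict(int)

    def _ctx(self, history):
        return bytes(history[max(0, len(history) - self.order):])

    def prob(self, history, byte):
        ctx = self._ctx(history)
        return (self.counts[ctx][byte] + 1) / (self.totals[ctx] + 256)

    def update(self, history, byte):
        ctx = self._ctx(history)
        self.counts[ctx][byte] += 1
        self.totals[ctx] += 1

def score_first_bits(data, model):
    """Total code length in bits, scoring each byte before learning from it."""
    total_bits = 0.0
    history = bytearray()
    for b in data:
        p = model.prob(history, b)   # score: uses only bytes already seen
        total_bits += -math.log2(p)
        model.update(history, b)     # update: only after the byte is charged
        history.append(b)
    return total_bits
```

Because the probability for each byte is fixed before that byte is revealed to the model, the loop never scores a byte with statistics that already include it.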

Other

other

Byte-level adaptive-λ mixture of NN byte probabilities and online PPM-D byte probabilities.

parameters: {"lambda_high":0.9,"lambda_low":0.05,"threshold":0.9}
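One way to read these parameters is as a two-level gate on the mixture weight. The sketch below assumes that λ is the weight on the NN byte distribution and that the threshold is compared against the PPM model's top probability; the card states neither, so both the gate and the function names are assumptions.

```python
import math

LAMBDA_HIGH = 0.9   # from the card
LAMBDA_LOW = 0.05   # from the card
THRESHOLD = 0.9     # from the card

def choose_lambda(ppm_probs):
    """Assumed gate: trust PPM more when its top probability is high.

    The gate looks only at the predicted distribution, never at the
    target byte, so it can be computed before scoring.
    """
    return LAMBDA_LOW if max(ppm_probs) >= THRESHOLD else LAMBDA_HIGH

def mix(nn_probs, ppm_probs):
    """Linear mixture: lam * NN + (1 - lam) * PPM, per byte value."""
    lam = choose_lambda(ppm_probs)
    return [lam * p + (1.0 - lam) * q for p, q in zip(nn_probs, ppm_probs)]

def byte_cost_bits(nn_probs, ppm_probs, target):
    """Code length in bits of the target byte under the mixture."""
    return -math.log2(mix(nn_probs, ppm_probs)[target])
```

Since both inputs are probability distributions and the weights sum to one, the mixture is itself a valid distribution over byte values.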

Byte-level PPM adaptive-λ mixture applied at evaluation time on full validation data
Full-val 3-seed record with val_bpb 0.95165
Score-before-update online byte-level PPM-D mixture framed as legal test-time training
The NN stack is unchanged from the prior SP4096 record; the improvement in the reported score comes from the mixture evaluation