PR #1833
[open] [WIP] Record: SP8192 + CaseOps + Depth Curriculum + FreqGPTQ + PPM adaptive-λ mixture — val_bpb 0.90687688 (1-seed)
by pragnyanramtha
val_bpb: 0.9069
Architecture: Transformer
Optimizer: —
Artifact Size: ~24.5 MB
Training Techniques

Architecture: depth recurrence
Depth curriculum stack with CaseOps and depth progression 1→3→4.
parameters: {"layers":[1,3,4]}
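The record gives only the layer progression, not when the switches happen. A minimal sketch of mapping that schedule to training steps, assuming evenly spaced phases (the milestone placement and the helper name `depth_at` are assumptions; CaseOps is not modeled here):

```python
# Depth schedule from the record's parameters: {"layers": [1, 3, 4]}.
DEPTH_SCHEDULE = [1, 3, 4]

def depth_at(step: int, total_steps: int, schedule=DEPTH_SCHEDULE) -> int:
    """Number of active transformer layers at a given training step.

    Assumption: the run is split into len(schedule) equal phases and the
    depth grows at each phase boundary (the PR does not state the split).
    """
    phase_len = total_steps / len(schedule)
    phase = min(int(step / phase_len), len(schedule) - 1)
    return schedule[phase]
```

A trainer would call this each step and activate only the first `depth_at(...)` layers of the stack.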
Quantization: GPTQ
bits: 6
scope: all
Other: FreqGPTQ
Upweights the top-100 most frequent calibration tokens by 2× during Hessian collection to improve int6 quantization quality on frequent vocabulary items.
parameters: {"top_k":100,"weight_multiplier":2}
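The FreqGPTQ step above can be sketched as frequency-weighted accumulation of the per-layer GPTQ Hessian. The function name and array interface are assumptions; the rest of the GPTQ pipeline is unchanged:

```python
import numpy as np
from collections import Counter

def freq_weighted_hessian(X, token_ids, top_k=100, weight_multiplier=2.0):
    """Accumulate a GPTQ-style layer Hessian H = sum_i w_i * x_i x_i^T,
    boosting positions whose token is among the top_k most frequent
    tokens in the calibration set (sketch; interface is an assumption).

    X:         (n_tokens, d) layer-input activations, one row per position.
    token_ids: (n_tokens,)   token id for each position.
    """
    counts = Counter(token_ids.tolist())
    frequent = {tok for tok, _ in counts.most_common(top_k)}
    # Weight vector: weight_multiplier for frequent tokens, 1.0 otherwise.
    w = np.where(np.isin(token_ids, list(frequent)), weight_multiplier, 1.0)
    return (X * w[:, None]).T @ X
```

With the record's parameters (`top_k=100`, `weight_multiplier=2`), frequent-token activations simply count twice toward the Hessian, biasing the quantizer's error minimization toward them.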
Other: PPM-D adaptive-λ mixture
A byte-level order-5 PPM predictor mixed with NN log-probs at evaluation time using an adaptive confidence gate.
parameters: {"ppm_order":5,"lambda_high_confidence":0.05,"lambda_low_confidence":0.9,"confidence_threshold":0.9}
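A sketch of the adaptive gate, assuming λ weights the PPM term and "confidence" means the NN's maximum next-token probability (both readings are assumptions; the PPM predictor itself is not shown):

```python
import numpy as np

def mix_logprobs(nn_logp, ppm_logp,
                 lam_hi=0.05, lam_lo=0.9, conf_threshold=0.9):
    """Adaptive-λ mixture of NN and PPM predictions (sketch).

    nn_logp, ppm_logp: (vocab,) log-probability vectors for the next symbol.
    When the NN is confident (max prob >= conf_threshold) the PPM weight
    drops to lam_hi; otherwise the PPM term dominates with lam_lo.
    Defaults mirror the record's parameters.
    """
    confidence = np.exp(nn_logp).max()
    lam = lam_hi if confidence >= conf_threshold else lam_lo
    # Mix in probability space, then return to log space.
    mixed = (1.0 - lam) * np.exp(nn_logp) + lam * np.exp(ppm_logp)
    return np.log(mixed)
```

The gate keeps bpb close to the NN's on tokens it already predicts well, while falling back to the PPM model on uncertain positions.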
Test-Time Training: full TTT
parameters: null
Weight Averaging: EMA
parameters: null
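The record lists EMA with no parameters. For reference, a generic weight-EMA update (the decay value and dict-of-arrays layout are illustrative assumptions, not from the PR):

```python
def ema_update(ema_params, params, decay=0.999):
    """One EMA step over model weights: ema <- decay*ema + (1-decay)*param.

    ema_params, params: dicts mapping parameter name -> value.
    decay is illustrative; the PR does not report one.
    """
    return {k: decay * ema_params[k] + (1.0 - decay) * params[k]
            for k in params}
```

Evaluation would then run with the EMA copy of the weights rather than the raw trained weights.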
Novel Contributions
- FreqGPTQ frequency-weighted calibration for GPTQ
- PPM-D adaptive-λ mixture at evaluation time
- Depth curriculum stack with CaseOps
- Single-seed screening run with reported val_bpb improvement