val_bpb: 1.0803
Architecture: Transformer
Optimizer: SGD
Artifact Size: 15,977,914 bytes
Training Techniques

Test-Time Training: score-first TTT
parameters: {"enabled":true,"learning_rate":0.002,"epochs":3}
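The score-first discipline can be sketched as follows. The loss below is a toy quadratic stand-in for the real model's objective (`loss_and_grad` is hypothetical); the learning rate and epoch count follow the parameters above. The point is the ordering: each chunk is scored before the weights are updated on it, so no token is ever predicted by a model that has already trained on it.

```python
import numpy as np

def loss_and_grad(w, chunk):
    # Toy objective: squared distance from the chunk mean (illustration only;
    # the real artifact uses the neural LM's loss here).
    target = np.mean(chunk)
    diff = w - target
    return float(np.sum(diff ** 2)), 2.0 * diff

def score_first_ttt(w0, chunks, lr=0.002, epochs=3):
    """Score-first test-time training (sketch)."""
    w = np.asarray(w0, dtype=float).copy()
    losses = []
    for chunk in chunks:
        loss, _ = loss_and_grad(w, chunk)   # 1) score with current weights
        losses.append(loss)
        for _ in range(epochs):             # 2) only then adapt on the chunk
            _, g = loss_and_grad(w, chunk)
            w -= lr * g
    return losses, w
```

Because adaptation happens strictly after scoring, the second of two identical chunks is scored by a model that has already adapted to the first, and only to the first.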
Architecture

BigramHash: hash-bucketed causal PPM experts using rolling-hash context tables for global and document-local token-context modeling.
parameters: {"global_order":6,"local_order":8,"global_buckets":2048,"local_buckets":2048}
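A sketch of one such hash-bucketed causal count table, assuming byte-level tokens, 64-bit FNV-1a as the rolling hash, and add-one smoothing (the artifact's exact smoothing and escape handling are not specified; bucket collisions are accepted by design):

```python
import numpy as np

FNV_OFFSET, FNV_PRIME = 0xcbf29ce484222325, 0x100000001b3
MASK64 = (1 << 64) - 1

def fnv1a(tokens):
    """64-bit FNV-1a hash of a token context (one byte per token assumed)."""
    h = FNV_OFFSET
    for t in tokens:
        h = ((h ^ (t & 0xff)) * FNV_PRIME) & MASK64
    return h

class HashPPM:
    """Hash-bucketed causal count model: one sketch of a PPM expert."""
    def __init__(self, order=6, buckets=2048, vocab=256):
        self.order, self.buckets = order, buckets
        self.counts = np.zeros((buckets, vocab), dtype=np.int64)

    def bucket(self, context):
        return fnv1a(context[-self.order:]) % self.buckets

    def prob(self, context, token):
        row = self.counts[self.bucket(context)]
        return (row[token] + 1) / (row.sum() + len(row))  # add-one smoothing

    def update(self, context, token):
        self.counts[self.bucket(context), token] += 1

    def score(self, tokens):
        """Causal bits-per-token: each token is scored, then counted."""
        bits = 0.0
        for i, t in enumerate(tokens):
            ctx = tokens[max(0, i - self.order):i]
            bits += -np.log2(self.prob(ctx, t))
            self.update(ctx, t)
        return bits / len(tokens)
```

A "global" expert keeps its counts across documents, while a "document-local" expert (higher order, per the parameters) resets them at each document boundary.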
Partial RoPE: rotary positional embedding applied to only a subset of the head dimensions (16 of 64).
parameters: {"dimensions":16,"base_dimensions":64}
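A minimal sketch of partial RoPE: only the first 16 of 64 head dimensions are rotated, the rest pass through unchanged. The pairing convention (first half of the rotated slice paired with the second half) and the frequency schedule are assumptions, not documented details of the artifact.

```python
import numpy as np

def partial_rope(x, pos, rot_dims=16, base=10000.0):
    """Rotate the first `rot_dims` of the head dimension by position-dependent
    angles; leave the remaining dims untouched (assumed pairing/frequencies)."""
    x = np.asarray(x, dtype=float).copy()
    half = rot_dims // 2
    freqs = base ** (-np.arange(half) / half)   # assumed frequency schedule
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1 = x[..., :half].copy()
    x2 = x[..., half:rot_dims].copy()
    x[..., :half] = x1 * cos - x2 * sin         # 2-D rotation per frequency pair
    x[..., half:rot_dims] = x1 * sin + x2 * cos
    return x
```

Since each pair undergoes a plain 2-D rotation, the norm of the rotated slice is preserved and position 0 is the identity.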
XSA: cross/self-attention-style architectural modification used in the base stack.
parameters: {"layers":4}
VE128: value-embedding / value-residual-style component used in later layers.
parameters: {"layers":[9,10]}
LeakyReLU: squared-LeakyReLU MLP activation variant in the base model.
parameters: {"variant":"squared"}
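The "squared" variant can be sketched as below. The sign-preserving squaring convention is an assumption; the base model may instead simply square the LeakyReLU output.

```python
import numpy as np

def leaky_relu_squared(x, slope=0.01):
    """Squared-LeakyReLU sketch: LeakyReLU followed by a sign-preserving
    square, so positives grow as x**2 while negatives stay slightly negative.
    (The exact convention in the base model is assumed, not documented.)"""
    y = np.where(x > 0, x, slope * x)
    return np.sign(y) * y ** 2
```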
Weight Averaging
EMA + Tight SWA
parameters: {"ema_decay":0.997,"swa_interval":50}
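One plausible reading of "EMA + Tight SWA" is sketched below: an exponential moving average is maintained every step, and EMA snapshots taken every `swa_interval` steps are averaged. How the artifact actually combines the two averages is an assumption.

```python
import numpy as np

def ema_swa(weight_iterates, ema_decay=0.997, swa_interval=50):
    """EMA plus tight SWA (sketch): EMA every step, SWA over EMA snapshots
    taken at a fixed interval (combination scheme assumed)."""
    ema = None
    snapshots = []
    for step, w in enumerate(weight_iterates, start=1):
        w = np.asarray(w, dtype=float)
        ema = w.copy() if ema is None else ema_decay * ema + (1 - ema_decay) * w
        if step % swa_interval == 0:
            snapshots.append(ema.copy())
    return np.mean(snapshots, axis=0) if snapshots else ema
```

With a decay of 0.997 the EMA tracks roughly the last few hundred steps, and the short snapshot interval keeps the SWA "tight", i.e. averaged over late, nearby iterates rather than the whole run.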
Quantization
GPTQ-lite
bits: 6
scope: model weights
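For scale, a plain round-to-nearest 6-bit per-row weight quantizer looks like the sketch below. This is the baseline only; "GPTQ-lite" presumably also applies a GPTQ-style error-correction pass whose details are not given here.

```python
import numpy as np

def quantize_rtn_6bit(w):
    """Per-row symmetric round-to-nearest 6-bit quantization (sketch)."""
    w = np.asarray(w, dtype=float)
    qmax = 2 ** (6 - 1) - 1                      # signed 6-bit range: [-32, 31]
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                      # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(float) * scale
```

Round-to-nearest bounds the per-weight error by half a quantization step, which is what an error-correcting pass like GPTQ then improves on in aggregate.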
Optimizer
Parallel Muon
weight_decay: null
momentum: null
other_params: null
Regularization
layerwise LN scale
parameters: {"scale":"1/sqrt(layer+1)"}
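A sketch of the layerwise LN scale, assuming 0-indexed layers and that the 1/sqrt(layer+1) factor multiplies the LayerNorm output, so deeper layers inject progressively smaller updates into the residual stream:

```python
import numpy as np

def scaled_layernorm(x, layer_index, eps=1e-5):
    """LayerNorm whose output is scaled by 1/sqrt(layer_index + 1)
    (sketch; 0-indexed layers and the placement of the factor assumed)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    y = (x - mu) / np.sqrt(var + eps)
    return y / np.sqrt(layer_index + 1)
```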
Other

Fixed-share Bayesian mixer blending neural, global PPM, and local PPM experts with deterministic chunk-wise posterior updates.
parameters: {"share":0.005,"prior":[0.9,0.07,0.03]}
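A sketch of the fixed-share update over the three experts. The `chunk_probs` interface (each expert's average per-token probability on each chunk) is an assumption about how the experts are summarized; note also that classic fixed-share redistributes weight toward the uniform distribution, whereas this sketch redistributes toward the stated prior.

```python
import numpy as np

def fixed_share_mix(chunk_probs, share=0.005, prior=(0.9, 0.07, 0.03)):
    """Fixed-share Bayesian mixture over experts (sketch).

    chunk_probs: array [n_chunks, n_experts] of each expert's probability
    on each chunk. Weights update once per chunk, AFTER the chunk is
    scored, so the mixture stays causal and deterministic."""
    w = np.asarray(prior, dtype=float)
    mixed = []
    for p in np.asarray(chunk_probs, dtype=float):
        mixed.append(float(w @ p))            # 1) score chunk with current weights
        post = w * p                          # 2) Bayesian posterior update
        post /= post.sum()
        w = (1 - share) * post + share * np.asarray(prior)  # 3) fixed share toward prior
    return np.array(mixed), w
```

The share term keeps every expert's weight bounded away from zero, so the mixer can recover quickly when the best expert changes, e.g. at a document boundary where the local PPM resets.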
GPU-vectorized causal scoring with FNV rolling hashes, hash-bucketed count tables, and a prefix-rank counter for chunk-local scoring.
parameters: null
Novel Contributions
- DualClock mixture of neural, global PPM, and document-local PPM experts
- Fixed-share Bayesian mixer with deterministic chunk-wise weight updates
- GPU-vectorized causal PPM scoring using FNV rolling hashes and hash-bucketed count tables
- Prefix-rank counter for parallel chunk-local causal scoring without leakage
- Score-before-update legality for both TTT and mixture updates
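The prefix-rank idea behind the last two bullets can be sketched as follows. This is a NumPy stand-in for the GPU kernel; `keys` is assumed to be one integer per position encoding the (context-bucket, next-token) pair. Adding each position's prefix rank to the pre-chunk global count reproduces the exact sequential counts even when the whole chunk is scored in parallel, with no within-chunk leakage.

```python
import numpy as np

def prefix_rank(keys):
    """For each position i, count the positions j < i in the same chunk
    with keys[j] == keys[i] (vectorized sketch of the prefix-rank counter)."""
    keys = np.asarray(keys)
    order = np.argsort(keys, kind="stable")   # group equal keys, preserving position order
    sorted_keys = keys[order]
    # Mark where each run of equal keys begins, then rank within each run.
    boundary = np.r_[True, sorted_keys[1:] != sorted_keys[:-1]]
    group_start = np.maximum.accumulate(np.where(boundary, np.arange(len(keys)), 0))
    ranks_sorted = np.arange(len(keys)) - group_start
    ranks = np.empty_like(ranks_sorted)
    ranks[order] = ranks_sorted               # scatter ranks back to original positions
    return ranks
```

For example, `prefix_rank([5, 3, 5, 5, 3])` yields `[0, 0, 1, 2, 1]`: the third `5` has rank 2 because two earlier positions in the chunk share its key.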