PR #880
open
Record: PhraseCache + OrderAdaptive N-gram + RegimeTracker — val_bpb 0.1003 (3-seed mean)
by RoyiRaView on GitHub
val_bpb
0.1003
Architecture
Transformer
Optimizer
AdamW
Artifact Size
~15.7 MB
Training Techniques
Test-Time Training
score-first TTT
parameters: {"epochs":2,"learning_rate":0.0001,"freeze_blocks":2}
Weight Averaging
EMA
parameters: {"decay":0.998}
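As a rough illustration of what an EMA over weights with decay 0.998 does per step, here is a minimal sketch. The function name `ema_update` and the flat list of floats are assumptions made for illustration; the PR's actual code is not shown in this summary.

```python
def ema_update(ema_weights, weights, decay=0.998):
    """Return the updated exponential moving average of the weights.

    Hypothetical helper: weights are modelled as a flat list of floats.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# One step: the average moves 0.2% of the way toward the live weights.
ema = [0.0, 1.0]
ema = ema_update(ema, [1.0, 1.0])
```

With a decay this close to 1, the average reflects roughly the last 1 / (1 - 0.998) = 500 steps.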
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: {"lr":0.0001}
LR Schedule
cosine decay
parameters: {"adaptive":true}
Quantization
GPTQ
bits: 5
scope: all
Compression
zstd
level: 22
Regularization
magnitude pruning
parameters: {"pct":0.05}
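A minimal sketch of magnitude pruning, assuming `pct` is a fraction (0.05 meaning 5% of the weights) and that pruning is applied over one flat list of weights. Both are assumptions; the summary does not say whether pruning is global or per tensor.

```python
def magnitude_prune(weights, pct=0.05):
    """Zero out the `pct` fraction of weights with the smallest absolute value."""
    k = int(len(weights) * pct)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    drop = set(smallest)
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# 20 weights at pct=0.05: exactly one weight (the smallest in magnitude) is zeroed.
pruned = magnitude_prune([0.5, -0.01] + [1.0] * 18)
```

Zeroed weights compress well, which is presumably why pruning is paired with the zstd stage here.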
Architecture
BigramHash
Bigram hash component used in the model architecture.
parameters: {"dim":128,"size":6144}
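One plausible reading of these parameters is a table of 6144 hashed buckets, each holding a 128-dimensional vector indexed by a hash of the (previous token, current token) pair. The sketch below follows that reading. The hash function, the pad id of 0, and the random initialisation are all hypothetical.

```python
import random

TABLE_SIZE = 6144  # "size" from the parameters, assumed to be the bucket count
DIM = 128          # "dim" from the parameters, assumed to be the vector width


def bigram_bucket(prev_token, token, table_size=TABLE_SIZE):
    """Map a token pair to a bucket index (hypothetical mixing hash)."""
    return (prev_token * 1000003 + token) % table_size


rng = random.Random(0)
table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(TABLE_SIZE)]


def bigram_features(tokens):
    """Return one DIM-sized vector per position; position 0 pairs with pad id 0."""
    out = []
    prev = 0
    for t in tokens:
        out.append(table[bigram_bucket(prev, t)])
        prev = t
    return out


# The bigram (5, 7) occurs at positions 1 and 3, so both share the same row.
feats = bigram_features([5, 7, 5, 7])
```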
XSA
XSA applied across all 11 layers with a windowed setting (ws = 8).
parameters: {"layers":11,"ws":8}
VE128
VE128 module used in the later layers (9 and 10).
parameters: {"layers":[9,10]}
MLP3x
MLP expanded by a 3.5x multiplier, with LeakyReLU activation.
parameters: {"multiplier":3.5}
LeakyReLU
LeakyReLU(0.5)^2 activation used in the MLP.
parameters: {"slope":0.5}
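Reading "LeakyReLU(0.5)^2" as the square of a LeakyReLU with negative slope 0.5, the activation can be sketched on a scalar as follows. The function name is hypothetical, and the reading itself is an assumption.

```python
def leaky_relu_squared(x, slope=0.5):
    """Apply LeakyReLU with the given negative slope, then square the result."""
    y = x if x >= 0.0 else slope * x
    return y * y


# Positive inputs are squared directly; negative inputs are halved first.
pos = leaky_relu_squared(2.0)   # 2.0 ** 2
neg = leaky_relu_squared(-2.0)  # (0.5 * -2.0) ** 2
```

Under this reading the output is always non-negative, and negative inputs contribute a quarter of what a positive input of the same magnitude would.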
Other
other
Long Phrase Cache using variable-length suffix matching with rolling hashes.
parameters: {"probes":[48,36,28,20,16]}
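The idea of variable-length suffix matching with rolling hashes can be sketched as below. It assumes the probes are suffix lengths in tokens, tried longest first, and that the cache is queried before each token is added (matching the score-first setup). The class `PhraseCache`, its methods, and the hash constants are hypothetical and not taken from the PR.

```python
from collections import Counter, defaultdict

PROBES = (48, 36, 28, 20, 16)  # suffix lengths from the parameters, longest first
BASE = 1_000_003               # hypothetical polynomial hash base
MOD = (1 << 61) - 1            # hypothetical hash modulus


class PhraseCache:
    def __init__(self, probes=PROBES):
        self.probes = tuple(sorted(probes, reverse=True))
        self.tokens = []
        self.prefix = [0]  # prefix[i] = polynomial hash of the first i tokens
        self.pow = {L: pow(BASE, L, MOD) for L in self.probes}
        self.tables = {L: defaultdict(Counter) for L in self.probes}

    def _suffix_hash(self, end, L):
        # Hash of tokens[end - L:end], derived from two prefix hashes in O(1).
        return (self.prefix[end] - self.prefix[end - L] * self.pow[L]) % MOD

    def predict(self):
        """Return (matched_length, next-token counts) for the longest match, or None."""
        n = len(self.tokens)
        for L in self.probes:
            if n >= L:
                counts = self.tables[L].get(self._suffix_hash(n, L))
                if counts:
                    return L, counts
        return None

    def update(self, token):
        """Record `token` as the continuation of every current suffix, then append it."""
        n = len(self.tokens)
        for L in self.probes:
            if n >= L:
                self.tables[L][self._suffix_hash(n, L)][token] += 1
        self.tokens.append(token)
        self.prefix.append((self.prefix[-1] * BASE + token + 1) % MOD)


# Small probes for demonstration: after seeing 1 2 3 4 5, the suffix 1 2 3 4 predicts 5.
cache = PhraseCache(probes=(4, 2))
for tok in [1, 2, 3, 4, 5, 1, 2, 3, 4]:
    cache.update(tok)
match = cache.predict()
```

This sketch keys the tables on the hash alone, so hash collisions are not verified against the stored tokens.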
other
Order-adaptive entropy gating for n-gram cache blending.
parameters: {"orders":[2,9]}
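The summary does not give the gating formula, so the following is only a sketch of one way such a gate could work. The blend weight alpha grows with the model's predictive entropy, and the entropy threshold drops as the matched n-gram order rises from 2 to 9. Reading `orders` as a (min, max) range is an assumption, and so are the sigmoid form, `base_threshold`, and `slope`.

```python
import math

MIN_ORDER, MAX_ORDER = 2, 9  # "orders" from the parameters, assumed to be a range


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def gate_alpha(model_probs, order, base_threshold=4.0, slope=1.0):
    """Blend weight in (0, 1): larger when the model is uncertain and the match is long."""
    frac = (order - MIN_ORDER) / (MAX_ORDER - MIN_ORDER)  # 0 at order 2, 1 at order 9
    threshold = base_threshold * (1.0 - 0.5 * frac)       # hypothetical schedule
    return 1.0 / (1.0 + math.exp(-slope * (entropy(model_probs) - threshold)))


def blend(model_probs, cache_probs, order):
    """Mix model and cache distributions with the gated weight."""
    a = gate_alpha(model_probs, order)
    return [(1.0 - a) * m + a * c for m, c in zip(model_probs, cache_probs)]


uniform = [0.25] * 4               # 2 bits of entropy: the model is unsure
peaked = [0.97, 0.01, 0.01, 0.01]  # low entropy: the model is confident
mixed = blend(uniform, [1.0, 0.0, 0.0, 0.0], order=9)
```

Because the result is a convex combination of two distributions, it still sums to 1.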
other
Online regime tracker that modulates alpha based on detected text regime.
parameters: {"window":4096,"alpha_multiplier_range":[0.7,1.5]}
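How the tracker detects a regime is not described in this summary. As a sketch under stated assumptions, the class below tracks the cache hit rate over a sliding window of 4096 tokens and maps it linearly onto the multiplier range [0.7, 1.5]. The regime signal, the linear mapping, and all names are hypothetical.

```python
from collections import deque


class RegimeTracker:
    def __init__(self, window=4096, lo=0.7, hi=1.5):
        self.hits = deque(maxlen=window)  # 1 if the cache matched at that token, else 0
        self.hit_sum = 0
        self.lo, self.hi = lo, hi

    def observe(self, cache_hit):
        """Record whether the cache matched at the current token."""
        if len(self.hits) == self.hits.maxlen:
            self.hit_sum -= self.hits[0]  # the oldest entry is about to be evicted
        self.hits.append(int(cache_hit))
        self.hit_sum += int(cache_hit)

    def alpha_multiplier(self):
        """Scale factor for alpha: `lo` in low-repetition text, `hi` in repetitive text."""
        if not self.hits:
            return 1.0
        rate = self.hit_sum / len(self.hits)
        return self.lo + (self.hi - self.lo) * rate


# Tiny window for demonstration: two hits then two misses gives the midpoint.
tracker = RegimeTracker(window=4)
for hit in [True, True, False, False]:
    tracker.observe(hit)
mid = tracker.alpha_multiplier()
```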
Novel Contributions
- Long Phrase Cache with variable-length suffix matching
- Order-Adaptive Entropy Gating
- Online Regime Tracker
- 3-seed mean val_bpb of 0.1003